BibTeX issues

unofficial mirror of emacs-devel@gnu.org 

* BibTeX issues
@ 2019-08-27  8:40 Joost Kremers
  2019-08-28 17:45 ` Roland Winkler
  0 siblings, 1 reply; 16+ messages in thread
From: Joost Kremers @ 2019-08-27  8:40 UTC
  To: emacs-devel

Hi all,

I'm running into some issues with bibtex.el (specifically 
`bibtex-generate-autokey`), but before I report them as bugs, I 
wanted to ask about them here first, mainly because I'm not sure 
whether to report them as one single bug or as separate bugs.

First, `bibtex-generate-autokey` does not strip accents from 
characters when creating a key. So if you have an author 
`Fernández`, the key will contain the á. This is not a problem if 
you use XeLaTeX or LuaLaTeX as LaTeX engine, but it is if you use 
pdflatex.

I know that stripping accents is more easily said than done, so 
perhaps this is not really a bug at all but intended behaviour, 
which I would understand.

The other issues seem to be real problems though:

First, the date field does not seem to be recognised at all. In 
biblatex, the date field replaces the year field, in that it is 
considered the preferred way of providing the year of publication 
for an entry. However, even with `bibtex-dialect` set to 
`biblatex`, only the year field is considered for the year part of 
the autogenerated key.

Second, it isn't clear to me how `bibtex-generate-autokey` handles 
macros in titles, specifically \emph. For example, the following 
two entries seem to be handled differently. First:

@InCollection{arévalo13:_pa_marion,
  chapter =	 2,
  pages =	 {49--86},
  subtitle =	 {La Cumbiamba Eneyé Returns to San Jacinto},
  crossref =	 {fernández13:_cumbia},
  title =	 {\emph{¿Pa' dónde vas Marioneta? ¿Pa' dónde va la
                  gaita?}},
  author =	 {Arévalo Mateus, Jorge and Martín Vejarano}
}

Here, `bibtex-generate-autokey` does seem to read the contents of 
\emph, since the autogenerated key contains "pa_marion", which is 
taken from the title. But consider the following entry:

@InCollection{alarcón13,
  year =	 2013,
  crossref =	 {fernández13:_cumbia},
  pages =	 {213--225},
  chapter =	 9,
  title =	 {\emph{Feliz}, \emph{feliz}},
  author =	 {Cristian Alarcón}
}

Here, the autogenerated key does not contain any title part. 
Reducing the two \emph's to one does not change it, but deleting 
it (so that the title is just {Feliz, feliz}), does. Then the 
autogenerated key is "alarcón13:_feliz".

Last, but certainly not least, doing `bibtex-clean-entry` in an 
entry with a valid `crossref' field doesn't seem to work. Instead, 
I get the following error:

bibtex-format-entry: Alternative mandatory field ‘(date year)’ is 
missing

Obviously, I made sure the cross-referenced entry does have a year 
field.

I've included the file test.bib below. Note that the first three 
entries cross-reference the fourth entry, but they contain a year 
field nonetheless, because, as just mentioned, 
`bibtex-clean-entry' won't work without it.

All this was tested on Emacs 26.1.91 with `emacs -Q'.

Thanks for any comments or suggestions,

Joost

========================================

@InCollection{damico13:_cumbia_music_colom,
  year =	 2013,
  chapter =	 1,
  subtitle =	 {Origins, Transformations, and Evolution of a Coastal
                  Music Genre},
  pages =	 {29--48},
  crossref =	 {fernández13:_cumbia},
  title =	 {Cumbia Music in Colombia},
  author =	 {Leonardo D'Amico}
}

@InCollection{arévalo13:_pa_marion,
  chapter =	 2,
  pages =	 {49--86},
  subtitle =	 {La Cumbiamba Eneyé Returns to San Jacinto},
  crossref =	 {fernández13:_cumbia},
  title =	 {\emph{¿Pa' dónde vas Marioneta? ¿Pa' dónde va la
                  gaita?}},
  author =	 {Arévalo Mateus, Jorge and Martín Vejarano}
}

@InCollection{alarcón13,
  year =	 2013,
  crossref =	 {fernández13:_cumbia},
  pages =	 {213--225},
  chapter =	 9,
  title =	 {\emph{Feliz}, \emph{feliz}},
  author =	 {Cristian Alarcón}
}

@Collection{fernández13:_cumbia,
  editor =	 {Fernández L'Hoeste, Héctor and Pablo Vila},
  title =	 {Cumbia!},
  year =	 2013,
  subtitle =	 {scenes of a migrant Latin American music genre},
  publisher =	 {Duke University Press},
  location =	 {Durham},
  isbn =	 {978-0-8223-5433-8},
  pagetotal =	 312
}

@Comment Local Variables:
@Comment bibtex-dialect: biblatex
@Comment End:

========================================

-- 
Joost Kremers
Life has its moments

* Re: BibTeX issues
  2019-08-27  8:40 BibTeX issues Joost Kremers
@ 2019-08-28 17:45 ` Roland Winkler
  2019-08-28 18:45   ` Eli Zaretskii
  2019-08-29  7:49   ` BibTeX issues Joost Kremers
  0 siblings, 2 replies; 16+ messages in thread
From: Roland Winkler @ 2019-08-28 17:45 UTC
  To: emacs-devel

On Tue, Aug 27 2019, Joost Kremers wrote:
> I know that stripping accents is more easily said than done, so
> perhaps this is not really a bug at all but intended behaviour,
> which I would understand.

Stripping accents is really not a matter specific to BibTeX.
(I vaguely remember there was a thread on this list discussing this
topic some time ago.)

If there was a generic function strip-accents, then BibTeX mode could
certainly use it within its bibtex-generate-autokey machinery.

> First, the date field does not seem to be recognised at all. In
> biblatex, the date field replaces the year field, in that it is
> considered the preferred way of providing the year of publication
> for an entry.

How about allowing the possibility that the first arg FIELD of
bibtex-autokey-get-field can also be a list of fields so that the
elements can be treated as alternatives?  Assuming that a bib(la)tex
entry has either a year or a date field, then bibtex-autokey-get-year
could use one or the other.

If you really want your own thing, you can also have a custom
bibtex-autokey-before-presentation-function that ignores its arg.

> Second, it isn't clear to me how `bibtex-generate-autokey` handles
> macros in titles, specifically \emph.

I never had such a problem.  Details probably depend on your use cases.
A generic parser for LaTeX code that can drop such things is probably
not at all trivial.  (But maybe something of that kind already exists (at
some level) in AUCTeX or Org mode or some other package?)

Also, you can always customize bibtex-autokey-titleword-change-strings.

> Last, but certainly not least, doing `bibtex-clean-entry` in an
> entry with a valid `crossref' field doesn't seem to work. Instead, I
> get the following error:
>
> bibtex-format-entry: Alternative mandatory field ‘(date year)’ is
> missing

I am not a biblatex expert.  Since BibTeX mode picked up biblatex
support in 2013, it has treated the alternative fields date and year as
mandatory, see the default of bibtex-biblatex-entry-alist.  Do you say
that these fields should be treated as crossref fields instead?

Roland

* Re: BibTeX issues
  2019-08-28 17:45 ` Roland Winkler
@ 2019-08-28 18:45   ` Eli Zaretskii
  2019-08-29  3:26     ` strip accents and sorting [was: BibTeX issues] Roland Winkler
  2019-08-29  7:49   ` BibTeX issues Joost Kremers
  1 sibling, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-28 18:45 UTC
  To: Roland Winkler; +Cc: emacs-devel

> From: Roland Winkler <winkler@gnu.org>
> Date: Wed, 28 Aug 2019 12:45:33 -0500
> 
> On Tue, Aug 27 2019, Joost Kremers wrote:
> > I know that stripping accents is more easily said than done, so
> > perhaps this is not really a bug at all but intended behaviour,
> > which I would understand.
> 
> Stripping accents is really not a matter specific to BibTeX.
> (I vaguely remember there was a thread on this list discussing this
> topic some time ago.)
> 
> If there was a generic function strip-accents, then BibTeX mode could
> certainly use it within its bibtex-generate-autokey machinery.

I don't think we have such a function, but it shouldn't be hard to
write one, using the facilities in ucs-normalize.el.



* strip accents and sorting [was: BibTeX issues]
  2019-08-28 18:45   ` Eli Zaretskii
@ 2019-08-29  3:26     ` Roland Winkler
  2019-08-29  6:15       ` martin rudalics
  2019-08-29  7:10       ` Eli Zaretskii
  0 siblings, 2 replies; 16+ messages in thread
From: Roland Winkler @ 2019-08-29  3:26 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On Wed Aug 28 2019 Eli Zaretskii wrote:
> > From: Roland Winkler <winkler@gnu.org>
> > If there was a generic function strip-accents, then BibTeX mode could
> > certainly use it within its bibtex-generate-autokey machinery.
> 
> I don't think we have such a function, but it shouldn't be hard to
> write one, using the facilities in ucs-normalize.el.

Interesting! What are the intended use cases for ucs-normalize.el
and the algorithms that it implements?

I had never much thought about this.  But there is obviously a
problem when one tries to sort a database where the keys may contain
more fancy utf characters. (This problem must be well-known in the
utf world).  Naively one might hope that the following lines are
properly sorted according to string-lessp:

  ä-combine
  ä-umlaut
  ö-combine
  ö-umlaut

But (string-lessp "ä-umlaut" "ö-combine") gives nil so that sort-lines gives

  ä-combine
  ö-combine
  ä-umlaut
  ö-umlaut

Of course, this is due to the fact that a German umlaut can be
represented with its own character or with a combining diaeresis.
These two ways of presenting an umlaut look the same, but they are
not the same for string-lessp.
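
The difference between the two representations can be reproduced with a
short snippet.  This is only a sketch: the helper name `my-nfc-equal-p'
is hypothetical, it relies on `ucs-normalize-NFC-string' from
ucs-normalize.el, and the characters are written with \u escapes so the
two forms cannot be confused on screen.

```elisp
(require 'ucs-normalize)

(defun my-nfc-equal-p (s1 s2)
  "Return non-nil if S1 and S2 are equal after NFC normalization.
Hypothetical helper, not part of Emacs."
  (string= (ucs-normalize-NFC-string s1)
           (ucs-normalize-NFC-string s2)))

(let ((precomposed "\u00e4")   ; ä as the single character U+00E4
      (combining "a\u0308"))   ; a followed by U+0308 COMBINING DIAERESIS
  (list (string= precomposed combining)          ; raw comparison
        (my-nfc-equal-p precomposed combining))) ; after normalization
;; ⇒ (nil t)
```

Normalizing keys once, when records are imported, would avoid mixing the
two forms in one database.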

This can be particularly annoying when a database (be it BibTeX,
BBDB, or whatever) is often enough populated by copying records from
different sources that may represent such fancy utf characters in
different ways.

Now, one solution would be to simply strip off the combining
characters by decomposing the characters.  Or is there a possibility
to teach a sorting algorithm that the first letter of ä-combine is
"the same" as the first letter of ä-umlaut and all this should
appear near a-plain instead of past o-plain?

Roland

* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-29  3:26     ` strip accents and sorting [was: BibTeX issues] Roland Winkler
@ 2019-08-29  6:15       ` martin rudalics
  2019-08-30 16:27         ` Roland Winkler
  2019-08-29  7:10       ` Eli Zaretskii
  1 sibling, 1 reply; 16+ messages in thread
From: martin rudalics @ 2019-08-29  6:15 UTC
  To: Roland Winkler, Eli Zaretskii; +Cc: emacs-devel

 > But (string-lessp "ä-umlaut" "ö-combine") gives nil

But (string-collate-lessp "ä-umlaut" "ö-combine") gives t
so it should be fairly easy to fix `sort-lines' and friends
accordingly.

martin




* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-29  3:26     ` strip accents and sorting [was: BibTeX issues] Roland Winkler
  2019-08-29  6:15       ` martin rudalics
@ 2019-08-29  7:10       ` Eli Zaretskii
  2019-08-30 16:29         ` Roland Winkler
  1 sibling, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-29  7:10 UTC
  To: Roland Winkler; +Cc: emacs-devel

> Date: Wed, 28 Aug 2019 22:26:38 -0500
> From: "Roland Winkler" <winkler@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> On Wed Aug 28 2019 Eli Zaretskii wrote:
> > > From: Roland Winkler <winkler@gnu.org>
> > > If there was a generic function strip-accents, then BibTeX mode could
> > > certainly use it within its bibtex-generate-autokey machinery.
> > 
> > I don't think we have such a function, but it shouldn't be hard to
> > write one, using the facilities in ucs-normalize.el.
> 
> Interesting! What are the intended use cases for ucs-normalize.el
> and the algorithms that it implements?

To implement the functionalities described in UAX#15 Unicode
Normalization Forms (http://www.unicode.org/reports/tr15/).  We
already use some of that in implementing the utf8-hfs file-name
encoding (used by macOS).

> I had never much thought about this.  But there is obviously a
> problem when one tries to sort a database where the keys may contain
> more fancy utf characters. (This problem must be well-known in the
> utf world).  Naively one might hope that the following lines are
> properly sorted according to string-lessp

As Martin points out, you should use string-collate-lessp instead for
these use cases.

> Of course, this is due to the fact that a German umlaut can be
> represented with its own character or with a combining diaeresis.
> These two ways of presenting an umlaut look the same, but they are
> not the same for string-lessp.

The Unicode Standard mandates that they be handled identically,
including in searching and sorting.  We don't yet implement that 100%,
but see char-fold.el for a partial (and not very efficient)
implementation during search.

> Now, one solution would be to simply strip off the combining
> characters by decomposing the characters.  Or is there a possibility
> to teach a sorting algorithm that the first letter of ä-combine is
> "the same" as the first letter of ä-umlaut and all this should
> appear near a-plain instead of past o-plain?

Both should be possible.  To entirely strip the combining accents, you
can use ucs-normalize, and then filter out all characters whose
canonical combining class is non-zero.
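
A minimal sketch of that approach, assuming a helper named
`my-strip-accents' (hypothetical, not an existing Emacs function).  It
decomposes the string with `ucs-normalize-NFD-string' and drops every
character whose `canonical-combining-class' property is non-zero:

```elisp
(require 'seq)
(require 'ucs-normalize)

(defun my-strip-accents (string)
  "Return STRING with combining marks removed.
Hypothetical helper, not part of Emacs."
  (concat
   (seq-remove
    (lambda (ch)
      ;; A non-zero canonical combining class marks a combining character.
      (/= 0 (or (get-char-code-property ch 'canonical-combining-class) 0)))
    ;; NFD splits e.g. á into a + U+0301 COMBINING ACUTE ACCENT.
    (ucs-normalize-NFD-string string))))

(my-strip-accents "Fernández") ; ⇒ "Fernandez"
```

Letters that have no canonical decomposition, such as ø or ł, pass
through unchanged, so this is not a complete ASCII transliteration.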



* Re: BibTeX issues
  2019-08-28 17:45 ` Roland Winkler
  2019-08-28 18:45   ` Eli Zaretskii
@ 2019-08-29  7:49   ` Joost Kremers
  2019-08-30 19:18     ` Roland Winkler
  1 sibling, 1 reply; 16+ messages in thread
From: Joost Kremers @ 2019-08-29  7:49 UTC
  To: emacs-devel

Hi Roland,

First, thanks for your answer. Perhaps I should have given some 
background in my previous mail: I maintain a package called Ebib, 
which implements a BibTeX database manager for Emacs. It uses some 
of the functionality of bibtex.el (specifically, the entry type 
definitions and the autokey machinery), which is how I (or rather, 
a user of Ebib) ran into these issues.

On Wed, Aug 28 2019, Roland Winkler wrote:
> On Tue, Aug 27 2019, Joost Kremers wrote:
>> First, the date field does not seem to be recognised at all. In
>> biblatex, the date field replaces the year field, in that it is
>> considered the preferred way of providing the year of publication
>> for an entry.
>
> How about allowing the possibility that the first arg FIELD of
> bibtex-autokey-get-field can also be a list of fields so that the
> elements can be treated as alternatives?  Assuming that a bib(la)tex
> entry has either a year or a date field, then bibtex-autokey-get-year
> could use one or the other.

Yes, that would be great. Biblatex requires that either date or 
year be present, so it's a safe assumption that one of them will 
be. Biblatex favours date over year, so I'd suggest date be 
checked first.

One thing to keep in mind is that the date field can contain a 
full date + time, not just a year, and even date ranges, so in 
order to produce a year part for the autokey, the date field needs 
to be parsed. This shouldn't be too difficult, though, since the 
format of the date field is clearly defined. (The biblatex doc has 
all the details.)
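
As a rough illustration of the two points above (date preferred over
year, and the year extracted from a longer date value), here is a
sketch.  Both function names are hypothetical, the entry is modelled as
a plain alist rather than read from a buffer as bibtex.el does, and the
parsing simply takes the first run of four digits, so it ignores most
of the biblatex date syntax:

```elisp
(defun my-biblatex-date-year (date)
  "Return the first four-digit year found in DATE as a string, or nil.
Hypothetical helper; it ignores the rest of the biblatex date syntax."
  (when (string-match "[0-9]\\{4\\}" date)
    (match-string 0 date)))

(defun my-entry-year (fields)
  "Return the year for FIELDS, an alist of (NAME . VALUE) strings.
The date field is tried first, then the year field."
  (let ((date (cdr (assoc "date" fields)))
        (year (cdr (assoc "year" fields))))
    (or (and date (my-biblatex-date-year date))
        year)))

(my-entry-year '(("date" . "2013-05-12/2013-06-01"))) ; ⇒ "2013"
(my-entry-year '(("year" . "2013")))                  ; ⇒ "2013"
```

A real implementation would also have to decide what to do with years
of fewer than four digits and with the other forms the biblatex manual
allows.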

>> Second, it isn't clear to me how `bibtex-generate-autokey` handles
>> macros in titles, specifically \emph.
>
> I never had such a problem.  Details probably depend on your use cases.
> A generic parser for LaTeX code that can drop such things is probably
> not at all trivial.  (But maybe something of that kind already exists (at
> some level) in AUCTeX or Org mode or some other package?)

I don't know about AUCTeX or Org mode (perhaps org-ref has 
something), but I do something like this in Ebib: In order to 
display the title in a user-friendly manner, Ebib strips all TeX 
commands from a title, leaving only the obligatory argument. (It 
also does some fontification, BTW.) It works well enough for my 
use-case, but it'll break in more complicated cases; e.g., it 
doesn't take into account multiple obligatory arguments, and it 
doesn't handle extensions to the default LaTeX syntax that some 
packages (most notably biblatex...) implement, such as arguments 
delimited with parentheses or pointy brackets, and optional 
commands in between obligatory ones.
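
To give an idea of the simple scheme, here is a sketch.  It is not
Ebib's actual code; the function name is hypothetical, and it shares
the limitations listed above: it only handles commands with a single
braced argument and knows nothing about optional arguments.

```elisp
(defun my-strip-tex-commands (string)
  "Replace each \\command{ARG} in STRING with ARG.
Hypothetical helper; handles one braced argument per command only."
  (let ((result string))
    ;; The argument may not contain braces, so nested commands are
    ;; unwrapped from the inside out on successive iterations.
    (while (string-match "\\\\[a-zA-Z]+\\*?{\\([^{}]*\\)}" result)
      (setq result (replace-match (match-string 1 result) t t result)))
    result))

(my-strip-tex-commands "\\emph{Feliz}, \\emph{feliz}") ; ⇒ "Feliz, feliz"
```

Because every replacement shortens the string, the loop always
terminates.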

>> Last, but certainly not least, doing `bibtex-clean-entry` in an
>> entry with a valid `crossref' field doesn't seem to work. Instead, I
>> get the following error:
>>
>> bibtex-format-entry: Alternative mandatory field ‘(date year)’ is
>> missing
>
> I am not a biblatex expert.  Since BibTeX mode picked up biblatex
> support in 2013, it has treated the alternative fields date and year as
> mandatory, see the default of bibtex-biblatex-entry-alist.  Do you say
> that these fields should be treated as crossref fields instead?

Yes. In fact, both the BibTeX and the biblatex documentation state 
that *all* fields are inherited if they are present in the parent 
and not in the child. (With biblatex, this is a customisable 
option, but it's on by default.) Obviously, for fields such as 
author and title, inheriting them doesn't make much sense, but for 
year and date it usually does.

BTW, `bibtex-generate-autokey` does in fact treat the year field 
as inheritable. It's `bibtex-clean-entry` that protests when the 
year field is missing.

-- 
Joost Kremers
Life has its moments

* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-29  6:15       ` martin rudalics
@ 2019-08-30 16:27         ` Roland Winkler
  2019-08-30 17:51           ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Winkler @ 2019-08-30 16:27 UTC
  To: martin rudalics; +Cc: Eli Zaretskii, emacs-devel

On Thu Aug 29 2019 martin rudalics wrote:
>  > But (string-lessp "ä-umlaut" "ö-combine") gives nil
> 
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t

...not for me, which is likely due to my locale LC_COLLATE=C

I could use instead, say, LC_COLLATE=en_US.utf8.  Then the above
call of string-collate-lessp yields t.  But this also implies case
folding and ignoring dots in directory listings, which is not what I
want.  In other words, these locales have too many features bundled
together.

Maybe these feature sets of different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want.  But to the best of my knowledge this
documentation resides outside emacs so that things get rather
complicated when this affects an emacs session in important or
possibly subtle ways.

> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.

In that sense I am not sure I would like to see `sort-lines' and
friends be fixed "accordingly".  If at all, I'd vote for a user
option that likely I'd use to disable such things.

On the other hand, as Eli pointed out in his reply about accented
characters being represented via a single character as compared to
using combining characters

> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting.  We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.

So I would assume that the locale should not matter at all in the
context of unicode combining characters. (Or there should be a way
to control exactly this aspect of unicode combining characters with
no additional (mis)features bundled with it.)
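
One locale-independent way to get just this aspect might be to
normalize both strings before comparing them.  The following is a
sketch under that assumption; `my-nfc-lessp' is a hypothetical name and
the comparison itself is still plain `string-lessp':

```elisp
(require 'ucs-normalize)

(defun my-nfc-lessp (s1 s2)
  "Compare S1 and S2 with `string-lessp' after NFC normalization.
Hypothetical helper, not part of Emacs."
  (string-lessp (ucs-normalize-NFC-string s1)
                (ucs-normalize-NFC-string s2)))

;; \u00e4 and \u00f6 are precomposed; a\u0308 and o\u0308 use U+0308.
(sort (list "\u00f6-umlaut" "a\u0308-combine" "\u00e4-umlaut" "o\u0308-combine")
      #'my-nfc-lessp)
;; ⇒ ("ä-combine" "ä-umlaut" "ö-combine" "ö-umlaut")
```

This treats both representations of an umlaut alike without case
folding or ignoring punctuation, but accented letters still sort by
code point, i.e. after plain z.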

I understand that it is a different matter how accented characters
are sorted relative to each other and also relative to un-accented
characters.  So it can make a lot of sense to have different locales
for that aspect.

Maybe I am missing something here.  (And I have not yet looked in
more detail at char-fold.el mentioned by Eli, which could be a
better way to go within the emacs world.)

Roland

* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-29  7:10       ` Eli Zaretskii
@ 2019-08-30 16:29         ` Roland Winkler
  0 siblings, 0 replies; 16+ messages in thread
From: Roland Winkler @ 2019-08-30 16:29 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On Thu Aug 29 2019 Eli Zaretskii wrote:
> > From: "Roland Winkler" <winkler@gnu.org>
> > Now, one solution would be to simply strip off the combining
> > characters by decomposing the characters.  Or is there a possibility
> > to teach a sorting algorithm that the first letter of ä-combine is
> > "the same" as the first letter of ä-umlaut and all this should
> > appear near a-plain instead of past o-plain?
> 
> Both should be possible.  To entirely strip the combining accents, you
> can use ucs-normalize, and then filter out all characters whose
> canonical combining class is non-zero.

Thanks, I need to look at this more carefully.



* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 16:27         ` Roland Winkler
@ 2019-08-30 17:51           ` Eli Zaretskii
  2019-08-30 18:38             ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-30 17:51 UTC
  To: Roland Winkler; +Cc: rudalics, emacs-devel

> Date: Fri, 30 Aug 2019 11:27:33 -0500
> From: "Roland Winkler" <winkler@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,
>     emacs-devel@gnu.org
> 
> > But (string-collate-lessp "ä-umlaut" "ö-combine") gives t
> 
> ...not for me, which is likely due to my locale LC_COLLATE=C
> 
> I could use instead, say, LC_COLLATE=en_US.utf8.  Then the above
> call of string-collate-lessp yields t.  But this also implies case
> folding and ignoring dots in directory listings, which is not what I
> want.  In other words, these locales have too many features bundled
> together.

You could set LC_COLLATE=en_US.utf8 inside Emacs, or even bind it
around the call to string-collate-lessp.  I think we support that on
GNU/Linux.

> > The Unicode Standard mandates that they be handled identically,
> > including in searching and sorting.  We don't yet implement that
> > 100%, but see char-fold.el for a partial (and not very efficient)
> > implementation during search.
> 
> So I would assume that the locale should not matter at all in the
> context of unicode combining characters.

Not entirely true, as some aspects of this equivalence can be
locale-dependent.  See UTS#10 (http://www.unicode.org/reports/tr10/)
for more about that.



* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 17:51           ` Eli Zaretskii
@ 2019-08-30 18:38             ` Eli Zaretskii
  2019-08-30 19:09               ` Roland Winkler
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-30 18:38 UTC
  To: winkler; +Cc: rudalics, emacs-devel

> Date: Fri, 30 Aug 2019 20:51:32 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: rudalics@gmx.at, emacs-devel@gnu.org
> 
> > Date: Fri, 30 Aug 2019 11:27:33 -0500
> > From: "Roland Winkler" <winkler@gnu.org>
> > Cc: Eli Zaretskii <eliz@gnu.org>,
> >     emacs-devel@gnu.org
> > 
> > > But (string-collate-lessp "ä-umlaut" "ö-combine") gives t
> > 
> > ...not for me, which is likely due to my locale LC_COLLATE=C
> > 
> > I could use instead, say, LC_COLLATE=en_US.utf8.  Then the above
> > call of string-collate-lessp yields t.  But this also implies case
> > folding and ignoring dots in directory listings, which is not what I
> > want.  In other words, these locales have too many features bundled
> > together.
> 
> You could set LC_COLLATE=en_US.utf8 inside Emacs, or even bind it
> around the call to string-collate-lessp.  I think we support that on
> GNU/Linux.

Actually, string-collate-lessp accepts an optional argument LOCALE
that can be used for that.  So it's even easier than I remembered.



* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 18:38             ` Eli Zaretskii
@ 2019-08-30 19:09               ` Roland Winkler
  2019-08-30 19:19                 ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Winkler @ 2019-08-30 19:09 UTC
  To: Eli Zaretskii; +Cc: rudalics, emacs-devel

On Fri Aug 30 2019 Eli Zaretskii wrote:
> > You could set LC_COLLATE=en_US.utf8 inside Emacs, or even bind it
> > around the call to string-collate-lessp.  I think we support that on
> > GNU/Linux.
> 
> Actually, string-collate-lessp accepts an optional argument LOCALE
> that can be used for that.  So it's even easier than I remembered.

Thanks!  Unfortunately, string-collate-lessp with locale en_US.utf8
folds case,

(sort '("b" "A" "B" "a")
      (lambda (s1 s2) (string-collate-lessp s1 s2 "en_US.utf8")))
      ⇒ ("a" "A" "b" "B")

whereas

(sort '("b" "A" "B" "a")
      (lambda (s1 s2) (string-collate-lessp s1 s2 "C")))
      ⇒ ("A" "B" "a" "b")

though in both cases the optional arg IGNORE-CASE of
string-collate-lessp is nil.  (I guess this is not a bug in
string-collate-lessp, but an intended "feature" of the locale
en_US.utf8.)

Similarly, the locale en_US.utf8 ignores dots "." which for my taste
bundles too many features.  (Does anybody know where the feature
bundles of different locales are described?  So far, I have not
found anything.)

But something like bibtex-mode could introduce a new user option
bibtex-sort-locale that is used as optional arg when sorting BibTeX
records with string-collate-lessp.
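
A sketch of what such an option could look like.  The names carry a
`my-' prefix because nothing like this exists in bibtex.el at the time
of writing; only `string-collate-lessp' and its optional LOCALE
argument are real.

```elisp
(defcustom my-bibtex-sort-locale nil
  "Locale used when comparing BibTeX keys, or nil for the current locale.
Hypothetical option; bibtex.el does not define it."
  :type '(choice (const :tag "Current locale" nil)
                 (string :tag "Locale name"))
  :group 'bibtex)

(defun my-bibtex-key-lessp (key1 key2)
  "Return non-nil if KEY1 collates before KEY2 under `my-bibtex-sort-locale'."
  (string-collate-lessp key1 key2 my-bibtex-sort-locale))

;; Usage: bind the option around a sort to pick the collation rules.
(let ((my-bibtex-sort-locale "C"))
  (sort (list "b" "A" "B" "a") #'my-bibtex-key-lessp))
```

With "C" this should give the same order as the second example above;
other values depend on which locales are installed on the system.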

* Re: BibTeX issues
  2019-08-29  7:49   ` BibTeX issues Joost Kremers
@ 2019-08-30 19:18     ` Roland Winkler
  0 siblings, 0 replies; 16+ messages in thread
From: Roland Winkler @ 2019-08-30 19:18 UTC
  To: Joost Kremers; +Cc: emacs-devel

On Thu, Aug 29 2019, Joost Kremers wrote:
> On Wed, Aug 28 2019, Roland Winkler wrote:
>> How about allowing the possibility that the first arg FIELD of
>> bibtex-autokey-get-field can also be a list of fields so that the
>> elements can be treated as alternatives?  Assuming that a bib(la)tex
>> entry has either a year or a date field, then bibtex-autokey-get-year
>> could use one or the other.
>
> Yes, that would be great. Biblatex requires that either date or year
> be present, so it's a safe assumption that one of them will be.

I am currently playing with this.

> In order to display the title in a user-friendly manner, Ebib strips
> all TeX commands from a title, leaving only the obligatory argument.

A simple scheme of that kind can probably be added to
bibtex-autokey-transcriptions, though false-positives could be annoying.

> BTW, `bibtex-generate-autokey` does in fact treat the year field as
> inheritable. It's `bibtex-clean-entry` that protests when the year
> field is missing.

`bibtex-clean-entry' is for testing whether an entry has the proper
format or not and for protesting if it believes there is a mistake.
But `bibtex-generate-autokey' is for, well, auto-generating a key,
assuming that the record is what the user wants / needs.  So I think
this behavior is intentional.

I noticed that the year/date alternative becomes a bit clumsy when it is
downgraded from "mandatory" to "crossref" in bibtex-biblatex-entry-alist
because bibtex-make-optional-field will give them both an ALT and OPT prefix

  ALTOPTyear =   {},
  ALTOPTdate =   {},

(which is of course in line with the usual behavior of bibtex-mode).
Also, this may break some code of bibtex-mode that assumes historically
that fields have either the ALT or OPT prefix.  Possibly, one can add a
user option not to insert any such prefix, which, I believe, is not
needed by `bibtex-clean-entry' either.  I need to check this more
carefully.

* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 19:09               ` Roland Winkler
@ 2019-08-30 19:19                 ` Eli Zaretskii
  2019-08-30 19:49                   ` Roland Winkler
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-30 19:19 UTC
  To: Roland Winkler; +Cc: rudalics, emacs-devel

> Date: Fri, 30 Aug 2019 14:09:47 -0500
> From: "Roland Winkler" <winkler@gnu.org>
> Cc: rudalics@gmx.at,
>     emacs-devel@gnu.org
> 
> Similarly, the locale en_US.utf8 ignores dots "." which for my taste
> bundles too many features.  (Does anybody know where the feature
> bundles of different locales are described?  So far, I have not
> found anything.)

I think it comes from CLDR, the Unicode Common Locale Data
Repository.  See UTS#10 to which I already pointed.



* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 19:19                 ` Eli Zaretskii
@ 2019-08-30 19:49                   ` Roland Winkler
  2019-08-31  6:45                     ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Winkler @ 2019-08-30 19:49 UTC
  To: Eli Zaretskii; +Cc: rudalics, emacs-devel

On Fri Aug 30 2019 Eli Zaretskii wrote:
> I think it comes from CLDR, the Unicode Common Locale Data
> Repository.  See UTS#10 to which I already pointed.

Thanks.  I have just downloaded from github the latest version of
the Unicode Common Locale Data Repository.  Unfortunately, it seems
that this stuff is "for experts only".  I have no idea how ordinary
users can learn anything useful about what different locales are
doing.

* Re: strip accents and sorting [was: BibTeX issues]
  2019-08-30 19:49                   ` Roland Winkler
@ 2019-08-31  6:45                     ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2019-08-31  6:45 UTC
  To: Roland Winkler; +Cc: rudalics, emacs-devel

> Date: Fri, 30 Aug 2019 14:49:45 -0500
> From: "Roland Winkler" <winkler@gnu.org>
> Cc: rudalics@gmx.at,
>     emacs-devel@gnu.org
> 
> On Fri Aug 30 2019 Eli Zaretskii wrote:
> > I think it comes from CLDR, the Unicode Common Locale Data
> > Repository.  See UTS#10 to which I already pointed.
> 
> Thanks.  I have just downloaded from github the latest version of
> the Unicode Common Locale Data Repository.  Unfortunately, it seems
> that this stuff is "for experts only".  I have no idea how ordinary
> users can learn anything useful about what different locales are
> doing.

It comes with documentation.  But yes, it isn't for the faint of
heart.


