unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Ihor Radchenko @ 2022-09-10  4:32 UTC
  To: 57712

Hello,

Org mode relies on bibtex.el to parse BibTeX bibliographies.

Recently, we got a bug report where curly braces are used inside the
title field:

https://orgmode.org/list/CAF+0kSg8O3RQBG1wXoHjMEHwnGFz0gaDkTTSGv+ZaOt4d4myCA@mail.gmail.com

@InCollection{Geyer2011,
  author          = {Geyer, Charles J},
  title           = {{Introduction to Markov Chain Monte Carlo}},
                     ^                                        ^
  year            = 2011,
  booktitle       = {{Handbook of Markov Chain Monte Carlo}},
  editor          = {Brooks, Steve and Gelman, Andrew and Jones, Galin and Meng,
		  Xiao-Li},
  publisher       = {CRC press},
  pages           = 45,
}

Curly braces inside fields are part of the BibTeX specification:
http://www.bibtex.org/SpecialSymbols/

Yet, bibtex-parse-entry simply ignores all the curly braces inside the
field and returns the field as is:

(("=type=" . "InCollection")
... ("title" . "{Introduction to Mark\\.{o}v Chain Monte Carlo}") ...)

The same goes for the \.{o} special symbol (in the entry I tested, the title reads Mark\.{o}v).

It would be useful if bibtex-parse-entry could process the escapes when,
say, its first optional argument CONTENT is set to 'parse.

Would it be possible?

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Lars Ingebrigtsen @ 2022-09-10  4:39 UTC
  To: Ihor Radchenko; +Cc: 57712, Roland Winkler

Ihor Radchenko <yantar92@gmail.com> writes:

> It would be useful if bibtex-parse-entry could process the escapes when,
> say, its first optional argument CONTENT is set to 'parse.
>
> Would it be possible?

Perhaps Roland has some comments; added to the CCs.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-09-10 16:07 UTC
  To: Lars Ingebrigtsen; +Cc: 57712, Ihor Radchenko

> It would be useful if bibtex-parse-entry could process the escapes when,
> say, its first optional argument CONTENT is set to 'parse.

I must be missing something.  With `emacs -q' (28.1) and the proposed
test case bibtex-parse-entry honors all braces.  Same with any other
entries I tested.  Try

M-: (bibtex-parse-entries)

Certainly, stripping off the braces would turn valid LaTeX code into
gibberish.

There is also bibtex-autokey-get-field that comes with an optional arg
CHANGE-LIST which is most often bibtex-autokey-transcriptions.  This
massages the field values in all kinds of ways.  But it serves an
entirely different purpose.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-09-11  5:11 UTC
  To: Lars Ingebrigtsen; +Cc: 57712, Ihor Radchenko

On Sat, Sep 10 2022, Roland Winkler wrote:
> I must be missing something.

I looked at the original org bug report that triggered the present
report.  I guess that now I understand what the OP is concerned about.
I believe that bibtex-parse-entry is not the right place to try to fix
this.  The problem is to define what an optional arg CONTENT should do.

I do not know what the "basic org cite export processor" mentioned by
the OP is doing in detail.  But it reminds me of bibtex-summary which is
the default value of bibtex-summary-function.  This function generates a
"human-readable" summary of a BibTeX entry.  It uses the autokey
machinery of bibtex-mode that was originally developed for generating
keys for new BibTeX entries.  But this machinery can easily be "misused"
for things like bibtex-summary.  The point is that it offers a great many
options to customize what such a summary should look like.  Say, an
entry has 20 authors.  Do you want to display three or four authors?  Do
you want to just put dots after the third author or "et al."?  Should the
author(s) come first or should the title come first?  Should the year
appear before or after the title?  (Essentially, you can go through all
the questions relevant for BibTeX style files; but the autokey machinery
comes with the power of Emacs :-)

So I suggest that the "basic org cite export processor" could use
something similar to bibtex-summary.  But this should just be the
default value of something similar to bibtex-summary-function so that
users can customize this more easily.
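To make the range of choices concrete, here is a minimal Python sketch of a configurable summary formatter. It is a hypothetical illustration, not bibtex.el's bibtex-summary or its autokey machinery: the function name `summarize` and its parameters `max_authors`, `truncation`, and `order` are invented for this example, and author lists are split naively on " and ".

```python
def summarize(entry, max_authors=3, truncation="et al.",
              order=("author", "year", "title")):
    """Return a one-line summary of ENTRY, a dict of BibTeX field values.

    Hypothetical sketch: every keyword parameter is a customization point.
    """
    # Naive split; real BibTeX name lists need brace-aware splitting.
    authors = [a.strip() for a in entry.get("author", "").split(" and ")
               if a.strip()]
    names = " and ".join(authors[:max_authors])
    if len(authors) > max_authors:
        names += " " + truncation
    parts = {"author": names,
             "year": entry.get("year", ""),
             "title": entry.get("title", "")}
    # Skip empty parts so missing fields leave no stray separators.
    return "; ".join(parts[key] for key in order if parts[key])


entry = {"author": "A and B and C and D and E", "year": "2011", "title": "T"}
print(summarize(entry))  # A and B and C et al.; 2011; T
print(summarize(entry, max_authors=4, truncation="...",
                order=("title", "author")))  # T; A and B and C and D ...
```

Each call changes the output only through arguments, which mirrors the point that per-user preferences are easier to express as a user-supplied function than as a fixed set of options.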

(Letting users define their personal bibtex-summary function is probably
easier than trying to define some user variables that can control this.
There are just too many possibilities how one might want to customize
things.  My personal bibtex-summary function does a couple things that
nobody else might like, but they are important for me.)

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Ihor Radchenko @ 2022-09-12  5:06 UTC
  To: Roland Winkler; +Cc: 57712, Lars Ingebrigtsen

Roland Winkler <winkler@gnu.org> writes:

> M-: (bibtex-parse-entries)

Note that bibtex.el does not have such a function.

> Certainly, stripping off the braces would turn valid LaTeX code into
> gibberish.

Note that BibTeX braces, escapes, special symbols, a subset of LaTeX
commands, and LaTeX math inside BibTeX entries are part of BibTeX
syntax (http://www.bibtex.org/SpecialSymbols/). I expect bibtex.el to
understand that syntax and parse it, so that the user of bibtex.el does
not need to implement extra parsing on top.

I understand that the BibTeX syntax is fully compatible with LaTeX and
for the purposes of LaTeX processing, there is no need to parse the
BibTeX entry contents. However, BibTeX may be used outside LaTeX. Then,
it is reasonable to expect the parsing to be done inside bibtex.el.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Ihor Radchenko @ 2022-09-12  5:20 UTC
  To: Roland Winkler; +Cc: 57712, Lars Ingebrigtsen

Roland Winkler <winkler@gnu.org> writes:

> On Sat, Sep 10 2022, Roland Winkler wrote:
>> I must be missing something.
>
> I looked at the original org bug report that triggered the present
> report.  I guess that now I understand what the OP is concerned about.
> I believe that bibtex-parse-entry is not the right place to try to fix
> this.  The problem is to define what an optional arg CONTENT should do.

To clarify, I do not expect bibtex-parse-entry to strip the braces. What
I'd like to see is _parsing_ braces (say, as sexp) and other special
BibTeX syntax, at least when an appropriate option is passed to
bibtex-parse-entry.

What to do with the braces is another question and has nothing to do
with bibtex.el. It is to be decided by Org.
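One possible reading of "parsing braces as sexp" is a function that turns brace groups into nested lists, leaving the decision about what to do with them to the caller. The Python sketch below assumes that reading; `parse_braces` is a hypothetical name, and escaped braces such as \{ are not handled.

```python
def parse_braces(text):
    """Parse TeX brace groups into nested lists (a sexp-like tree).

    Plain text stays as strings; each {...} group becomes a sub-list.
    Escaped braces such as \\{ are NOT handled in this sketch.
    """
    def parse(i, depth):
        items, buf = [], []
        while i < len(text):
            ch = text[i]
            if ch == "{":
                if buf:
                    items.append("".join(buf))
                    buf = []
                group, i = parse(i + 1, depth + 1)  # recurse into the group
                items.append(group)
            elif ch == "}":
                if depth == 0:
                    raise ValueError(f"unmatched '}}' at {i}")
                if buf:
                    items.append("".join(buf))
                return items, i + 1  # close the current group
            else:
                buf.append(ch)
                i += 1
        if depth > 0:
            raise ValueError("unclosed '{'")
        if buf:
            items.append("".join(buf))
        return items, i

    tree, _ = parse(0, 0)
    return tree


print(parse_braces("{Introduction to {M}arkov Chain}"))
# [['Introduction to ', ['M'], 'arkov Chain']]
```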

> I do not know what the "basic org cite export processor" mentioned by
> the OP is doing in detail.  But it reminds me of bibtex-summary which is
> the default value of bibtex-summary-function.  This function generates a
> "human-readable" summary of a BibTeX entry.  It uses the autokey
> machinery of bibtex-mode that was originally developed for generating
> keys for new BibTeX entries.  But this machinery can easily be "misused"
> for things like bibtex-summary.  The point is that it offers rather many
> options to customize what such a summary should look like.  Say, an
> entry has 20 authors.  Do you want to display three or four authors?  Do
> you want to just put dots after the third author or "et al."?  Should the
> author(s) come first or should the title come first?  Should the year
> appear before or after the title?  (Essentially, you can go through all
> the questions relevant for BibTeX style files; but the autokey machinery
> comes with the power of emacs :-)
>
> So I suggest that the  "basic org cite export processor" could use
> something similar to bibtex-summary.  But this should just be the
> default value of something similar to bibtex-summary-function so that
> users can customize this more easily.
>
> (Letting users define their personal bibtex-summary function is probably
> easier than trying to define some user variables that can control this.
> There are just too many possibilities how one might want to customize
> things.  My personal bibtex-summary function does a couple things that
> nobody else might like, but they are important for me.)

The bibtex-summary approach might be an option, although it is clearly an
abuse and invites future bugs.

However, there are several issues with, say,
`bibtex-autokey-get-title':
1. Its docstring does not say which variables affect the return
   value, and does not even mention that the title contents are
   transformed in any way.
2. It simply strips the curly braces, so
   `bibtex-autokey-titleword-case-convert' is applied regardless of the
   braces.


* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-09-12 13:50 UTC
  To: Ihor Radchenko; +Cc: 57712, Lars Ingebrigtsen

On Mon, Sep 12 2022, Ihor Radchenko wrote:
> To clarify, I do not expect bibtex-parse-entry to strip the braces. What
> I'd like to see is _parsing_ braces (say, as sexp) and other special
> BibTeX syntax. At least, as long as appropriate option is passed to
> bibtex-parse-entry.

Can you give some examples of what you believe bibtex-parse-entry should
do if it had an optional arg CONTENT?  What should it return instead of
what it returns without such an arg?

> bibtex-summary approach might be an option, although it is clearly an
> abuse and begs for future bugs.

My point is: the meaning of CONTENT may largely depend on what the
caller of bibtex-parse-entry wants to achieve.  What appears perfectly
reasonable from your perspective may be meaningless from another
perspective.  That's why the autokey machinery comes with lots of
options in terms of user variables, plus the option of letting the user
ignore all of this and define her own function (both for automatically
generating a key and for generating a summary for an entry). -- It's not
a perfect solution.  But it has worked well for many years.

A single arg CONTENT (trying to guess "do what I mean") cannot cover all
this in a satisfactory way.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Ihor Radchenko @ 2022-09-13  2:34 UTC
  To: Roland Winkler; +Cc: 57712, Lars Ingebrigtsen

Roland Winkler <winkler@gnu.org> writes:

> On Mon, Sep 12 2022, Ihor Radchenko wrote:
>> To clarify, I do not expect bibtex-parse-entry to strip the braces. What
>> I'd like to see is _parsing_ braces (say, as sexp) and other special
>> BibTeX syntax. At least, as long as appropriate option is passed to
>> bibtex-parse-entry.
>
> Can you give some examples of what you believe bibtex-parse-entry should
> do if it had an optional arg CONTENT?  What should it return instead of
> what it returns without such an arg?

Consider the following title:

  title           = {{Introduction $3^5$ to Mark\.{o}v Chain {MOnte} Carlo \LaTeX}},

(bibtex-parse-entry '(symbols braces mathmode latex strings))

will return, for the title field,

'("Introduction " (mathmode "$3^5$") " to Markȯv Chain " (braces "{MOnte}") " Carlo " (latex "\\LaTeX"))

that is
1. Escaped symbols are replaced by their Unicode equivalents
2. Braces are indicated by (braces "string")
3. LaTeX math is indicated by (mathmode "math string")
4. LaTeX commands are indicated by (latex "command")
5. @strings are replaced appropriately
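The proposed return value can be approximated with a small tokenizer. The Python sketch below is hypothetical: the names `tokenize_field`, `ACCENTS`, and the tuple tags are assumptions that mirror the list above. It assumes the outer delimiters of the field have already been stripped, decodes only a handful of accent commands, does not recurse into brace groups, and does not expand @strings.

```python
import re
import unicodedata

# Hypothetical subset of accent commands mapped to Unicode combining marks.
ACCENTS = {".": "\u0307", '"': "\u0308", "'": "\u0301",
           "`": "\u0300", "^": "\u0302", "~": "\u0303"}
# An accent command followed by {x} or a bare letter, e.g. \.{o} or \"a.
ACCENT_RE = re.compile(r"\\([.\"'`^~])(?:\{([A-Za-z])\}|([A-Za-z]))")
# A control word such as \LaTeX.
COMMAND_RE = re.compile(r"\\[A-Za-z]+")


def decode_accent(match):
    """Compose base letter + combining mark into one character (NFC)."""
    letter = match.group(2) or match.group(3)
    return unicodedata.normalize("NFC", letter + ACCENTS[match.group(1)])


def closing_brace(text, start):
    """Index of the '}' that closes the '{' at START."""
    depth = 0
    for j in range(start, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced braces")


def tokenize_field(text):
    """Split field text into strings and (tag, source) tuples."""
    tokens, buf, i = [], [], 0

    def flush():
        if buf:
            tokens.append("".join(buf))
            buf.clear()

    while i < len(text):
        accent = ACCENT_RE.match(text, i)
        command = COMMAND_RE.match(text, i)
        if accent:                       # \.{o} -> ȯ, stays in plain text
            buf.append(decode_accent(accent))
            i = accent.end()
        elif command:                    # other control words, e.g. \LaTeX
            flush()
            tokens.append(("latex", command.group()))
            i = command.end()
        elif text[i] == "$":             # inline math, kept verbatim
            end = text.find("$", i + 1)
            if end == -1:
                raise ValueError("unbalanced $")
            flush()
            tokens.append(("mathmode", text[i:end + 1]))
            i = end + 1
        elif text[i] == "{":             # brace group, not parsed further
            end = closing_brace(text, i)
            flush()
            tokens.append(("braces", text[i:end + 1]))
            i = end + 1
        else:                            # ordinary character
            buf.append(text[i])
            i += 1
    flush()
    return tokens


print(tokenize_field(
    r"Introduction $3^5$ to Mark\.{o}v Chain {MOnte} Carlo \LaTeX"))
```

Unknown escapes such as \& fall through as literal characters, so the sketch degrades to plain text rather than failing.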

>> bibtex-summary approach might be an option, although it is clearly an
>> abuse and begs for future bugs.
>
> My point is: the meaning of CONTENT may largely depend on what the
> caller of bibtex-parse-entry wants to achieve.  What appears perfectly
> reasonable from your perspective may be meaningless from another
> perspective.  That's why the autokey machinery comes with lots of
> options in terms of user variables, plus the option of letting the user
> ignore all of this and define her own function (both for automatically
> generating a key and for generating a summary for an entry). -- It's not
> a perfect solution.  But it has worked well for many years.
>
> A single arg CONTENT (trying to guess "do what I mean") cannot cover all
> this in a satisfactory way.

It can, if it is something like a plist. Note that I do not insist that
the CONTENT value must be the only way to control the function's behaviour.
Using let-bound variables is another valid option. But then all the
variables that affect it should be documented in the docstring.


* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-09-13  4:08 UTC
  To: Ihor Radchenko; +Cc: 57712, Lars Ingebrigtsen

On Tue, Sep 13 2022, Ihor Radchenko wrote:
>   title = {{Introduction $3^5$ to Mark\.{o}v Chain {MOnte} Carlo
> \LaTeX}},
>
> (bibtex-parse-entry '(symbols braces mathmode latex strings))
>
> will return
>
> '("Introduction " (mathmode "$3^5$") " to Markȯv Chain " (braces
> "{MOnte}") "Carlo" (latex "\LaTeX"))
>
> that is
> 1. Escaped symbols are replaced by their unicode
> 2. Braces are indicated by (braces "string")
> 3. LaTeX math is indicated by (mathmode "math string")
> 4. LaTeX commands are indicated by (latex "command")
> 5. @strings are replaced appropriately

I believe this is far beyond BibTeX mode and even further beyond
bibtex-parse-entry.  It is mostly about parsing LaTeX while BibTeX plays
only a marginal role (adding support for @string's will be cheap once you
get the rest of this parser working).  Then, you also need to deal with
the question: what do you want to do with a return value you illustrated
above?  From what I vaguely understood, your real goal is to convert
this into something human-readable.

This is a fairly substantial project.  I am not sure whether it would be
worth the effort.  And I would not want to bury such complex
machinery in a new optional arg of a function that is intended to do
something very different.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Ihor Radchenko @ 2022-09-14  2:07 UTC
  To: Roland Winkler; +Cc: 57712, Lars Ingebrigtsen

Roland Winkler <winkler@gnu.org> writes:

>> will return
>>
>> '("Introduction " (mathmode "$3^5$") " to Markȯv Chain " (braces
>> "{MOnte}") "Carlo" (latex "\LaTeX"))
>>
>> that is
>> 1. Escaped symbols are replaced by their unicode
>> 2. Braces are indicated by (braces "string")
>> 3. LaTeX math is indicated by (mathmode "math string")
>> 4. LaTeX commands are indicated by (latex "command")
>> 5. @strings are replaced appropriately
>
> I believe this is much beyond BibTeX mode and yet more beyond
> bibtex-parse-entry.  It is mostly about parsing LaTeX while BibTeX plays
> only a marginal role (adding support for @string's will be cheap once you
> get the rest of this parser working).  Then, you also need to deal with
> the question: what do you want to do with a return value you illustrated
> above?  From what I vaguely understood, your real goal is to convert
> this into something human-readable.

> This is a fairly substantial project.  I am not sure whether it would be
> worth the effort.  And I would not want to bury such a pretty complex
> machinery in a new optional arg of a function that is intended to do
> something very different.

I am looking at this differently. Like the BibTeX field syntax itself, the
text inside the fields is subject to a specific format. That format is _not_
exactly the same as TeX (e.g. see
https://tex.stackexchange.com/questions/26338/how-to-code-%C3%9F-german-sharp-s-in-bibtex)

I expect bibtex.el to handle all the peculiarities of BibTeX format, so
that external packages do not need to perform extra parsing.

If you dislike modifying bibtex-parse-entry, bibtex-parse-field-text
looks like a reasonable place to handle field text parsing.


* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-09-14 17:02 UTC
  To: Ihor Radchenko; +Cc: 57712, Lars Ingebrigtsen

On Wed, Sep 14 2022, Ihor Radchenko wrote:
> I am looking at this differently. Similar to BibTeX fields, text in the
> fields is a subject of a specific format. That format is _not_ exactly
> the same with TeX (e.g. see
> https://tex.stackexchange.com/questions/26338/how-to-code-%C3%9F-german-sharp-s-in-bibtex)

No, the problem discussed there exists exactly the same way within any
LaTeX document.  I deal with this frequently, both in the context of
LaTeX and in the context of BibTeX.  The content of BibTeX fields must
always be valid from the perspective of LaTeX, which will digest them.

(BibTeX itself never checks whether the user made an error from
LaTeX's perspective.  BibTeX generates bbl files that are then processed
by LaTeX.  LaTeX will choke over malformed BibTeX fields the same way it
will choke over invalid constructs in any LaTeX document.)

> I expect bibtex.el to handle all the peculiarities of BibTeX format, so
> that external packages do not need to perform extra parsing.

There are only two issues beyond the oddities of LaTeX itself:

- BibTeX string constants

- BibTeX crossref'ed entries when a field is absent in an entry because
  it is present in a "parent" entry.
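The crossref fallback described in the second item can be sketched as follows. This is a simplified Python illustration, not how bibtex.el implements it: entries are assumed to be a dict of dicts keyed by citation key, only one level of crossref is followed, and the parent key `Brooks2011` in the example is made up.

```python
def resolve_field(entries, key, field):
    """Return FIELD of entry KEY, falling back to its crossref'ed parent.

    ENTRIES maps citation keys to dicts of field values (a simplified,
    assumed representation).  Only one level of crossref is followed.
    """
    entry = entries[key]
    if field in entry:
        return entry[field]
    parent = entries.get(entry.get("crossref"))
    return parent.get(field) if parent else None


entries = {
    "Geyer2011": {"title": "{Introduction to Markov Chain Monte Carlo}",
                  "crossref": "Brooks2011"},   # hypothetical crossref
    "Brooks2011": {"title": "{Handbook of Markov Chain Monte Carlo}",
                   "publisher": "CRC press"},
}
print(resolve_field(entries, "Geyer2011", "publisher"))  # CRC press
```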

(There is also the odd behavior that standard BibTeX style files want to
downcase the content of the BibTeX "title" field, and this can be
suppressed by putting curly braces around the characters.  But the way
this works is that these braces are preserved in the bbl file generated
by BibTeX, and LaTeX will simply ignore them.  The braces do not violate
the rule that the content of BibTeX fields must always be valid from
LaTeX's perspective.)

> If you dislike modifying bibtex-parse-entry, bibtex-parse-field-text
> looks like a reasonable place to handle field text parsing.

Both BibTeX string constants and BibTeX crossref'ed entries are handled
by bibtex-text-in-field.  (As we discussed previously, expanding string
constants requires bibtex-expand-strings to be non-nil.)

Then bibtex-text-in-field always returns valid LaTeX code (provided the
user did not make any LaTeX mistakes in her fields, see above).
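For comparison with what bibtex-text-in-field does when bibtex-expand-strings is non-nil, here is a rough Python sketch of @string expansion and `#` concatenation. The names `split_concatenation` and `expand_field` are hypothetical, the table of string constants is passed in explicitly with lowercase keys, and names are looked up case-insensitively (an assumption of this sketch).

```python
def split_concatenation(value):
    """Split a BibTeX field value on top-level '#' concatenation operators."""
    parts, buf, depth, in_quotes = [], [], 0, False
    for ch in value:
        if ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "#" and depth == 0 and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def expand_field(value, strings):
    """Expand @string constants and concatenate the pieces of VALUE.

    STRINGS maps lowercase @string names to their (already expanded) text.
    """
    pieces = []
    for part in split_concatenation(value):
        if part[:1] in ('"', "{"):   # delimited literal: drop the delimiters
            pieces.append(part[1:-1])
        elif part.isdigit():         # bare number such as 2011
            pieces.append(part)
        else:                        # @string name, looked up case-insensitively
            pieces.append(strings[part.lower()])
    return "".join(pieces)


print(expand_field('CRC # " press"', {"crc": "CRC"}))  # CRC press
```

Inner braces survive expansion unchanged, so the result is still LaTeX code, which matches the point that the output of this step remains valid LaTeX.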

As I said before: if this doesn't fit your needs for org mode, I suggest
you develop a LaTeX parser that can process LaTeX code according to your
needs.  Then you can feed it with any valid LaTeX code including the
return values of bibtex-text-in-field.

* bug#57712: 29.0.50; bibtex.el: Should `bibtex-parse-entry' handle curly braces inside fields?
From: Roland Winkler @ 2022-12-30  6:39 UTC
  To: Ihor Radchenko; +Cc: Lars Ingebrigtsen, 57712-done

On Wed, Sep 14 2022, Roland Winkler wrote:
> As I said before: if this doesn't fit your needs for org mode, I
> suggest you develop a LaTeX parser that can process LaTeX code
> according to your needs.  Then you can feed it with any valid LaTeX
> code including the return values of bibtex-text-in-field.

Closing.