unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
@ 2024-02-19  9:14 Ihor Radchenko
  2024-02-23 12:07 ` Arash Esbati
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-02-19  9:14 UTC (permalink / raw)
  To: 69266

Consider the following bibtex entry:

@InCollection{Geyer2011,
  title           = {Introduction to Markov Chain Monte  \} Carlo},
  pages           = 45,
}

According to https://www.bibtex.org/SpecialSymbols/, characters that
conflict with Bibtex format description can be \-escaped.

In the above, with point at the beginning of the entry, M-:
(bibtex-parse-entry t), yields

("=type=" . "InCollection")
("=key=" . "Geyer2011")
("title" . "Introduction to Markov Chain Monte  \\")

The escaped \} is treated as the closing }, which is incorrect.

Expected: \} is treated as an escaped brace, so the title field runs to
the final } and the pages field is parsed as well.

In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.41, cairo version 1.18.0) of 2024-02-18 built on localhost
Repository revision: 951379a0983ea66b1396d07628bb726f033ea24b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101011
System Description: Gentoo Linux

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-19  9:14 bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \} Ihor Radchenko
@ 2024-02-23 12:07 ` Arash Esbati
  2024-02-23 15:25   ` Roland Winkler
  0 siblings, 1 reply; 10+ messages in thread
From: Arash Esbati @ 2024-02-23 12:07 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 69266, Roland Winkler

Ihor Radchenko <yantar92@posteo.net> writes:

> Consider the following bibtex entry:
>
> @InCollection{Geyer2011,
>   title           = {Introduction to Markov Chain Monte  \} Carlo},
>   pages           = 45,
> }
>
> According to https://www.bibtex.org/SpecialSymbols/, characters that
> conflict with Bibtex format description can be \-escaped.
>
> In the above, with point at the beginning of the entry, M-:
> (bibtex-parse-entry t), yields
>
> ("=type=" . "InCollection")
> ("=key=" . "Geyer2011")
> ("title" . "Introduction to Markov Chain Monte  \\")
>
> The escaped \} is treated as closing }, which is incorrect.
>
> Expected: escaping is properly processed.

I think the issue is that `bibtex-parse-entry' calls
`bibtex-text-in-field-bounds' which calls `bibtex-parse-field-string'
which is defined like this:

--8<---------------cut here---------------start------------->8---
(defun bibtex-parse-field-string ()
  "Parse a BibTeX field string enclosed by braces or quotes.
If a syntactically correct string is found, a pair containing the start and
end position of the field string is returned, nil otherwise.
Do not move point."
  (let ((end-point
         (or (and (eq (following-char) ?\")
                  (save-excursion
                    (with-syntax-table bibtex-quoted-string-syntax-table
                      (forward-sexp 1))
                    (point)))
             (and (eq (following-char) ?\{)
                  (save-excursion
                    (with-syntax-table bibtex-braced-string-syntax-table
                      (forward-sexp 1))
                    (point))))))
    (if end-point
        (cons (point) end-point))))
--8<---------------cut here---------------end--------------->8---

`bibtex-braced-string-syntax-table' is defined as:

--8<---------------cut here---------------start------------->8---
(defconst bibtex-braced-string-syntax-table
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\{ "(}" st)
    (modify-syntax-entry ?\} "){" st)
    (modify-syntax-entry ?\[ "." st)
    (modify-syntax-entry ?\] "." st)
    (modify-syntax-entry ?\( "." st)
    (modify-syntax-entry ?\) "." st)
    (modify-syntax-entry ?\\ "." st)
    (modify-syntax-entry ?\" "." st)
    st)
  "Syntax-table to parse matched braces.")
--8<---------------cut here---------------end--------------->8---

where the backslash gets the punctuation class.  Hence, the
(forward-sexp 1) call above treats the } in \} as the closing brace and
stops too early.  You can eval this in *scratch*
--8<---------------cut here---------------start------------->8---
(defconst bibtex-braced-string-syntax-table
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\{ "(}" st)
    (modify-syntax-entry ?\} "){" st)
    (modify-syntax-entry ?\[ "." st)
    (modify-syntax-entry ?\] "." st)
    (modify-syntax-entry ?\( "." st)
    (modify-syntax-entry ?\) "." st)
    ;; "." changed to "\\"
    (modify-syntax-entry ?\\ "\\" st)
    (modify-syntax-entry ?\" "." st)
    st)
  "Syntax-table to parse matched braces.")
--8<---------------cut here---------------end--------------->8---

and try your test case again -- it should give the expected result.  I
can't tell why the backslash doesn't get the escape-char syntax.  It
would make sense IMO, but that's something Roland W. (CC'ed) has to
decide.
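
To see the effect in isolation, here is a minimal sketch that does not
depend on bibtex.el.  The helper name `my-braced-string-end' is
hypothetical; it builds a small syntax table, gives the backslash
whatever syntax class is passed in, and reports where (forward-sexp 1)
ends up.

```elisp
;; Hypothetical helper, not part of bibtex.el.
(defun my-braced-string-end (text backslash-syntax)
  "Return point after the first brace group in TEXT.
BACKSLASH-SYNTAX is the syntax descriptor assigned to the backslash."
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\{ "(}" st)
    (modify-syntax-entry ?\} "){" st)
    (modify-syntax-entry ?\\ backslash-syntax st)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (with-syntax-table st
        (forward-sexp 1))
      (point))))

;; Backslash as punctuation: the scan stops right after the } in \}.
(my-braced-string-end "{Monte \\} Carlo}" ".")   ; => 10
;; Backslash as escape char: the scan reaches the real closing brace.
(my-braced-string-end "{Monte \\} Carlo}" "\\")  ; => 17
```

The test string is 16 characters long, so 17 is the end of the buffer,
while 10 is the position just after the escaped brace.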

Best, Arash






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-23 12:07 ` Arash Esbati
@ 2024-02-23 15:25   ` Roland Winkler
  2024-02-24 12:19     ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: Roland Winkler @ 2024-02-23 15:25 UTC (permalink / raw)
  To: Arash Esbati; +Cc: Ihor Radchenko, 69266

Ihor Radchenko <yantar92@posteo.net> writes:
> According to https://www.bibtex.org/SpecialSymbols/, characters that
> conflict with Bibtex format description can be \-escaped.

I believe the above webpage is incorrect.  If I put something like \}
into a BibTeX field, BibTeX complains about unbalanced braces.
This is with BibTeX, Version 0.99d (TeX Live 2022/Debian).
The parsing algorithm used by BibTeX is very simple.  Generally,
BibTeX fields should contain valid LaTeX code.  So something
like
     title = "$\}$tex",

should work with BibTeX, but it gives the same error message
"unbalanced braces".

Emacs bibtex mode follows the capabilities of BibTeX itself.  I believe
it would not make sense to try to be smarter than that if, in the end,
this is not compatible anymore with BibTeX itself.
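
As a rough model of the behaviour described above, the following sketch
counts braces without giving the backslash any special meaning.  This is
only an assumption about how a "very simple" scanner might behave, not
BibTeX's actual code, and the name `my-brace-balance' is hypothetical.

```elisp
;; Hypothetical model: every { and } counts, backslashes are ignored.
(defun my-brace-balance (text)
  "Return the net brace depth of TEXT.
A backslash in front of a brace is deliberately not treated specially."
  (let ((depth 0))
    (dolist (ch (string-to-list text) depth)
      (cond ((eq ch ?\{) (setq depth (1+ depth)))
            ((eq ch ?\}) (setq depth (1- depth)))))))

(my-brace-balance "$\\}$tex")          ; => -1, i.e. unbalanced
(my-brace-balance "{Markov} Chain")    ; => 0, i.e. balanced
```

Under this model the field value $\}$tex is unbalanced, which matches
the error message quoted above.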

I guess one could submit a bug report / feature request for BibTeX itself.
But BibTeX has been around for many decades with this limited feature set.

Nowadays, there is also biblatex.  It's intended as a successor for
BibTeX.  But it is something I know little to nothing about because in
my world (physics), everyone I know still uses BibTeX.  Essentially,
biblatex entries still use the same format as BibTeX entries.  I do not
know whether biblatex would deal with something like the above in a
smarter way.  

These BibTeX fields should contain valid LaTeX code.  But from (La)TeX's
perspective, in the string "$\}$" the backslash is not an escape
character for what follows, but \} is a macro that's defined inside TeX
math mode.  Also, the meaning of the statement "valid LaTeX code" can be
heavily redefined within (La)TeX.  Take a look at texinfo files that TeX
can digest.  So a smart BibTeX parser would require (La)TeX itself working
in the background.

In practical terms (for many years) I have never experienced this to be
a constraint when working with BibTeX.  So changing / improving this may
have a low priority among BibTeX / biblatex maintainers.






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-23 15:25   ` Roland Winkler
@ 2024-02-24 12:19     ` Ihor Radchenko
  2024-02-24 16:05       ` Roland Winkler
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-02-24 12:19 UTC (permalink / raw)
  To: Roland Winkler; +Cc: Arash Esbati, 69266

Roland Winkler <winkler@gnu.org> writes:

> Ihor Radchenko <yantar92@posteo.net> writes:
>> According to https://www.bibtex.org/SpecialSymbols/, characters that
>> conflict with Bibtex format description can be \-escaped.
>
> I believe the above webpage is incorrect.  If I put something like \}
> into a BibTeX field, BibTeX complains about unbalanced braces.
> This is with BibTeX, Version 0.99d (TeX Live 2022/Debian).
> The parsing algorithm used by BibTeX is very simple.  Generally,
> BibTeX fields should contain valid LaTeX code.  So something
> like
>      title = "$\}$tex",
>
> should work with BibTeX, but it gives the same error message
> "unbalanced braces".

I am wondering if there exists a full Bibtex format description
somewhere. I can see some hints scattered over documentation in
https://ctan.org/pkg/bibtex, but nothing is complete.







* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-24 12:19     ` Ihor Radchenko
@ 2024-02-24 16:05       ` Roland Winkler
  2024-02-25 17:50         ` Arash Esbati
  0 siblings, 1 reply; 10+ messages in thread
From: Roland Winkler @ 2024-02-24 16:05 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Arash Esbati, 69266

On Sat, Feb 24 2024, Ihor Radchenko wrote:
> I am wondering if there exists a full Bibtex format description
> somewhere. I can see some hints scattered over documentation in
> https://ctan.org/pkg/bibtex, but nothing is complete.

The first reference I am aware of is "BibTeXing" by the author of
BibTeX, Oren Patashnik.  My version dated 1988-02-08 refers to BibTeX
version 0.99b.  The current version of BibTeX is something like 0.99d,
which gives you some idea of how BibTeX has evolved during the past 36
years.

I am not aware of anything significantly more substantial beyond this
document, which is, I guess, some indicator what kind of questions
people worry about when they are using BibTeX.

None of the documents I am aware of discusses in more detail the question
of escaping that you raised in your bug report for emacs bibtex-mode.
I guess part of the reason for this is that the notion of escaping is
orthogonal to how (La)TeX works and BibTeX follows (La)TeX in that
respect.  There is no escaping as it exists in C or bash.  In standard
(La)TeX, "\}" is defined only in math mode: (La)TeX can handle "$\}$",
but outside math mode, "\}" throws an error.  If you don't like this,
you can change it.

(La)TeX is both for typesetting itself, and it is also a powerful
programming language for how typesetting is supposed to happen.  But
there is no formal distinction between these very different aspects.  In
a (La)TeX document, all rules for how typesetting is supposed to happen
can be modified on the fly.

This gets off-topic.  Emacs bibtex-mode follows the philosophy
underlying BibTeX itself and (La)TeX.






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-24 16:05       ` Roland Winkler
@ 2024-02-25 17:50         ` Arash Esbati
  2024-02-26  0:50           ` Roland Winkler
  0 siblings, 1 reply; 10+ messages in thread
From: Arash Esbati @ 2024-02-25 17:50 UTC (permalink / raw)
  To: Roland Winkler; +Cc: Ihor Radchenko, 69266

Roland Winkler <winkler@gnu.org> writes:

> In standard (La)TeX, "\}" is defined only in math mode: (La)TeX can
> handle "$\}$", but outside math mode, "\}" throws an error.

The following code doesn't throw an error for me with LaTeX2e
<2023-11-01> patch level 1:

--8<---------------cut here---------------start------------->8---
\documentclass{article}

\begin{document}

\noindent \{x\}\\
\textbraceleft x\textbraceright

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
--8<---------------cut here---------------end--------------->8---

> This gets off-topic.  Emacs bibtex-mode follows the philosophy
> underlying BibTeX itself and (La)TeX.

Would you reconsider this if Biber handles \} correctly?  I didn't test
it, though.

Best, Arash






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-25 17:50         ` Arash Esbati
@ 2024-02-26  0:50           ` Roland Winkler
  2024-02-26 14:42             ` Arash Esbati
  2024-02-26 17:08             ` Ihor Radchenko
  0 siblings, 2 replies; 10+ messages in thread
From: Roland Winkler @ 2024-02-26  0:50 UTC (permalink / raw)
  To: Arash Esbati; +Cc: Ihor Radchenko, 69266

On Sun, Feb 25 2024, Arash Esbati wrote:
> The following code doesn't throw an error for me with LaTeX2e
> <2023-11-01> patch level 1:
>
> \documentclass{article}
>
> \begin{document}
>
> \noindent \{x\}\\
> \textbraceleft x\textbraceright
>
> \end{document}

Indeed, "\{" must have become available as a command outside math mode
after I learned LaTeX many years ago.  (I believe it is still correct to
say that from LaTeX's perspective "\{" is a macro that is very different
from escaping available in other programming languages.)

> Would you reconsider this if Biber handles \} correctly?  I didn't
> test it, though.

I am happy to take advice from more advanced biber / biblatex users.
For the fun of it, I just created my first LaTeX document using biber /
biblatex.  That worked fine with balanced braces.  But it choked with a
biber error message when a biblatex field contained a single "\{".

Biber uses btparse to parse BibTeX files.  The BibTeX data language, as
recognized by btparse, is explained here:

https://metacpan.org/dist/Text-BibTeX/view/btparse/doc/bt_language.pod

My reading of this document is that btparse does not take steps to deal
with "escaped braces" inside a field.






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-26  0:50           ` Roland Winkler
@ 2024-02-26 14:42             ` Arash Esbati
  2024-02-26 17:08             ` Ihor Radchenko
  1 sibling, 0 replies; 10+ messages in thread
From: Arash Esbati @ 2024-02-26 14:42 UTC (permalink / raw)
  To: Roland Winkler; +Cc: Ihor Radchenko, 69266

Roland Winkler <winkler@gnu.org> writes:

> My reading of this document is that btparse does not take steps to deal
> with "escaped braces" inside a field.

Thanks for looking at this.  Then I suggest closing this report as
"notabug".  It seems that people who need a literal brace inside a field
have to use \lbrace or \textbraceleft and their right
counterparts.
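
For existing entries, such a rewrite could be sketched as below.  The
function name `my-escaped-braces-to-macros' is hypothetical, and the
choice of \textbraceleft{} and \textbraceright{} (with a trailing {} so
the macro name is terminated) is an assumption, not something bibtex.el
provides.

```elisp
;; Hypothetical helper, not part of bibtex.el.
;; Caveat: it does not distinguish a LaTeX line break (two backslashes)
;; directly followed by a brace from a genuinely escaped brace.
(defun my-escaped-braces-to-macros (text)
  "Return TEXT with backslash-escaped braces replaced by text macros."
  (let ((out (replace-regexp-in-string
              (regexp-quote "\\{") "\\textbraceleft{}" text t t)))
    (replace-regexp-in-string
     (regexp-quote "\\}") "\\textbraceright{}" out t t)))

(my-escaped-braces-to-macros "Monte \\} Carlo")
;; => "Monte \\textbraceright{} Carlo"
```

The result contains only balanced braces, so both BibTeX and
`bibtex-parse-entry' should be able to read the field.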

Best, Arash






* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-26  0:50           ` Roland Winkler
  2024-02-26 14:42             ` Arash Esbati
@ 2024-02-26 17:08             ` Ihor Radchenko
  2024-02-26 18:56               ` Roland Winkler
  1 sibling, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-02-26 17:08 UTC (permalink / raw)
  To: Roland Winkler; +Cc: Arash Esbati, 69266

Roland Winkler <winkler@gnu.org> writes:

>> Would you reconsider this if Biber handles \} correctly?  I didn't
>> test it, though.
>
> I am happy to take advice from more advanced biber / biblatex users.
> For the fun of it, I just created my first LaTeX document using biber /
> biblatex.  That worked fine with balanced braces.  But it choked with a
> biber error message when a biblatex field contained a single "\{".
>
> Biber uses btparse to parse BibTeX files. The BibTeX data language, as
> recognized by btparse is explained here
>
> https://metacpan.org/dist/Text-BibTeX/view/btparse/doc/bt_language.pod
>
> My reading of this document is that btparse does not take steps to deal
> with "escaped braces" inside a field.

Well, btparse does not really try to implement bibtex syntax accurately.
The docs just say that there is no formal description other than the
code and give up, pointing out that some more accurate parsers do exist,
but btparse is not one of them.

That said, since bibtex itself chokes on \}, this bug report is notabug.

Also, talking about biblatex, I do note that
https://ctan.org/pkg/biblatex has a more detailed description of the
syntax with notable differences from bibtex.  For example, biblatex
allows a "crossref" field that defines entry inheritance; field aliases;
and a special key-value format like
AUTHOR = {given=Hans, family=Harman and given=Simon, prefix=de, family=Beumont}

It would be nice to have those supported. I may open another bug report
if you think that this idea is worth tracking.
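
To make the key-value name format above a bit more concrete, here is a
minimal sketch of how such a field value could be split into one alist
per name.  The function name `my-parse-extended-names' is hypothetical,
and the sketch assumes names are separated by the word "and" and parts
by commas; it does not handle braces or quoting inside values.

```elisp
;; Hypothetical helper, not part of bibtex.el.
(defun my-parse-extended-names (value)
  "Split VALUE into a list with one alist of (KEY . VAL) pairs per name."
  (mapcar
   (lambda (name)
     (mapcar
      (lambda (part)
        ;; Split "key=val" and trim surrounding whitespace.
        (let ((kv (split-string part "=" nil "[ \t\n]+")))
          (cons (car kv) (cadr kv))))
      (split-string name "," t "[ \t\n]+")))
   (split-string value "[ \t\n]+and[ \t\n]+" t)))

(my-parse-extended-names
 "given=Hans, family=Harman and given=Simon, prefix=de, family=Beumont")
;; => ((("given" . "Hans") ("family" . "Harman"))
;;     (("given" . "Simon") ("prefix" . "de") ("family" . "Beumont")))
```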







* bug#69266: 30.0.50; bibtex-parse-entry misreads escaped \}
  2024-02-26 17:08             ` Ihor Radchenko
@ 2024-02-26 18:56               ` Roland Winkler
  0 siblings, 0 replies; 10+ messages in thread
From: Roland Winkler @ 2024-02-26 18:56 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Arash Esbati, 69266

On Mon, Feb 26 2024, Ihor Radchenko wrote:
> Also, talking about biblatex, I do note that
> https://ctan.org/pkg/biblatex has a more detailed description of the
> syntax with notable differences from bibtex. For example, biblatex
> allows a "crossref" field that defines entry inheritance;

"crossref" fields are already part of BibTeX.  Do you say biblatex
"crossref" fields go beyond what BibTeX "crossref" fields offer?
Please be specific in a new bug report.

> field aliases;

Agreed, field aliases including customizable field aliases could be
useful.  Aliases for entries would probably be yet more useful.
If I remember correctly, biblatex defines a bunch of entry types that
are really just aliases for other entries.

> and a special key-value format like
> AUTHOR = {given=Hans, family=Harman and given=Simon, prefix=de,
> family=Beumont}

If you submit a separate bug report, please include a description of what
meaningful support could look like.  As far as I know, emacs does not
provide a good general scheme for filling a set of fields like "given",
"prefix", and "family".  I am not sure the above justifies coming up
with a sophisticated scheme for that.





