From: Ikumi Keita <ikumi@ikumi.que.jp>
To: emacs-devel@gnu.org
Subject: Giving "text quotes" syntax in font-lock-syntax-table only
Date: Wed, 20 May 2020 15:53:58 +0900
Message-ID: <50114.1589957638@localhost>
Dear Emacs developers,
Tassilo, the AUCTeX maintainer, recommended that I post to emacs-devel.
I'd like to ask about the SYNTAX-ALIST entry of `font-lock-defaults'. It
makes it possible to redefine syntax table entries temporarily during
font lock, but is it allowed to assign "string quote" syntax to a
particular char (the dollar sign)?
Background:
In AUCTeX's TeX and LaTeX modes, $ normally has "paired delimiter ('$')"
syntax. But during font lock (by font-latex.el) we give it "string quote
('"')" syntax via the SYNTAX-ALIST in order to apply the right faces
for math (treated as strings here) through syntactic fontification.
However, we recently discovered that this leads to the tricky bug#33139:
under some special circumstances the syntax recognition gets confused,
and from a certain buffer position onward math and non-math text are
fontified the wrong way round.
We think that the problem comes from running `syntax-ppss'. The attached
toy example demonstrates how font-latex.el implements the syntactic
fontification of math expressions "$...$" and that `syntax-ppss'
actually interferes badly with that method. The example file hw.txt is
adapted from the report of bug#33139 mentioned above. Open it with
    emacs -Q -geometry 140x35 -l bug-repro.el hw.txt
and type C-v three times. You will see that the paragraph beginning with
"Let $B$ be the subset ..." has inverted fontification.
(If you don't see it, run
    xterm -geometry 80x25
and in that terminal
    emacs -Q -nw -l bug-repro.el hw.txt
and type C-v eight times. The part beginning with "N$ and $c^{\mathcal
M}..." will be shown in inverted style.)
If the line adding `syntax-ppss' to `pre-command-hook' is deleted from
bug-repro.el, the above problem doesn't appear.
We think the reason is that `syntax-ppss' is called with different
syntax tables, which leads to faulty cache entries:
(a) During font lock, `font-lock-fontify-syntactically-region' calls
    `syntax-ppss' within
      (with-syntax-table (or syntax-ppss-table (syntax-table)) ... ).
    Here, (syntax-table) returns the `font-lock-syntax-table', which
    gives "string quote" syntax to "$" according to SYNTAX-ALIST.
(b) On the other hand, `pre-command-hook' runs `syntax-ppss' with
    `bug-repro-mode-syntax-table' in effect, which gives "paired
    delimiter" syntax to "$".
Thus the two `syntax-ppss' calls can return mutually conflicting
results, both of which are cached deep in the guts of syntax.el.
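To illustrate the disagreement itself, here is a minimal sketch that is
not part of the attached test kit. The names `demo-dollar-table',
`demo-in-string-p' and `demo-text' are made up for this illustration.
It calls `parse-partial-sexp' directly, so no cache is involved, and
only shows that the same position parses differently under the two
syntax tables:

```elisp
;; Illustration only: all `demo-' names are hypothetical.
(defun demo-dollar-table (descriptor)
  "Return a fresh syntax table that gives `$' the syntax DESCRIPTOR."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?$ descriptor table)
    table))

(defun demo-in-string-p (table text pos)
  "Parse TEXT up to POS under TABLE; return non-nil if POS is in a string."
  (with-temp-buffer
    (insert text)
    (with-syntax-table table
      ;; Element 3 of the parse state is non-nil inside a string.
      (nth 3 (parse-partial-sexp (point-min) pos)))))

(defvar demo-text "Let $B$ be the subset")

;; Position 7 lies between "B" and the closing "$", i.e. inside the math.
(demo-in-string-p (demo-dollar-table "\"") demo-text 7) ; string quote: non-nil
(demo-in-string-p (demo-dollar-table "$") demo-text 7)  ; paired delimiter: nil
```

If `syntax-ppss' stores one of these two states and the other caller
later continues parsing from it, everything after that position is
classified the wrong way round.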
Of course, the repro is contrived, but it seems something similar
happens to many users during normal LaTeX editing work. I suspect that
bug#40930 and the examples listed in its report are the same issue as
the one discussed here. I guess that user-enabled minor modes (or
packages) use syntax-aware functions extensively in those examples,
typically in `pre-command-hook', and produce syntax ppss cache entries
that conflict with font-latex.el.
If we give $ string-quote syntax also in the mode's syntax-table, the
bug cannot be reproduced anymore. But that's not something we want to
do in actual AUCTeX, as it has other unwanted effects. Obviously,
inline math is not really a string.
The syntax ppss cache is certainly the key, which I could confirm with
the following experiment on the above example:
----------------------------------------------------------------------
--- bug-repro.el~ 2020-05-19 13:17:23.000000000 +0900
+++ bug-repro.el 2020-05-19 21:41:12.232250000 +0900
@@ -18,4 +18,17 @@
   (setq font-lock-defaults
         '(nil nil nil ((?$ . "\""))
           (font-lock-syntactic-face-function
-           . bug-repro-syntactic-face-function))))
+           . bug-repro-syntactic-face-function)
+          (font-lock-fontify-region-function
+           . bug-repro-fontify-region))))
+
+(defvar bug-repro-syntax-ppss-cache (list nil nil -1))
+(make-variable-buffer-local 'bug-repro-syntax-ppss-cache)
+
+(defun bug-repro-fontify-region (beg end loudly)
+  (let ((syntax-ppss-wide (pop bug-repro-syntax-ppss-cache))
+        (syntax-ppss-narrow (pop bug-repro-syntax-ppss-cache))
+        (syntax-propertize--done (pop bug-repro-syntax-ppss-cache)))
+    (font-lock-default-fontify-region beg end loudly)
+    (setq bug-repro-syntax-ppss-cache
+          (list syntax-ppss-wide syntax-ppss-narrow syntax-propertize--done))))
----------------------------------------------------------------------
This change attempts to use a separate cache for syntax ppss during font
lock, and it really eliminates the wrong fontification in the above
example. However, it is unsatisfactory for at least two reasons:
1. When a similar approach is adopted in AUCTeX (font-latex.el), editing
   the LaTeX document becomes fragile. font-latex.el also uses
   syntax-based fontification to fontify the argument of the \verb
   macro, but with the above change it stops giving the right face when
   I insert "\verb|xyz|". (Typing "$" just after it immediately makes
   "|xyz|" fontified, though.)
   I suppose the reason for this trouble is that the new cache isn't
   flushed properly. Whereas syntax.el uses `syntax-ppss-flush-cache' to
   update its cache as editing in the buffer proceeds, it doesn't touch
   the new cache hidden in `*-syntax-ppss-cache' in the above patch.
2. It depends heavily on how syntax.el implements its caching.
   The above patch assumes that the syntax ppss cache is managed by the
   three variables `syntax-ppss-wide', `syntax-ppss-narrow' and
   `syntax-propertize--done'. But I don't know whether this assumption
   will continue to hold in the future, or whether it is appropriate
   even for the current syntax.el.
   If the caching mechanism of syntax.el is not uniform across Emacs
   versions, the maintenance cost of such code would be high.
   And even if syntax.el is stable enough, it is still not good to rely
   on such implementation-dependent code, especially on an internal
   variable with "--" in its name.
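The stale-cache problem of item 1 could perhaps be papered over by
throwing the private cache away on every buffer change. The following
is only a sketch under that assumption and is untested in
font-latex.el: `demo-private-ppss-cache' stands in for
`bug-repro-syntax-ppss-cache', and the other `demo-' names are likewise
made up for illustration.

```elisp
;; Illustration only: all `demo-' names are hypothetical.
(defvar-local demo-private-ppss-cache (list nil nil -1)
  "Stand-in for the private (WIDE NARROW DONE) cache of the patch above.")

(defun demo-reset-private-cache (&rest _ignored)
  "Discard the private cache so that the next font lock run reparses."
  (setq demo-private-ppss-cache (list nil nil -1)))

(defun demo-install-cache-reset ()
  "Reset the private cache before every change in the current buffer."
  ;; The final t makes the hook entry buffer-local.
  (add-hook 'before-change-functions #'demo-reset-private-cache nil t))
```

Such a reset gives up most of the benefit of caching, since every edit
forces a reparse from the beginning of the buffer, and it does nothing
about the dependency on syntax.el internals described in item 2.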
So I'd like to ask:
(1) Giving "string quote" syntax to $ in SYNTAX-ALIST doesn't work
    reliably. Is this an Emacs bug or an intended restriction?
(2) If it is not a bug, is it reasonable to ask for the font lock
    framework to be extended to allow syntactic fontification of the
    $...$ form? E.g.
    [A] Implement some kind of "separate cache" in syntax-ppss and
        make it available to font lock.
    [B] Extend the syntax parse state to include whether the position
        is inside math, and make
        `font-lock-fontify-syntactically-region' responsive to the
        in-math state in addition to the in-comment and in-string
        states.
        This way, it would no longer be necessary to redefine the
        syntax of $ in SYNTAX-ALIST, and we could naturally implement
        syntactic fontification of $...$ via
        `font-lock-syntactic-face-function'.
        This approach would be beneficial for the standard tex-mode.el,
        and potentially for other programming modes as well, if
        realized.
(3) Or is there some smart way to achieve syntactic fontification of
    $...$ in the current font lock scheme?
I'm not subscribed to the emacs-devel list, so please keep me in CC
when replying.
Best regards,
Ikumi Keita
[-- Attachment #2: test kit --]
[-- Type: application/x-gzip, Size: 6199 bytes --]