Giving "text quotes" syntax in font-lock-syntax-table only


From: Ikumi Keita <ikumi@ikumi.que.jp>
To: emacs-devel@gnu.org
Subject: Giving "text quotes" syntax in font-lock-syntax-table only
Date: Wed, 20 May 2020 15:53:58 +0900
Message-ID: <50114.1589957638@localhost>

Dear Emacs developers,
Tassilo, the AUCTeX maintainer, recommended that I post to emacs-devel.

I'd like to ask about the SYNTAX-ALIST entry of `font-lock-defaults'. It
allows syntax table entries to be redefined temporarily during font lock,
but is it permitted to assign "string quote" syntax to a particular
character (the dollar sign)?

Background:
In AUCTeX's TeX and LaTeX modes, $ normally has "paired delimiter ('$')"
syntax. But during font lock (by font-latex.el) we give it "string quote
('"')" syntax via the SYNTAX-ALIST in order to apply the right faces
for math (treated as strings here) through syntactic fontification.

However, we recently discovered that this leads to the tricky bug#33139:
under some special circumstances syntax recognition gets confused, and
from a certain buffer position onward math and non-math are fontified
the wrong way round.

We think that the problem comes from running `syntax-ppss'. The attached
toy example demonstrates how font-latex.el implements the syntactic
fontification of math expressions "$...$" and that `syntax-ppss'
actually interferes badly with that method. The example file hw.txt is
adapted from the report of bug#33139 mentioned above. Open it with
    emacs -Q -geometry 140x35 -l bug-repro.el hw.txt
and type C-v three times. You will see that the paragraph beginning with
"Let $B$ be the subset ..." has inverted fontification.
(If you don't, run
    xterm -geometry 80x25
and in that terminal
    emacs -Q -nw -l bug-repro.el hw.txt
and type C-v eight times. The part beginning with "N$ and $c^{\mathcal
M}..." will be shown in inverted style.)
If the line adding `syntax-ppss' to `pre-command-hook' is deleted from
bug-repro.el, the above problem doesn't appear.

We think that the reason is that `syntax-ppss' is called with different
syntax tables and that this leads to some faulty cache entry:
(a) During font lock, `font-lock-fontify-syntactically-region' calls
    `syntax-ppss' within
        (with-syntax-table (or syntax-ppss-table (syntax-table)) ... ).
    Here, (syntax-table) returns the `font-lock-syntax-table', which
    gives "string quote" syntax to "$" according to SYNTAX-ALIST.
(b) On the other hand, `pre-command-hook' calls `syntax-ppss' with
    `bug-repro-mode-syntax-table', which gives "paired delimiter" syntax
    to "$".
Thus the two `syntax-ppss' calls can return results that conflict with
each other, and both are cached deep in the guts of syntax.el.
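
To illustrate the divergence with the cache taken out of the picture,
here is a minimal sketch (not part of the attached test kit) that parses
the same text under two syntax tables by calling `parse-partial-sexp'
directly. The function name, the table names `paired' and `quoted', and
the sample text are made up for illustration.

```elisp
;; Minimal sketch: parse the same buffer text under two syntax tables.
;; `paired' and `quoted' are illustrative names, not taken from bug-repro.el.
(defun my-demo-dollar-parse-states ()
  "Return (PAIRED QUOTED): the in-string field of the parse state at end of text."
  (with-temp-buffer
    (insert "a $b$ c $d")
    (let ((paired (make-syntax-table))
          (quoted (make-syntax-table)))
      ;; "$" as paired delimiter, as in the mode's own syntax table.
      (modify-syntax-entry ?$ "$" paired)
      ;; "$" as string quote, as SYNTAX-ALIST arranges during font lock.
      (modify-syntax-entry ?$ "\"" quoted)
      (list
       (with-syntax-table paired
         (nth 3 (parse-partial-sexp (point-min) (point-max))))
       (with-syntax-table quoted
         (nth 3 (parse-partial-sexp (point-min) (point-max))))))))
```

Evaluating (my-demo-dollar-parse-states) should give nil as the first
element and a non-nil value as the second, i.e. the two tables disagree
on whether the end of the text is inside a string. If, as we suspect,
states like these end up in one shared cache, a later caller can pick up
a state computed under the other table.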

Of course, the repro is contrived, but it seems that something similar
happens to many users during normal LaTeX editing work. I suspect that
bug#40930 and the examples listed in its report are the same problem as
the one discussed here. I guess that user-enabled minor modes (or
packages) use syntax-aware functions extensively in those examples,
typically in `pre-command-hook', and produce syntax ppss cache entries
which conflict with font-latex.el.

If we give $ string-quote syntax also in the mode's syntax-table, the
bug cannot be reproduced anymore.  But that's not something we want to
do in actual AUCTeX, as it has other unwanted effects.  Obviously,
inline math is not really a string.

It is certain that the key is the syntax ppss cache, which I could
confirm with the following experiment on the above example:
----------------------------------------------------------------------
--- bug-repro.el~	2020-05-19 13:17:23.000000000 +0900
+++ bug-repro.el	2020-05-19 21:41:12.232250000 +0900
@@ -18,4 +18,17 @@
   (setq font-lock-defaults
         '(nil nil nil ((?$ . "\""))
               (font-lock-syntactic-face-function
-               . bug-repro-syntactic-face-function))))
+               . bug-repro-syntactic-face-function)
+	      (font-lock-fontify-region-function
+	       . bug-repro-fontify-region))))
+
+(defvar bug-repro-syntax-ppss-cache (list nil nil -1))
+(make-variable-buffer-local 'bug-repro-syntax-ppss-cache)
+
+(defun bug-repro-fontify-region (beg end loudly)
+  (let ((syntax-ppss-wide (pop bug-repro-syntax-ppss-cache))
+	(syntax-ppss-narrow (pop bug-repro-syntax-ppss-cache))
+	(syntax-propertize--done (pop bug-repro-syntax-ppss-cache)))
+    (font-lock-default-fontify-region beg end loudly)
+    (setq bug-repro-syntax-ppss-cache
+	  (list syntax-ppss-wide syntax-ppss-narrow syntax-propertize--done))))
----------------------------------------------------------------------
This change attempts to use a separate cache for syntax ppss during font
lock, and it really eliminates the wrong fontification for the above
example. However, it is unsatisfactory for at least two reasons:

1. When a similar approach is adopted in AUCTeX (font-latex.el), editing
   the LaTeX document becomes fragile.  font-latex.el also uses
   syntax-based fontification for the argument of the \verb macro, but
   it stops giving the right face when I insert "\verb|xyz|" with the
   above change. (Typing "$" just after it, however, immediately makes
   "|xyz|" fontified.)

   I suppose the reason for this trouble is that the new cache isn't
   flushed properly. Whereas syntax.el uses `syntax-ppss-flush-cache' to
   update its cache as editing in the buffer proceeds, it doesn't touch
   the new cache hidden in `*-syntax-ppss-cache' in the above patch.
2. It depends heavily on how syntax.el implements its caching.
   The above patch assumes that the syntax ppss cache is managed by
   three variables: `syntax-ppss-wide', `syntax-ppss-narrow' and
   `syntax-propertize--done'. But I don't know whether this assumption
   will continue to hold in the future, or even whether it is correct
   for the current syntax.el.

   If the caching mechanism of syntax.el is not uniform across Emacs
   versions, the maintenance cost of such code would be high.

   And even if syntax.el is stable enough, it is still not good to rely
   on such implementation-dependent code, especially on an internal
   variable with "--" in its name.

So I'd like to ask:
(1) Giving "string quote" syntax to $ in SYNTAX-ALIST doesn't work
    reliably. Is this an Emacs bug or an intended restriction?
(2) If it is not a bug, is it reasonable to ask for an extension of the
    font lock framework that allows syntactic fontification of the
    $...$ form? E.g.
    [A] Implement some kind of "separate cache" in syntax-ppss and
        make it available to font lock.
    [B] Extend the syntax parse state to include information about
        whether the position is inside math, and make
        `font-lock-fontify-syntactically-region' responsive to the
        in-math state in addition to the in-comment and in-string states.
        This way, it would no longer be necessary to redefine the syntax
        of $ in SYNTAX-ALIST, and we could naturally implement syntactic
        fontification of $...$ via `font-lock-syntactic-face-function'.
        This approach would benefit the standard tex-mode.el, and
        potentially other programming modes as well, if realized.
(3) Or is there some smart way to achieve syntactic fontification of
    $...$ within the current font lock scheme?

I'm not on the emacs-devel list, so please keep me in CC when replying.

Best regards,
Ikumi Keita

[-- Attachment #2: test kit --]
[-- Type: application/x-gzip, Size: 6199 bytes --]
