Hi Stefan,

I attach an updated patch for tex-mode and etags, in which I've
attempted to include all of your recommendations.  A few notes:

1. I changed the name as well as the doc string of the variable
holding the TeX escape and grouping chars (now
`tex-thingatpt-exclude-chars`).  I hope this makes it clearer.

2. I removed the regexp for detecting expl3 constructs, and now rely
on the mechanism I outlined in my previous work-in-progress patch:

(a) Test for expl3 class or package, set `tex-expl-file-p` to t.

(b) If not (a), add `tex-expl-region-set` to the
    `syntax-propertize-extend-region-functions` hook to list all
    regions between \ExplSyntaxOn and \ExplSyntaxOff
    (`tex-expl-region-list`).

(c) Add test in `tex-font-lock-suscript` for (a) then (b), don't
    subscript after the underscore when either is t.

3. I tried benchmarking `syntax-ppss-flush-cache` and
   `font-lock-flush` before and after the changes.  The former had a
   maximum slowdown of 0.5% (usually less) and the latter a maximum of
   0.2%, but if you want to see my methodology or suggest something to
   try please let me know.

4. I left the bespoke `syntax-propertize-function` in the
   `xref-backend-references` method uncompiled, as simple benchmarking
   suggested no perceptible gain from byte compiling it.  Using
   `syntax-ppss-flush-cache` to restore the status quo ante in each
   file-visiting buffer streamlined the code and made it do what it
   was supposed to do.

Thanks again for your advice, and please let me know what still needs
work.

Best, David.

On Fri, 3 May 2024 at 15:11, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> Hi,
>
> Apparently I'm the `tex-mode.el` guy, so I tried to take a look.
>
> > diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> > index 97c950267c6..d990a2dbfa9 100644
> > --- a/lisp/textmodes/tex-mode.el
> > +++ b/lisp/textmodes/tex-mode.el
> > @@ -695,7 +696,25 @@ tex-verbatim-environments
> >       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
> >        (1 (ignore
> >            (tex-env-mark (match-beginning 0)
> > -                        (match-beginning 1) (match-end 1))))))))
> > +                        (match-beginning 1) (match-end 1)))))
> > +     ;; The next two rules change the syntax of `:' and `_' in expl3
> > +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
> > +     ;; more accurately.
> > +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
> > +              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
> > +              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
> > +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
>
> Can you add in the comment some URL pointing to some relevant expl3
> documentation which "explains" why the above regexp makes sense?
> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.
>
> Side note: I'd avoid [:space:] whose exact meaning is rarely quite what
> we need.
> Side note: backslash doesn't need to be backslashed in [...].
>
> > +      (1 (ignore
> > +          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
> > +                                                       (match-end 1)))
> > +                 (list (seq-positions expr ?_)))
> > +            (dolist (pos list)
> > +              (put-text-property (+ pos (match-beginning 1))
> > +                                 (1+ (+ pos (match-beginning 1)))
> > +                                 'syntax-table (string-to-syntax "_"))))))
> > +      (2 "_"))
> > +     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
> > +      (1 "_")))))
>
> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.
>
> > +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
> > +  "The current TeX escape and grouping characters.
>
> I recommend you backslash escape the { and } above (although it's not
> indispensable, `emacs-lisp-mode` will parse the code better).
> More importantly, the docstring doesn't explain what this list
> means/does.  E.g. does the order matter?  Can it be longer than 3 elements?
>
> From the current docstring I can't guess what would be the consequence
> of adding/removing elements to/from this list.
>
> > +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> > +;; AUCTeX is doing the same for its modes.
> > +(defvar semantic-symref-filepattern-alist)
> > +(with-eval-after-load 'semantic/symref/grep
> > +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> > +                     "*.bbl" "*.drv" "*.hva")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>
> We know `semantic-symref-filepattern-alist` will exist when
> `semantic/symref/grep` is loaded, but not before, so I'd put the
> `defvar` inside the `with-eval-after-load`.
>
> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?
>
> > +;; `xref-find-references' currently may need this when called from a
> > +;; latex-mode buffer in order to search files or buffers with a .tex
> > +;; suffix (including the buffer from which it has been called).  We
> > +;; append it to `auto-mode-alist' so as not to interfere with the usual
> > +;; mode-setting apparatus.  Changes here and in AUCTeX should soon
> > +;; render it unnecessary.
> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.
>
> > +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
> > +  "Find references of IDENTIFIER in TeX buffers and files."
> > +  (require 'semantic/symref/grep)
> > +  (let (bufs texbufs
> > +             (mode major-mode))
> > +    (dolist (buf (buffer-list))
> > +      (if (eq (buffer-local-value 'major-mode buf) mode)
> > +          (push buf bufs)
> > +        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
> > +          (push buf texbufs))))
> > +    (unless (seq-set-equal-p tex--buffers-list bufs)
> > +      (let* ((amalist (tex--collect-file-extensions))
> > +          (extlist (alist-get mode semantic-symref-filepattern-alist))
> > +          (extlist-new (seq-uniq
> > +                           (seq-union amalist extlist #'string-match-p))))
>
> After sinking the `defvar` above, you'll need to add a new `defvar` for
> `semantic-symref-filepattern-alist` just after the `require`.
>
> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.
>
>
>         Stefan
>