From: David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Arash Esbati <arash@gnu.org>
Cc: 53749@debbugs.gnu.org, Ikumi Keita <ikumi@ikumi.que.jp>,
	Dmitry Gutov <dgutov@yandex.ru>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	Tassilo Horn <tsdh@gnu.org>, Eli Zaretskii <eliz@gnu.org>,
	stefankangas@gmail.com
Subject: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 29 Apr 2024 15:15:41 +0100
Message-ID: <CADF+RtiOECRmdoFD1qP=gRzS+P3tJqh_WNTnuW-1zPrY9g4VBw@mail.gmail.com>
In-Reply-To: <m24jbtim2p.fsf@macmutant.fritz.box>

[-- Attachment #1: Type: text/plain, Size: 2615 bytes --]

Hi Dmitry and Arash,

Here's my third attempt at a working xref backend for TeX. I'll try
to summarize briefly what's in it:

1. I've modified etags so that it creates findable tags for as many
different sorts of TeX construct as possible, including those written
in the new expl3 syntax. I've now removed the escape character from
the tag names, as this simplifies code all around.

2. Four of the six xref backend methods simply call the etags backend.

3. xref-backend-identifier-at-point is modified to provide new regexps
for delineating TeX symbols, and there's also code to cope with expl3
constructs slightly differently in M-? than in the other two main xref
commands.

4. xref-backend-references is a wrapper for the standard backend. The
wrapper does two things: first, it tries to accumulate as many file
extensions for the current major mode as Emacs knows about; second,
it creates a bespoke syntax-propertize-function for strings that
aren't entirely composed of symbol or word characters. It applies this
function to file-visiting buffers and lets xref apply it in the
*xref-temp* buffer, though I had to add a one-liner in xref.el to fix
what I believe is a minor bug there that prevents syntax-propertize
from doing its work when the temp buffer holds text from a new file.
(I can provide a recipe for this if you want.)

5. Somewhat unrelatedly, I've added new syntax-propertize-rules to
latex-mode so that expl3 constructs containing an underscore aren't
fontified as subscripts, since that fontification makes such code
unreadable. I'm happy to split this off as a separate patch.
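
To make point 1 more concrete, here is a rough standalone sketch of
how a tag name can be derived from the text that follows a tag token.
It is not the code in the attached patch: the function name
sketch_tex_tag_name is hypothetical, the escape and grouping
characters are hard-wired to `\' and `{}', and optional arguments and
the multiple-tags-per-line handling are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of etags.c.  CP points just past a
   tag token such as "cs_new" or "newcommand".  Copy the tag name into
   OUT (capacity OUTSZ) and return its length.  */
static size_t
sketch_tex_tag_name (const char *cp, char *out, size_t outsz)
{
  int is_expl3 = 0;
  size_t n = 0;

  assert (cp != NULL && out != NULL && outsz > 0);

  while (isspace ((unsigned char) *cp))
    cp++;

  /* Skip an expl3 argument specifier such as ":Npn", or the asterisk
     of a starred variant.  */
  if (*cp == ':')
    {
      while (*cp != '\0' && !isspace ((unsigned char) *cp) && *cp != '{')
        cp++;
      is_expl3 = 1;
    }
  else if (*cp == '*')
    cp++;
  while (isspace ((unsigned char) *cp))
    cp++;

  /* Step over an opening brace, then over one escape character, so
     that neither becomes part of the tag name.  */
  if (*cp == '{')
    {
      cp++;
      while (isspace ((unsigned char) *cp))
        cp++;
    }
  if (*cp == '\\')
    cp++;

  /* Copy up to the next delimiter.  In expl3 code the colon that
     starts the argument specifier also ends the name.  */
  while (*cp != '\0'
         && !isspace ((unsigned char) *cp)
         && strchr ("#{}[]()=%,|$\\", *cp) == NULL
         && !(is_expl3 && *cp == ':')
         && n + 1 < outsz)
    out[n++] = *cp++;
  out[n] = '\0';
  return n;
}
```

Given the text after `cs_new' in `\cs_new:Npn \__hook_tl_gput:Nn',
this sketch produces `__hook_tl_gput', which is the form of name that
M-. then looks up in the tags table.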

All comments gratefully received, and thanks,

David.

On Mon, 22 Apr 2024 at 14:06, Arash Esbati <arash@gnu.org> wrote:
>
> David Fussner <dfussner@googlemail.com> writes:
>
> > Thanks for the clarifications. If you look at the current patch to
> > tex-mode.el, there's one function call added to TeX-mode-hook, mainly
> > for my own testing purposes, but no matter what the final patch looks
> > like it should only similarly require a single function call in an
> > AUCTeX hook to activate the new xref code there, along with one in
> > tex-common-initialization for the in-tree modes. If and when all
> > parties are satisfied by the patch I'll certainly be in touch with you
> > to find out how you'd prefer to handle activating it (or not) in
> > AUCTeX. The current state of affairs is a convenience for me and for
> > anyone else who cares to test the code.
>
> Hi David,
>
> I just wanted to come back to this report and ask if there is any
> progress?  It would be nice to get Xref working within TeX buffers.
>
> TIA.  Best, Arash

[-- Attachment #2: 0001-Provide-a-modified-xref-backend-for-TeX-buffers.patch --]
[-- Type: text/x-patch, Size: 30939 bytes --]

From 64a4f7c7b89b4475a3841b54288c25bcc4ebde3d Mon Sep 17 00:00:00 2001
From: David Fussner <dfussner@googlemail.com>
Date: Mon, 29 Apr 2024 15:05:03 +0100
Subject: [PATCH] Provide a modified xref backend for TeX buffers

* lib-src/etags.c (TeX_commands): Improve parsing of commands in TeX
buffers.
(TEX_defenv): Expand list of commands to tag by default in TeX
buffers.
(TeX_help):
* doc/emacs/maintaining.texi (Tag Syntax): Document new tagged
commands.
(Identifier Search): Add note about semantic-symref-filepattern-alist,
auto-mode-alist, and xref-find-references.

* lisp/progmodes/xref.el (xref--collect-matches): Ensure
syntax-propertize actually runs in the *xref-temp* buffer for each
new file searched.
* lisp/textmodes/tex-mode.el (tex-font-lock-suscript): Disable
subscript face in expl3 constructs.
(latex-syntax-propertize-rules): Add two new rules to give symbol
syntax to the standard components of expl3 constructs.
(tex-common-initialization): Set up xref backend for in-tree TeX
modes.
(tex--thing-at-point, tex-thingatpt--beginning-of-symbol)
(tex-thingatpt--end-of-symbol, tex--bounds-of-symbol-at-point):
New functions to return 'thing-at-point' for xref backend.
(tex-esc-and-group-chars): New var to do the same.
(xref-backend-identifier-at-point): New TeX backend method to provide
symbols for processing by xref.
(xref-backend-identifier-completion-table)
(xref-backend-identifier-completion-ignore-case)
(xref-backend-definitions, xref-backend-apropos): Placeholders to
call the standard 'etags' xref backend methods.
(xref-backend-references): Wrapper to call the default xref backend
method, finding as many relevant files as possible and using a bespoke
syntax-propertize-function.
(tex--collect-file-extensions, tex-xref-syntax-function): Helper
function and macro for previous.
(tex-find-references-syntax-table, tex--buffers-list)
(tex--last-ref-syntax-flag, tex--old-syntax-function): New vars for
same.
---
 doc/emacs/maintaining.texi |  34 +++-
 lib-src/etags.c            | 183 ++++++++++++++++++--
 lisp/progmodes/xref.el     |   1 +
 lisp/textmodes/tex-mode.el | 336 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 537 insertions(+), 17 deletions(-)

diff --git a/doc/emacs/maintaining.texi b/doc/emacs/maintaining.texi
index 579098c81b1..2fbb964a7a0 100644
--- a/doc/emacs/maintaining.texi
+++ b/doc/emacs/maintaining.texi
@@ -2529,6 +2529,15 @@ Identifier Search
 referenced.  The XREF mode commands are available in this buffer, see
 @ref{Xref Commands}.
 
+When invoked in a buffer whose major mode uses the @code{etags} backend,
+@kbd{M-?} searches files and buffers whose major mode matches that of
+the original buffer.  It guesses that mode from file extensions, so if
+@kbd{M-?} seems to be skipping relevant buffers or files, try
+customizing either the variable @code{semantic-symref-filepattern-alist}
+(if your buffer's major mode already has an entry in it), or
+@code{auto-mode-alist} (if not), thereby informing @code{xref} of the
+missing extensions (@pxref{Choosing Modes}).
+
 @vindex xref-auto-jump-to-first-xref
   If the value of the variable @code{xref-auto-jump-to-first-xref} is
 @code{t}, @code{xref-find-references} automatically jumps to the first
@@ -2749,8 +2758,29 @@ Tag Syntax
 @code{\section}, @code{\subsection}, @code{\subsubsection},
 @code{\eqno}, @code{\label}, @code{\ref}, @code{\cite},
 @code{\bibitem}, @code{\part}, @code{\appendix}, @code{\entry},
-@code{\index}, @code{\def}, @code{\newcommand}, @code{\renewcommand},
-@code{\newenvironment} and @code{\renewenvironment} are tags.
+@code{\index}, @code{\def}, @code{\edef}, @code{\gdef}, @code{\xdef},
+@code{\newcommand}, @code{\renewcommand}, @code{\newenvironment},
+@code{\renewenvironment}, @code{\DeclareRobustCommand},
+@code{\newrobustcmd}, @code{\renewrobustcmd}, @code{\providecommand},
+@code{\providerobustcmd}, @code{\NewDocumentCommand},
+@code{\RenewDocumentCommand}, @code{\ProvideDocumentCommand},
+@code{\DeclareDocumentCommand}, @code{\NewExpandableDocumentCommand},
+@code{\RenewExpandableDocumentCommand},
+@code{\ProvideExpandableDocumentCommand},
+@code{\DeclareExpandableDocumentCommand},
+@code{\NewDocumentEnvironment}, @code{\RenewDocumentEnvironment},
+@code{\ProvideDocumentEnvironment},
+@code{\DeclareDocumentEnvironment}, @code{\csdef}, @code{\csedef},
+@code{\csgdef}, @code{\csxdef}, @code{\csletcs}, @code{\cslet},
+@code{\letcs}, @code{\let}, @code{\cs_new_protected_nopar},
+@code{\cs_new_protected}, @code{\cs_new_nopar}, @code{\cs_new_eq},
+@code{\cs_new}, @code{\cs_set_protected_nopar},
+@code{\cs_set_protected}, @code{\cs_set_nopar}, @code{\cs_set_eq},
+@code{\cs_set}, @code{\cs_gset_protected_nopar},
+@code{\cs_gset_protected}, @code{\cs_gset_nopar}, @code{\cs_gset_eq},
+@code{\cs_gset}, @code{\cs_generate_from_arg_count}, and
+@code{\cs_generate_variant} are tags.  So too are the arguments of any
+starred variants of these commands.
 
 Other commands can make tags as well, if you specify them in the
 environment variable @env{TEXTAGS} before invoking @command{etags}.  The
diff --git a/lib-src/etags.c b/lib-src/etags.c
index 032cfa8010b..8b79e92abf1 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -792,8 +792,24 @@ #define STDIN 0x1001		/* returned by getopt_long on --parse-stdin */
 "In LaTeX text, the argument of any of the commands '\\chapter',\n\
 '\\section', '\\subsection', '\\subsubsection', '\\eqno', '\\label',\n\
 '\\ref', '\\cite', '\\bibitem', '\\part', '\\appendix', '\\entry',\n\
-'\\index', '\\def', '\\newcommand', '\\renewcommand',\n\
-'\\newenvironment' or '\\renewenvironment' is a tag.\n\
+'\\index', '\\def', '\\edef', '\\gdef', '\\xdef', '\\newcommand',\n\
+'\\renewcommand', '\\newenvironment', '\\renewenvironment',\n\
+'\\DeclareRobustCommand', '\\newrobustcmd', '\\renewrobustcmd',\n\
+'\\providecommand', '\\providerobustcmd', '\\NewDocumentCommand',\n\
+'\\RenewDocumentCommand', '\\ProvideDocumentCommand',\n\
+'\\DeclareDocumentCommand', '\\NewExpandableDocumentCommand',\n\
+'\\RenewExpandableDocumentCommand', '\\ProvideExpandableDocumentCommand',\n\
+'\\DeclareExpandableDocumentCommand', '\\NewDocumentEnvironment',\n\
+'\\RenewDocumentEnvironment', '\\ProvideDocumentEnvironment',\n\
+'\\DeclareDocumentEnvironment', '\\csdef', '\\csedef', '\\csgdef',\n\
+'\\csxdef', '\\csletcs', '\\cslet', '\\letcs', '\\let',\n\
+'\\cs_new_protected_nopar', '\\cs_new_protected', '\\cs_new_nopar',\n\
+'\\cs_new_eq', '\\cs_new', '\\cs_set_protected_nopar',\n\
+'\\cs_set_protected', '\\cs_set_nopar', '\\cs_set_eq', '\\cs_set',\n\
+'\\cs_gset_protected_nopar', '\\cs_gset_protected', '\\cs_gset_nopar',\n\
+'\\cs_gset_eq', '\\cs_gset', '\\cs_generate_from_arg_count', or\n\
+'\\cs_generate_variant' is a tag.  So is the argument of any starred\n\
+variant of these commands.\n\
 \n\
 Other commands can be specified by setting the environment variable\n\
 'TEXTAGS' to a colon-separated list like, for example,\n\
@@ -5736,11 +5752,25 @@ Scheme_functions (FILE *inf)
 static linebuffer *TEX_toktab = NULL; /* Table with tag tokens */
 
 /* Default set of control sequences to put into TEX_toktab.
-   The value of environment var TEXTAGS is prepended to this.  */
+   The value of environment var TEXTAGS is prepended to this.
+   (2024) Add variants of '\def', some additional LaTeX (and
+   former xparse) commands, common variants from the
+   'etoolbox' package, and the main expl3 commands. */
 static const char *TEX_defenv = "\
-:chapter:section:subsection:subsubsection:eqno:label:ref:cite:bibitem\
-:part:appendix:entry:index:def\
-:newcommand:renewcommand:newenvironment:renewenvironment";
+:label:ref:chapter:section:subsection:subsubsection:eqno:cite:bibitem\
+:part:appendix:entry:index:def:edef:gdef:xdef:newcommand:renewcommand\
+:newenvironment:renewenvironment:DeclareRobustCommand:renewrobustcmd\
+:newrobustcmd:providecommand:providerobustcmd:NewDocumentCommand\
+:RenewDocumentCommand:ProvideDocumentCommand:DeclareDocumentCommand\
+:NewExpandableDocumentCommand:RenewExpandableDocumentCommand\
+:ProvideExpandableDocumentCommand:DeclareExpandableDocumentCommand\
+:NewDocumentEnvironment:RenewDocumentEnvironment\
+:ProvideDocumentEnvironment:DeclareDocumentEnvironment:csdef\
+:csedef:csgdef:csxdef:csletcs:cslet:letcs:let:cs_new_protected_nopar\
+:cs_new_protected:cs_new_nopar:cs_new_eq:cs_new:cs_set_protected_nopar\
+:cs_set_protected:cs_set_nopar:cs_set_eq:cs_set:cs_gset_protected_nopar\
+:cs_gset_protected:cs_gset_nopar:cs_gset_eq:cs_gset\
+:cs_generate_from_arg_count:cs_generate_variant";
 
 static void TEX_decode_env (const char *, const char *);
 
@@ -5799,19 +5829,137 @@ TeX_commands (FILE *inf)
 	      {
 		char *p;
 		ptrdiff_t namelen, linelen;
-		bool opgrp = false;
+		bool opgrp = false, one_esc = false, is_explthree = false;
 
 		cp = skip_spaces (cp + key->len);
+
+		/* 1. The canonical expl3 syntax looks something like this:
+		   \cs_new:Npn \__hook_tl_gput:Nn { \ERROR }.  First, if we
+		   want to tag any such commands, we include only the part
+		   before the colon (cs_new) in TEX_defenv or TEXTAGS.  Second,
+		   etags skips the argument specifier (including the colon)
+		   after the tag token, so that it doesn't become the tag name.
+		   Third, we set the boolean 'is_explthree' to true so that we
+		   can remove the argument specifier from the actual tag name
+		   (__hook_tl_gput).  This all allows us to include expl3
+		   constructs in TEX_defenv or in the environment variable
+		   TEXTAGS without requiring a change of separator, and it also
+		   allows us to find the definition of variant commands (with
+		   different argument specifiers) defined using, for example,
+		   \cs_generate_variant:Nn.  Please note that the expl3 spec
+		   requires etags to pay more attention to whitespace in the
+		   code.
+
+		   2. We also automatically remove the asterisk from starred
+		   variants of all commands, without the need to include the
+		   starred commands explicitly in TEX_defenv or TEXTAGS. */
+		if (*cp == ':')
+		  {
+		    while (!c_isspace (*cp) && *cp != TEX_opgrp)
+		      cp++;
+		    cp = skip_spaces (cp);
+		    is_explthree = true;
+		  }
+		else if (*cp == '*')
+		  cp++;
+
+		/* Skip the optional arguments to commands in the tags list so
+		   that these arguments don't end up as the name of the tag.
+		   The name will instead come from the argument in curly braces
+		   that follows the optional ones. */
+		while (*cp != '\0' && *cp != '%')
+		  {
+		    if (*cp == '[')
+		      {
+			while (*cp != ']' && *cp != '\0' && *cp != '%')
+			  cp++;
+		      }
+		    else if (*cp == '(')
+		      {
+			while (*cp != ')' && *cp != '\0' && *cp != '%')
+			  cp++;
+		      }
+		    else if (*cp == ']' || *cp == ')')
+		      cp++;
+		    else
+		      break;
+		  }
 		if (*cp == TEX_opgrp)
 		  {
 		    opgrp = true;
 		    cp++;
+		    cp = skip_spaces (cp); /* For expl3 code. */
 		  }
+
+		/* Removing the TeX escape character from tag names simplifies
+		   things for editors finding tagged commands in TeX buffers.
+		   This applies to Emacs but also to the tag-finding behavior
+		   of at least some of the editors that use ctags, though in
+		   the latter case this will remain suboptimal.  The
+		   undocumented ctags option '--no-duplicates' may help. */
+		if (*cp == TEX_esc)
+		  {
+		    cp++;
+		    one_esc = true;
+		  }
+
+		/* Testing !c_isspace && !c_ispunct is simpler, but halts
+		   processing at too many places.  The list as it stands tries
+		   both to ensure that tag names will derive from macro names
+		   rather than from optional parameters to those macros, and
+		   also to return findable names while still allowing for
+		   unorthodox constructs. */
 		for (p = cp;
-		     (!c_isspace (*p) && *p != '#' &&
-		      *p != TEX_opgrp && *p != TEX_clgrp);
+		     (!c_isspace (*p) && *p != '#' && *p != '=' &&
+		      *p != '[' && *p != '(' && *p != TEX_opgrp &&
+		      *p != TEX_clgrp && *p != '"' && *p != '\'' &&
+		      *p != '%' && *p != ',' && *p != '|' && *p != '$');
 		     p++)
-		  continue;
+		  /* In expl3 code we remove the argument specification from
+		     the tag name.  More generally we allow only one (deleted)
+		     escape char in a tag name, which (primarily) enables
+		     tagging a TeX command's different, possibly temporary,
+		     '\let' bindings. */
+		  if (is_explthree && *p == ':')
+		    break;
+		  else if (*p == TEX_esc)
+		    { /* Second part of test is for, e.g., \cslet. */
+		      if (!one_esc && !opgrp)
+			{
+			  one_esc = true;
+			  continue;
+			}
+		      else
+			break;
+		    }
+		  else
+		    continue;
+		/* For TeX files, tags without a name are basically cruft, and
+		   in some situations they can produce spurious and confusing
+		   matches.  Try to catch as many cases as possible where a
+		   command name is of the form '\(', but avoid, as far as
+		   possible, the spurious matches. */
+		if (p == cp)
+		  {
+		    switch (*p)
+		      { /* Include =? */
+		      case '(': case '[': case '"': case '\'':
+		      case '\\': case '!': case '=': case ',':
+		      case '|': case '$':
+			p++;
+			break;
+		      case '{': case '}': case '<': case '>':
+			if (!opgrp)
+			  {
+			      p++;
+			      if (*p == '\0' || *p == '%')
+				goto tex_next_line;
+			  }
+			break;
+		      default:
+			break;
+		      }
+		  }
 		namelen = p - cp;
 		linelen = lb.len;
 		if (!opgrp || *p == TEX_clgrp)
@@ -5820,9 +5968,18 @@ TeX_commands (FILE *inf)
 		      p++;
 		    linelen = p - lb.buffer + 1;
 		  }
-		make_tag (cp, namelen, true,
-			  lb.buffer, linelen, lineno, linecharno);
-		goto tex_next_line; /* We only tag a line once */
+		if (namelen)
+		  make_tag (cp, namelen, true,
+			    lb.buffer, linelen, lineno, linecharno);
+		/* Lines with more than one \def or \let are surprisingly
+		   common in TeX files, especially in the system files that
+		   form the basis of the various TeX formats.  This tags them
+		   all. */
+		/* goto tex_next_line; /\* We only tag a line once *\/ */
+		while (*cp != '\0' && *cp != '%' && *cp != TEX_esc)
+		  cp++;
+		if (*cp != TEX_esc)
+		  goto tex_next_line;
 	      }
 	}
     tex_next_line:
diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 755c3db04fd..1d2d4904b06 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -2129,6 +2129,7 @@ xref--collect-matches
           (erase-buffer))
         (insert text)
         (goto-char (point-min))
+        (setq syntax-propertize--done 0)
         (xref--collect-matches-1 regexp file line
                                  (point)
                                  (point-max)
diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
index 97c950267c6..d990a2dbfa9 100644
--- a/lisp/textmodes/tex-mode.el
+++ b/lisp/textmodes/tex-mode.el
@@ -647,7 +647,8 @@ tex-font-lock-suscript
 		  (setq pos (1- pos) odd (not odd)))
 		odd))
     (if (eq (char-after pos) ?_)
-	`(face subscript display (raise ,(car tex-font-script-display)))
+        (unless (equal (get-text-property pos 'syntax-table) '(3))
+	  `(face subscript display (raise ,(car tex-font-script-display))))
       `(face superscript display (raise ,(cadr tex-font-script-display))))))
 
 (defun tex-font-lock-match-suscript (limit)
@@ -695,7 +696,25 @@ tex-verbatim-environments
      ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
       (1 (ignore
           (tex-env-mark (match-beginning 0)
-                        (match-beginning 1) (match-end 1))))))))
+                        (match-beginning 1) (match-end 1)))))
+     ;; The next two rules change the syntax of `:' and `_' in expl3
+     ;; constructs, so that `tex-font-lock-suscript' can fontify them
+     ;; more accurately.
+     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
+              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
+              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
+              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
+      (1 (ignore
+          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
+                                                       (match-end 1)))
+                 (list (seq-positions expr ?_)))
+            (dolist (pos list)
+              (put-text-property (+ pos (match-beginning 1))
+                                 (1+ (+ pos (match-beginning 1)))
+                                 'syntax-table (string-to-syntax "_"))))))
+      (2 "_"))
+     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
+      (1 "_")))))
 
 (defun tex-env-mark (cmd start end)
   (when (= cmd (line-beginning-position))
@@ -1291,6 +1310,8 @@ tex-common-initialization
 	      (syntax-propertize-rules latex-syntax-propertize-rules))
   ;; TABs in verbatim environments don't do what you think.
   (setq-local indent-tabs-mode nil)
+  ;; Set up xref backend in TeX buffers.
+  (add-hook 'xref-backend-functions #'tex--xref-backend nil t)
   ;; Other vars that should be buffer-local.
   (make-local-variable 'tex-command)
   (make-local-variable 'tex-start-of-header)
@@ -3742,6 +3763,317 @@ tex-chktex
       (process-send-region tex-chktex--process (point-min) (point-max))
       (process-send-eof tex-chktex--process))))
 
+\f
+;;; Xref backend
+
+;; Here we lightly adapt the default etags backend for xref so that
+;; the main xref user commands (including `xref-find-definitions',
+;; `xref-find-apropos', and `xref-find-references' [on M-., C-M-., and
+;; M-?, respectively]) work in TeX buffers.  The only methods we
+;; actually modify are `xref-backend-identifier-at-point' and
+;; `xref-backend-references'.  Many of the complications here, and in
+;; `etags' itself, are due to the necessity of parsing both the old
+;; TeX syntax and the new expl3 syntax, which will continue to appear
+;; together in documents for the foreseeable future.  Synchronizing
+;; Emacs and `etags' this way aims to improve the user experience "out
+;; of the box."
+
+(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
+  "The current TeX escape and grouping characters.
+
+The `etags' program only recognizes `\\' (92) and `!' (33) as
+escape characters in TeX documents, and if it detects the latter
+it also uses `<>' as the TeX grouping construct rather than `{}'.
+The TeX `xref-backend-identifier-at-point' method uses these
+three characters to delimit the `thing-at-point' in TeX buffers,
+so this variable should contain at least these three, though you
+can optionally add other characters if the default set of TeX
+symbol delimiters is inadequate for your documents.  (The
+functions `tex-thingatpt--beginning-of-symbol' and
+`tex-thingatpt--end-of-symbol' construct the regexp.)  Setting
+the escape and grouping chars to anything other than `\\{}' or
+`!<>' will not be useful without changes to `etags', at least for
+commands that search tags tables, such as
+\\[xref-find-definitions] and \\[xref-find-apropos].")
+
+;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
+;; AUCTeX is doing the same for its modes.
+(defvar semantic-symref-filepattern-alist)
+(with-eval-after-load 'semantic/symref/grep
+  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
+                     "*.bbl" "*.drv" "*.hva")
+        semantic-symref-filepattern-alist)
+  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
+        semantic-symref-filepattern-alist)
+  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
+
+(defun tex--xref-backend () 'tex-etags)
+
+;; Setup AUCTeX modes (for testing purposes only).
+
+(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
+
+(defun tex-set-auctex-xref-backend ()
+  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
+
+;; `xref-find-references' currently may need this when called from a
+;; latex-mode buffer in order to search files or buffers with a .tex
+;; suffix (including the buffer from which it has been called).  We
+;; append it to `auto-mode-alist' so as not to interfere with the usual
+;; mode-setting apparatus.  Changes here and in AUCTeX should soon
+;; render it unnecessary.
+(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
+
+(cl-defmethod xref-backend-identifier-at-point ((_backend (eql 'tex-etags)))
+  (require 'etags)
+  (tex--thing-at-point))
+
+;; The detection of `_' and `:' is a primitive method for determining
+;; whether point is on an expl3 construct.  It may fail in some
+;; instances.
+(defun tex--thing-at-point ()
+  "Demarcate `thing-at-point' for TeX `xref' backend."
+  (let ((bounds (tex--bounds-of-symbol-at-point)))
+    (when bounds
+      (let ((texsym (buffer-substring-no-properties (car bounds) (cdr bounds))))
+        (if (and (not (string-match-p "reference" (symbol-name this-command)))
+                 (seq-contains-p texsym ?_)
+                 (seq-contains-p texsym ?:))
+            (seq-take texsym (seq-position texsym ?:))
+          texsym)))))
+
+(defun tex-thingatpt--beginning-of-symbol ()
+  (and
+   (re-search-backward (concat "[]["
+                               (mapconcat #'regexp-quote
+                                          (mapcar #'char-to-string
+                                                  tex-esc-and-group-chars))
+                               "\"*`'#=&()%,|$[:cntrl:][:blank:]]"))
+   (forward-char)))
+
+(defun tex-thingatpt--end-of-symbol ()
+  (and
+   (re-search-forward (concat "[]["
+                              (mapconcat #'regexp-quote
+                                          (mapcar #'char-to-string
+                                                  tex-esc-and-group-chars))
+                              "\"*`'#=&()%,|$[:cntrl:][:blank:]]"))
+   (backward-char)))
+
+(defun tex--bounds-of-symbol-at-point ()
+  "Simplify `bounds-of-thing-at-point' for TeX `xref' backend."
+  (let ((orig (point)))
+    (ignore-errors
+      (save-excursion
+	(tex-thingatpt--end-of-symbol)
+	(tex-thingatpt--beginning-of-symbol)
+	(let ((beg (point)))
+	  (if (<= beg orig)
+	      (let ((real-end
+		     (progn
+		       (tex-thingatpt--end-of-symbol)
+		       (point))))
+		(cond ((and (<= orig real-end) (< beg real-end))
+		       (cons beg real-end))
+                      ((and (= orig real-end) (= beg real-end))
+		       (cons beg (1+ beg)))))))))));; For 1-char TeX commands.
+
+(cl-defmethod xref-backend-identifier-completion-table ((_backend
+                                                         (eql 'tex-etags)))
+  (xref-backend-identifier-completion-table 'etags))
+
+(cl-defmethod xref-backend-identifier-completion-ignore-case ((_backend
+                                                               (eql
+                                                                'tex-etags)))
+  (xref-backend-identifier-completion-ignore-case 'etags))
+
+(cl-defmethod xref-backend-definitions ((_backend (eql 'tex-etags)) symbol)
+  (xref-backend-definitions 'etags symbol))
+
+(cl-defmethod xref-backend-apropos ((_backend (eql 'tex-etags)) pattern)
+  (xref-backend-apropos 'etags pattern))
+
+;; The `xref-backend-references' method requires more code than the
+;; others for at least two main reasons: TeX authors have typically been
+;; free in their invention of new file types with new suffixes, and they
+;; have also tended sometimes to include non-symbol characters in
+;; command names.  When combined with the default Semantic Symbol
+;; Reference API, these two characteristics of TeX code mean that a
+;; command like `xref-find-references' would often fail to find any hits
+;; for a symbol at point, including the one under point in the current
+;; buffer, or it would find only some instances and skip others.
+
+(defun tex-find-references-syntax-table ()
+  (let ((st (if (boundp 'TeX-mode-syntax-table)
+                 (make-syntax-table TeX-mode-syntax-table)
+               (make-syntax-table tex-mode-syntax-table))))
+    st))
+
+(defmacro tex-xref-syntax-function (str beg end)
+  (let* (grpb tempstr
+              (shrtstr (if end
+                           (progn
+                             (setq tempstr (seq-take str (1- (length str))))
+                             (if beg
+                                 (setq tempstr (seq-drop tempstr 1))
+                               tempstr))
+                         (seq-drop str 1)))
+              (grpa (if (and beg end)
+                        (prog1
+                            (list 1 "_")
+                          (setq grpb (list 2 "_")))
+                      (list 1 "_")))
+              (re (concat beg (regexp-quote shrtstr) end))
+              (temp-rule (if grpb
+                             (list re grpa grpb)
+                           (list re grpa))))
+    `(syntax-propertize-rules ,temp-rule)))
+
+(defun tex--collect-file-extensions ()
+  (let* ((mlist (when (rassq major-mode auto-mode-alist)
+		  (seq-filter
+		   (lambda (elt)
+		     (eq (cdr elt) major-mode))
+		   auto-mode-alist)))
+	 (lcsym (intern-soft (downcase (symbol-name major-mode))))
+	 (lclist (and lcsym
+		      (not (eq lcsym major-mode))
+		      (rassq lcsym auto-mode-alist)
+		      (seq-filter
+		       (lambda (elt)
+			 (eq (cdr elt) lcsym))
+		       auto-mode-alist)))
+	 (shortsym (when (stringp mode-name)
+		     (intern-soft (concat (string-trim-right mode-name "/.*")
+					  "-mode"))))
+	 (lcshortsym (when (stringp mode-name)
+		       (intern-soft (downcase
+				     (concat
+				      (string-trim-right mode-name "/.*")
+				      "-mode")))))
+	 (shlist (and shortsym
+		      (not (eq shortsym major-mode))
+		      (not (eq shortsym lcsym))
+		      (rassq shortsym auto-mode-alist)
+		      (seq-filter
+		       (lambda (elt)
+			 (eq (cdr elt) shortsym))
+		       auto-mode-alist)))
+	 (lcshlist (and lcshortsym
+			(not (eq lcshortsym major-mode))
+			(not (eq lcshortsym lcsym))
+			(rassq lcshortsym auto-mode-alist)
+			(seq-filter
+			 (lambda (elt)
+			   (eq (cdr elt) lcshortsym))
+			 auto-mode-alist)))
+	 (exts (when (or mlist lclist shlist lcshlist)
+		 (seq-union (seq-map #'car lclist)
+			    (seq-union (seq-map #'car mlist)
+				       (seq-union (seq-map #'car lcshlist)
+						  (seq-map #'car shlist))))))
+	 (ed-exts (when exts
+		    (seq-map
+		     (lambda (elt)
+		       (concat "*" (string-trim  elt "\\\\" "\\\\'")))
+		     exts))))
+    ed-exts))
+
+(defvar tex--buffers-list nil)
+(defvar-local tex--last-ref-syntax-flag nil)
+(defvar-local tex--old-syntax-function nil)
+
+(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
+  "Find references of IDENTIFIER in TeX buffers and files."
+  (require 'semantic/symref/grep)
+  (let (bufs texbufs
+             (mode major-mode))
+    (dolist (buf (buffer-list))
+      (if (eq (buffer-local-value 'major-mode buf) mode)
+          (push buf bufs)
+        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
+          (push buf texbufs))))
+    (unless (seq-set-equal-p tex--buffers-list bufs)
+      (let* ((amalist (tex--collect-file-extensions))
+	     (extlist (alist-get mode semantic-symref-filepattern-alist))
+	     (extlist-new (seq-uniq
+                           (seq-union amalist extlist #'string-match-p))))
+	(setq tex--buffers-list bufs)
+	(dolist (buf bufs)
+	  (when-let ((fbuf (buffer-file-name buf))
+		     (ext (file-name-extension fbuf))
+		     (finext (concat "*." ext))
+		     ((not (seq-find (lambda (elt) (string-match-p elt finext))
+				     extlist-new)))
+		     ((push finext extlist-new)))))
+	(unless (seq-set-equal-p extlist-new extlist)
+	  (setf (alist-get mode semantic-symref-filepattern-alist)
+                extlist-new))))
+    (let* (setsyntax
+           (punct (with-syntax-table (tex-find-references-syntax-table)
+                    (seq-positions identifier (list ?w ?_)
+			           (lambda (elt sycode)
+			             (not (memq (char-syntax elt) sycode))))))
+           (end (and punct
+                     (memq (1- (length identifier)) punct)
+                     (> (length identifier) 1)
+                     (concat "\\("
+                             (regexp-quote
+                              (string (elt identifier
+                                           (1- (length identifier)))))
+                             "\\)")))
+           (beg (and punct
+                     (memq 0 punct)
+                     (concat "\\("
+                             (regexp-quote (string (elt identifier 0)))
+                             "\\)")))
+           (text-mode-hook
+            (if (or end beg)
+                (progn
+                  (setq setsyntax (lambda ()
+		                    (setq-local syntax-propertize-function
+                                                (eval
+                                                 `(tex-xref-syntax-function
+                                                   ,identifier ,beg ,end)))
+                                    (setq-local TeX-style-hook-applied-p t)))
+                  (cons setsyntax text-mode-hook))
+              text-mode-hook)))
+      (unless (memq 'doctex-mode (derived-mode-all-parents mode))
+        (setq bufs (append texbufs bufs)))
+      (dolist (buf bufs)
+        (with-current-buffer buf
+          (if (or end beg)
+              (progn
+                (unless (local-variable-p 'tex--old-syntax-function)
+                  (setq tex--old-syntax-function syntax-propertize-function))
+                (setq-local syntax-propertize-function
+                            (eval
+                             `(tex-xref-syntax-function
+                               ,identifier ,beg ,end)))
+                (setq syntax-propertize--done 0)
+                (setq tex--last-ref-syntax-flag t))
+            ;; If we've computed a bespoke `syntax-propertize-function'
+            ;; then this returns the buffer to the status quo ante
+            ;; bellum on the next invocation of M-? that searches it.
+            (when tex--last-ref-syntax-flag
+              (setq-local syntax-propertize-function
+                          (eval
+                           `(tex-xref-syntax-function
+                             ,identifier nil nil)))
+              (setq syntax-propertize--done 0)))))
+      (unwind-protect
+          (xref-backend-references nil identifier)
+        (dolist (buf bufs)
+          (with-current-buffer buf
+            (when buffer-file-truename
+              (if (or end beg)
+                  (setq-local syntax-propertize-function
+                              tex--old-syntax-function)
+                (when tex--last-ref-syntax-flag
+                  (setq-local syntax-propertize-function
+                              tex--old-syntax-function)
+                  (setq tex--last-ref-syntax-flag nil))))))))))
+
 (make-obsolete-variable 'tex-mode-load-hook
                         "use `with-eval-after-load' instead." "28.1")
 (run-hooks 'tex-mode-load-hook)
-- 
2.35.8

