unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#64019: 29.0.91; Fix some tree-sitter :match regexps
@ 2023-06-12 14:25 Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-06-12 14:48 ` Mattias Engdegård
  2023-06-12 21:33 ` Dmitry Gutov
  0 siblings, 2 replies; 29+ messages in thread
From: Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-06-12 14:25 UTC (permalink / raw)
  To: 64019; +Cc: Yuan Fu, Theodor Thornhill, Randy Taylor, Daniel Colascione

[-- Attachment #1: Type: text/plain, Size: 549 bytes --]

Tags: patch

With help from modified versions of the xr and relint packages,
I noticed some suspicious regexps in the new tree-sitter modes:

- Using bol/eol anchors where matching is performed against the whole
  node text
- Shy groups probably mistyped/copied as :? instead of ?:
- Identifiers defined as comprising [A-Z_\\d], where I assume \d was
  meant to match digits, but instead matches '\' or 'd'
- Unnecessary numbered grouping

These all occur in new Emacs 29 features, so the patch is intended for
emacs-29.

WDYT?

Thanks,

-- 
Basil


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-some-tree-sitter-match-regexps.patch --]
[-- Type: text/x-diff, Size: 10831 bytes --]

From df7e575393c44976a4389641975f9e4aab7efb24 Mon Sep 17 00:00:00 2001
From: "Basil L. Contovounesios" <contovob@tcd.ie>
Date: Wed, 7 Jun 2023 12:26:25 +0100
Subject: [PATCH] Fix some tree-sitter :match regexps

Some of these issues were caught by modified versions of the
GNU ELPA packages xr and relint:
- https://github.com/mattiase/xr/pull/6
- https://github.com/mattiase/relint/pull/14

* lisp/progmodes/c-ts-mode.el (c-ts-mode--font-lock-settings)
(c-ts-mode--c-or-c++-regexp):
* lisp/progmodes/cmake-ts-mode.el
(cmake-ts-mode--font-lock-settings):
* lisp/progmodes/java-ts-mode.el (java-ts-mode--font-lock-settings):
* lisp/progmodes/js.el (js--plain-method-re):
(js--treesit-font-lock-settings):
* lisp/progmodes/python.el (python--treesit-settings):
* lisp/progmodes/rust-ts-mode.el (rust-ts-mode--font-lock-settings):
* lisp/progmodes/sh-script.el (sh-mode--treesit-settings):
* lisp/progmodes/typescript-ts-mode.el
(typescript-ts-mode--font-lock-settings):
* test/src/treesit-tests.el (treesit-query-api): Replace occurrences
of [\\d], which matches '\' or 'd', with the most likely intention
[0-9].  Anchor :match regexps at beginning/end of string, not line.
Fix shy groups mistyped as optional colon.
---
 lisp/progmodes/c-ts-mode.el          |  4 ++--
 lisp/progmodes/cmake-ts-mode.el      |  3 ++-
 lisp/progmodes/java-ts-mode.el       |  4 ++--
 lisp/progmodes/js.el                 |  6 +++---
 lisp/progmodes/python.el             |  2 +-
 lisp/progmodes/rust-ts-mode.el       | 19 +++++++++++--------
 lisp/progmodes/sh-script.el          |  2 +-
 lisp/progmodes/typescript-ts-mode.el |  4 ++--
 test/src/treesit-tests.el            |  4 ++--
 9 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index c6cb9520e58..4775cbd724d 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -701,7 +701,7 @@ c-ts-mode--font-lock-settings
    `(((call_expression
        (call_expression function: (identifier) @fn)
        @c-ts-mode--fontify-DEFUN)
-      (:match "^DEFUN$" @fn))
+      (:match "\\`DEFUN\\'" @fn))
 
      ((function_definition type: (_) @for-each-tail)
       @c-ts-mode--fontify-for-each-tail
@@ -1319,7 +1319,7 @@ c-ts-mode--c-or-c++-regexp
               "\\|" id "::"
               "\\|" id ws-maybe "=\\)"
               "\\|" "\\(?:inline" ws "\\)?namespace"
-              "\\(:?" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
+              "\\(?:" ws "\\(?:" id "::\\)*" id "\\)?" ws-maybe "{"
               "\\|" "class"     ws id
               "\\(?:" ws "final" "\\)?" ws-maybe "[:{;\n]"
               "\\|" "struct"     ws id "\\(?:" ws "final" ws-maybe "[:{\n]"
diff --git a/lisp/progmodes/cmake-ts-mode.el b/lisp/progmodes/cmake-ts-mode.el
index d83a956af21..9d35d8077bd 100644
--- a/lisp/progmodes/cmake-ts-mode.el
+++ b/lisp/progmodes/cmake-ts-mode.el
@@ -134,7 +134,8 @@ cmake-ts-mode--font-lock-settings
    :language 'cmake
    :feature 'number
    '(((unquoted_argument) @font-lock-number-face
-      (:match "^[[:digit:]]*\\.?[[:digit:]]*\\.?[[:digit:]]+$" @font-lock-number-face)))
+      (:match "\\`[[:digit:]]*\\.?[[:digit:]]*\\.?[[:digit:]]+\\'"
+              @font-lock-number-face)))
 
    :language 'cmake
    :feature 'string
diff --git a/lisp/progmodes/java-ts-mode.el b/lisp/progmodes/java-ts-mode.el
index 44dfd74cafd..7f2fc4188a3 100644
--- a/lisp/progmodes/java-ts-mode.el
+++ b/lisp/progmodes/java-ts-mode.el
@@ -168,7 +168,7 @@ java-ts-mode--font-lock-settings
    :override t
    :feature 'constant
    `(((identifier) @font-lock-constant-face
-      (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
+      (:match "\\`[A-Z_][0-9A-Z_]*\\'" @font-lock-constant-face))
      [(true) (false)] @font-lock-constant-face)
    :language 'java
    :override t
@@ -237,7 +237,7 @@ java-ts-mode--font-lock-settings
      (scoped_identifier (identifier) @font-lock-constant-face)
 
      ((scoped_identifier name: (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
 
      (type_identifier) @font-lock-type-face
 
diff --git a/lisp/progmodes/js.el b/lisp/progmodes/js.el
index 52ed19cc682..48fecf69537 100644
--- a/lisp/progmodes/js.el
+++ b/lisp/progmodes/js.el
@@ -106,7 +106,7 @@ js--opt-cpp-start
 
 (defconst js--plain-method-re
   (concat "^\\s-*?\\(" js--dotted-name-re "\\)\\.prototype"
-          "\\.\\(" js--name-re "\\)\\s-*?=\\s-*?\\(\\(:?async[ \t\n]+\\)function\\)\\_>")
+          "\\.\\(" js--name-re "\\)\\s-*?=\\s-*?\\(\\(?:async[ \t\n]+\\)function\\)\\_>")
   "Regexp matching an explicit JavaScript prototype \"method\" declaration.
 Group 1 is a (possibly-dotted) class name, group 2 is a method name,
 and group 3 is the `function' keyword.")
@@ -3493,7 +3493,7 @@ js--treesit-font-lock-settings
    :language 'javascript
    :feature 'constant
    '(((identifier) @font-lock-constant-face
-      (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
+      (:match "\\`[A-Z_][0-9A-Z_]*\\'" @font-lock-constant-face))
 
      [(true) (false) (null)] @font-lock-constant-face)
 
@@ -3612,7 +3612,7 @@ js--treesit-font-lock-settings
    :feature 'number
    '((number) @font-lock-number-face
      ((identifier) @font-lock-number-face
-      (:match "^\\(:?NaN\\|Infinity\\)$" @font-lock-number-face)))
+      (:match "\\`\\(?:NaN\\|Infinity\\)\\'" @font-lock-number-face)))
 
    :language 'javascript
    :feature 'operator
diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index fd196df7550..d9ca37145e1 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -1106,7 +1106,7 @@ python--treesit-settings
    :language 'python
    `([,@python--treesit-keywords] @font-lock-keyword-face
      ((identifier) @font-lock-keyword-face
-      (:match "^self$" @font-lock-keyword-face)))
+      (:match "\\`self\\'" @font-lock-keyword-face)))
 
    :feature 'definition
    :language 'python
diff --git a/lisp/progmodes/rust-ts-mode.el b/lisp/progmodes/rust-ts-mode.el
index c3cf8d0cf44..999c1d7ae96 100644
--- a/lisp/progmodes/rust-ts-mode.el
+++ b/lisp/progmodes/rust-ts-mode.el
@@ -143,7 +143,7 @@ rust-ts-mode--font-lock-settings
                               eol))
                       @font-lock-builtin-face)))
      ((identifier) @font-lock-type-face
-      (:match "^\\(:?Err\\|Ok\\|None\\|Some\\)$" @font-lock-type-face)))
+      (:match "\\`\\(?:Err\\|Ok\\|None\\|Some\\)\\'" @font-lock-type-face)))
 
    :language 'rust
    :feature 'comment
@@ -212,11 +212,11 @@ rust-ts-mode--font-lock-settings
      (scoped_use_list path: (scoped_identifier
                              name: (identifier) @font-lock-constant-face))
      ((use_as_clause alias: (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      ((use_as_clause path: (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      ((use_list (identifier) @font-lock-type-face)
-      (:match "^[A-Z]" @font-lock-type-face))
+      (:match "\\`[A-Z]" @font-lock-type-face))
      (use_wildcard [(identifier) @rust-ts-mode--fontify-scope
                     (scoped_identifier
                      name: (identifier) @rust-ts-mode--fontify-scope)])
@@ -232,9 +232,12 @@ rust-ts-mode--font-lock-settings
      (type_identifier) @font-lock-type-face
      ((scoped_identifier name: (identifier) @rust-ts-mode--fontify-tail))
      ((scoped_identifier path: (identifier) @font-lock-type-face)
-      (:match
-       "^\\(u8\\|u16\\|u32\\|u64\\|u128\\|usize\\|i8\\|i16\\|i32\\|i64\\|i128\\|isize\\|char\\|str\\)$"
-       @font-lock-type-face))
+      (:match ,(rx bos
+                   (or "u8" "u16" "u32" "u64" "u128" "usize"
+                       "i8" "i16" "i32" "i64" "i128" "isize"
+                       "char" "str")
+                   eos)
+              @font-lock-type-face))
      ((scoped_identifier path: (identifier) @rust-ts-mode--fontify-scope))
      ((scoped_type_identifier path: (identifier) @rust-ts-mode--fontify-scope))
      (type_identifier) @font-lock-type-face)
@@ -249,7 +252,7 @@ rust-ts-mode--font-lock-settings
    :feature 'constant
    `((boolean_literal) @font-lock-constant-face
      ((identifier) @font-lock-constant-face
-      (:match "^[A-Z][A-Z\\d_]*$" @font-lock-constant-face)))
+      (:match "\\`[A-Z][0-9A-Z_]*\\'" @font-lock-constant-face)))
 
    :language 'rust
    :feature 'variable
diff --git a/lisp/progmodes/sh-script.el b/lisp/progmodes/sh-script.el
index 54da1e0468e..9bc1f4dcfdc 100644
--- a/lisp/progmodes/sh-script.el
+++ b/lisp/progmodes/sh-script.el
@@ -3363,7 +3363,7 @@ sh-mode--treesit-settings
    :feature 'number
    :language 'bash
    `(((word) @font-lock-number-face
-      (:match "^[0-9]+$" @font-lock-number-face)))
+      (:match "\\`[0-9]+\\'" @font-lock-number-face)))
 
    :feature 'bracket
    :language 'bash
diff --git a/lisp/progmodes/typescript-ts-mode.el b/lisp/progmodes/typescript-ts-mode.el
index 1c19a031878..68aefd90f92 100644
--- a/lisp/progmodes/typescript-ts-mode.el
+++ b/lisp/progmodes/typescript-ts-mode.el
@@ -153,7 +153,7 @@ typescript-ts-mode--font-lock-settings
    :language language
    :feature 'constant
    `(((identifier) @font-lock-constant-face
-      (:match "^[A-Z_][A-Z_\\d]*$" @font-lock-constant-face))
+      (:match "\\`[A-Z_][0-9A-Z_]*\\'" @font-lock-constant-face))
      [(true) (false) (null)] @font-lock-constant-face)
 
    :language language
@@ -311,7 +311,7 @@ typescript-ts-mode--font-lock-settings
    :feature 'number
    `((number) @font-lock-number-face
      ((identifier) @font-lock-number-face
-      (:match "^\\(:?NaN\\|Infinity\\)$" @font-lock-number-face)))
+      (:match "\\`\\(?:NaN\\|Infinity\\)\\'" @font-lock-number-face)))
 
    :language language
    :feature 'operator
diff --git a/test/src/treesit-tests.el b/test/src/treesit-tests.el
index fef603840f9..69db37fc0b4 100644
--- a/test/src/treesit-tests.el
+++ b/test/src/treesit-tests.el
@@ -368,14 +368,14 @@ treesit-query-api
                ;; String query.
                '("(string) @string
 (pair key: (_) @keyword)
-((_) @bob (#match \"^B.b$\" @bob))
+((_) @bob (#match \"\\\\`B.b\\\\'\" @bob))
 (number) @number
 ((number) @n3 (#equal \"3\" @n3))
 ((number) @n3p (#pred treesit--ert-pred-last-sibling @n3p))"
                  ;; Sexp query.
                  ((string) @string
                   (pair key: (_) @keyword)
-                  ((_) @bob (:match "^B.b$" @bob))
+                  ((_) @bob (:match "\\`B.b\\'" @bob))
                   (number) @number
                   ((number) @n3 (:equal "3" @n3))
                   ((number) @n3p (:pred treesit--ert-pred-last-sibling
-- 
2.34.1


[-- Attachment #3: Type: text/plain, Size: 3258 bytes --]


In GNU Emacs 29.0.91 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.16.0, Xaw3d scroll bars) of 2023-06-12 built on blc
Repository revision: bdb0bc2b4e44a7d40369e10e3de825d58fe46825
Repository branch: wt/emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Ubuntu 22.04.2 LTS

Configured using:
 'configure CC=gcc-12 'CFLAGS=-Og -ggdb3' --prefix=/home/bic/.local
 --with-program-suffix=-29 --with-file-notification=yes --with-x
 --with-x-toolkit=lucid'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP X11 XAW3D XDBE XIM XINPUT2 XPM
LUCID ZLIB

Important settings:
  value of $LC_MONETARY: en_IE.UTF-8
  value of $LC_NUMERIC: en_IE.UTF-8
  value of $LC_TIME: en_IE.UTF-8
  value of $LANG: en_GB.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting cairo x-toolkit
xinput2 x multi-tty make-network-process emacs)

Memory information:
((conses 16 36709 7363)
 (symbols 48 5149 0)
 (strings 32 13887 1551)
 (string-bytes 1 379631)
 (vectors 16 9301)
 (vector-slots 8 148632 9511)
 (floats 8 23 25)
 (intervals 56 248 0)
 (buffers 984 10))

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-07-30 16:04 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-12 14:25 bug#64019: 29.0.91; Fix some tree-sitter :match regexps Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-12 14:48 ` Mattias Engdegård
2023-06-12 15:10   ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-12 15:37     ` Eli Zaretskii
2023-06-12 20:22       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-13  2:36         ` Eli Zaretskii
2023-06-13 13:51           ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-12 16:07     ` Mattias Engdegård
2023-06-12 21:39       ` Dmitry Gutov
2023-06-13  7:47         ` Mattias Engdegård
2023-06-13 17:06           ` Dmitry Gutov
2023-06-12 21:33 ` Dmitry Gutov
2023-06-13  2:37   ` Eli Zaretskii
2023-06-13 14:08   ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-13 17:06     ` Dmitry Gutov
2023-06-15 17:17     ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-17  2:00       ` Dmitry Gutov
2023-06-17  6:48         ` Andreas Schwab
2023-06-17  8:39           ` Mattias Engdegård
2023-06-17 12:21             ` Mattias Engdegård
2023-06-17 15:44             ` Dmitry Gutov
2023-06-17 16:13               ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-17 16:34                 ` Eli Zaretskii
2023-06-17 16:56                   ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-17 17:35                     ` Eli Zaretskii
2023-06-17 19:54                       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-07-30 12:46                       ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-07-30 12:59                         ` Eli Zaretskii
2023-07-30 16:04                           ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).