unofficial mirror of emacs-devel@gnu.org 
* Removing no-back-reference restriction from syntax-propertize-rules
From: Tassilo Horn @ 2020-05-16  8:39 UTC (permalink / raw)
  To: emacs-devel

Hi all,

right now, the docstring of `syntax-propertize-rules' states that
back-references aren't supported (which is true).  I don't see why that
has to be the case.  It already shifts numbered groups as needed, so why
can't it simply shift back-references, too?

The following patch does that:

--8<---------------cut here---------------start------------->8---
modified   lisp/emacs-lisp/syntax.el
@@ -139,14 +139,16 @@ syntax-propertize-multiline
 		  (point-max))))
   (cons beg end))
 
-(defun syntax-propertize--shift-groups (re n)
-  (replace-regexp-in-string
-   "\\\\(\\?\\([0-9]+\\):"
-   (lambda (s)
-     (replace-match
-      (number-to-string (+ n (string-to-number (match-string 1 s))))
-      t t s 1))
-   re t t))
+(defun syntax-propertize--shift-groups-and-backrefs (re n)
+  (let ((incr (lambda (s)
+                (replace-match
+                 (number-to-string
+                  (+ n (string-to-number (match-string 1 s))))
+                 t t s 1))))
+    (replace-regexp-in-string
+     "[^\\]\\\\\\([0-9]+\\)" incr
+     (replace-regexp-in-string "\\\\(\\?\\([0-9]+\\):" incr re t t)
+     t t)))
 
 (defmacro syntax-propertize-precompile-rules (&rest rules)
   "Return a precompiled form of RULES to pass to `syntax-propertize-rules'.
@@ -188,9 +190,7 @@ syntax-propertize-rules
 The SYNTAX expression is responsible to save the `match-data' if needed
 for subsequent HIGHLIGHTs.
 Also SYNTAX is free to move point, in which case RULES may not be applied to
-some parts of the text or may be applied several times to other parts.
-
-Note: back-references in REGEXPs do not work."
+some parts of the text or may be applied several times to other parts."
   (declare (debug (&rest &or symbolp    ;FIXME: edebug this eval step.
                          (form &rest
                                (numberp
@@ -219,7 +219,7 @@ syntax-propertize-rules
                  ;; tell when *this* match 0 has succeeded.
                  (cl-incf offset)
                  (setq re (concat "\\(" re "\\)")))
-               (setq re (syntax-propertize--shift-groups re offset))
+               (setq re (syntax-propertize--shift-groups-and-backrefs re offset))
                (let ((code '())
                      (condition
                       (cond
--8<---------------cut here---------------end--------------->8---

I've tested it with some simple rules, e.g.,

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

and the properties are applied correctly.  The code of the generated
function also looks correct, i.e., the second back-reference is rewritten
to \\4, which is the right group \\(three\\) in the combined regexp.
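
The rewriting can also be checked in isolation.  The following is only a
sketch: it is a stand-alone copy of the helper from the patch above under
a hypothetical name (`my/shift-groups-and-backrefs') so that it can be
evaluated in *scratch* without touching syntax.el.  The offsets 3 and 6
are assumptions that mirror the three groups per rule in the test rules.

```elisp
;; Hypothetical stand-alone copy of the patched helper; the name is
;; made up for this sketch and is not part of syntax.el.
(defun my/shift-groups-and-backrefs (re n)
  "Return RE with explicit group numbers and back-references shifted by N."
  (let ((incr (lambda (s)
                ;; S is the matched text; bump the number in group 1.
                (replace-match
                 (number-to-string
                  (+ n (string-to-number (match-string 1 s))))
                 t t s 1))))
    (replace-regexp-in-string
     ;; A back-reference: backslash plus digits, not preceded by a backslash.
     "[^\\]\\\\\\([0-9]+\\)" incr
     ;; An explicitly numbered group: \(?N:
     (replace-regexp-in-string "\\\\(\\?\\([0-9]+\\):" incr re t t)
     t t)))

;; Second rule, shifted past the three groups of the first rule.
(my/shift-groups-and-backrefs "\\(three\\)\\(four\\)\\(\\1\\)" 3)
;; => "\\(three\\)\\(four\\)\\(\\4\\)"

;; A rule with an explicitly numbered group, shifted past six groups.
(my/shift-groups-and-backrefs "\\(?10:five\\)\\(six\\)\\(\\10\\)" 6)
;; => "\\(?16:five\\)\\(six\\)\\(\\16\\)"
```

Note that the pattern `[^\\]' needs one character in front of the
backslash, so a back-reference at the very beginning of RE would not be
shifted by this sketch.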

Am I thinking too naively?  Is there something I'm missing?

Well, I also found a non-working case:

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(?10:five\\)\\(six\\)\\(\\10\\)" (10 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

Syntactically, this seems to do the right thing.  The numbered group
becomes \\(?16:five\\) with back-reference \\(\\16\\).  However, it will
never match.  With a buffer with contents

--8<---------------cut here---------------start------------->8---
onetwoone test bla bla threefourthree bla quux fivesixfive threefourthree.
--8<---------------cut here---------------end--------------->8---

firing up re-builder with the constructed regexp

  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)\\|\\(?16:five\\)\\(six\\)\\(\\16\\)"

will not highlight fivesixfive, and re-search-forward doesn't stop at
it.  So is it true that back-references to explicitly numbered groups
don't work at all?
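
One way to probe this outside of re-builder is sketched below.  It rests
on an assumption taken from the Emacs Lisp manual's description of
`\DIGIT': a back-reference consists of a single digit, so "\\16" would
be read as back-reference \1 followed by a literal "6" rather than as a
reference to group 16.  The variable name `my/combined-re' is made up
for this sketch.

```elisp
;; The combined regexp from above, split per rule for readability.
(defvar my/combined-re
  (concat "\\(one\\)\\(two\\)\\(\\1\\)"
          "\\|\\(three\\)\\(four\\)\\(\\4\\)"
          "\\|\\(?16:five\\)\\(six\\)\\(\\16\\)"))

;; "\\16" behaves like \1 followed by the character "6":
;; group 1 is "a", \1 matches the second "a", then "6" matches itself.
(string-match "\\(a\\)\\16" "aa6")               ; => 0

;; The third alternative therefore cannot match the sample text ...
(string-match my/combined-re "fivesixfive")      ; => nil

;; ... while the second alternative, with its single-digit \4, can.
(string-match my/combined-re "threefourthree")   ; => 0
```

If that reading is right, the limitation would be in multi-digit
back-references rather than in explicitly numbered groups as such.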

Bye,
Tassilo



