From: Tassilo Horn <tsdh@gnu.org>
To: emacs-devel@gnu.org
Subject: Removing no-back-reference restriction from syntax-propertize-rules
Date: Sat, 16 May 2020 10:39:54 +0200
Message-ID: <87wo5cff39.fsf@gnu.org>

Hi all,

right now, the docstring of `syntax-propertize-rules' states that
back-references aren't supported (which is true).  I don't see why that
has to be the case.  It already shifts numbered groups as needed, so why
can't it simply shift back-references, too?

The following patch does that:

--8<---------------cut here---------------start------------->8---
modified   lisp/emacs-lisp/syntax.el
@@ -139,14 +139,16 @@ syntax-propertize-multiline
 		  (point-max))))
   (cons beg end))
 
-(defun syntax-propertize--shift-groups (re n)
-  (replace-regexp-in-string
-   "\\\\(\\?\\([0-9]+\\):"
-   (lambda (s)
-     (replace-match
-      (number-to-string (+ n (string-to-number (match-string 1 s))))
-      t t s 1))
-   re t t))
+(defun syntax-propertize--shift-groups-and-backrefs (re n)
+  (let ((incr (lambda (s)
+                (replace-match
+                 (number-to-string
+                  (+ n (string-to-number (match-string 1 s))))
+                 t t s 1))))
+    (replace-regexp-in-string
+     "[^\\]\\\\\\([0-9]+\\)" incr
+     (replace-regexp-in-string "\\\\(\\?\\([0-9]+\\):" incr re t t)
+     t t)))
 
 (defmacro syntax-propertize-precompile-rules (&rest rules)
   "Return a precompiled form of RULES to pass to `syntax-propertize-rules'.
@@ -188,9 +190,7 @@ syntax-propertize-rules
 The SYNTAX expression is responsible to save the `match-data' if needed
 for subsequent HIGHLIGHTs.
 Also SYNTAX is free to move point, in which case RULES may not be applied to
-some parts of the text or may be applied several times to other parts.
-
-Note: back-references in REGEXPs do not work."
+some parts of the text or may be applied several times to other parts."
   (declare (debug (&rest &or symbolp    ;FIXME: edebug this eval step.
                          (form &rest
                                (numberp
@@ -219,7 +219,7 @@ syntax-propertize-rules
                  ;; tell when *this* match 0 has succeeded.
                  (cl-incf offset)
                  (setq re (concat "\\(" re "\\)")))
-               (setq re (syntax-propertize--shift-groups re offset))
+               (setq re (syntax-propertize--shift-groups-and-backrefs re offset))
                (let ((code '())
                      (condition
                       (cond
--8<---------------cut here---------------end--------------->8---
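
To make the effect of the helper easier to see in isolation, here is a
small sketch that can be evaluated in *scratch*.  The function body is
the one from the patch; the name `my-shift-groups-and-backrefs' is
hypothetical and only chosen so that it does not clash with anything in
syntax.el.  The offsets 3 and 6 are what I'd expect for the second and
third of three rules with three groups each.

```elisp
;; Hypothetical standalone copy of the patched helper (the name is made up).
(defun my-shift-groups-and-backrefs (re n)
  "Return RE with explicit group numbers and back-references shifted by N."
  (let ((incr (lambda (s)
                ;; S is the matched text; bump the number in group 1 by N.
                (replace-match
                 (number-to-string
                  (+ n (string-to-number (match-string 1 s))))
                 t t s 1))))
    (replace-regexp-in-string
     ;; Second pass: a backslash followed by digits, i.e. a back-reference.
     "[^\\]\\\\\\([0-9]+\\)" incr
     ;; First pass: explicitly numbered groups such as \(?10: ... \).
     (replace-regexp-in-string "\\\\(\\?\\([0-9]+\\):" incr re t t)
     t t)))

(my-shift-groups-and-backrefs "\\(three\\)\\(four\\)\\(\\1\\)" 3)
;; => "\\(three\\)\\(four\\)\\(\\4\\)"

(my-shift-groups-and-backrefs "\\(?10:five\\)\\(six\\)\\(\\10\\)" 6)
;; => "\\(?16:five\\)\\(six\\)\\(\\16\\)"
```

Note that this only shows the textual rewrite; whether the rewritten
regexp then matches as intended is a separate question.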

I've tested it with some simple rules, e.g.,

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

and the properties are applied correctly.  The code of the generated
function also looks correct, i.e., the second back-reference is
rewritten to \\4, which is the right group \\(three\\) in the combined
regexp.
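
The effect of the shift can also be checked without going through
`syntax-propertize-rules' at all.  The sketch below joins the two rules
by hand, once with the second back-reference shifted and once without;
both regexps are written out manually as an assumption of what the
combined regexp looks like, and the variable names are made up.

```elisp
;; Hand-written combinations of the two rules; not produced by the macro.
(defvar my-shifted-re
  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)"
  "Second alternative refers to its own first group, which is group 4.")

(defvar my-unshifted-re
  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\1\\)"
  "Second alternative still refers to group 1, i.e. \\(one\\).")

(string-match my-shifted-re "threefourthree")     ; => 0
(match-string 4 "threefourthree")                 ; => "three"

;; Group 1 never matched in this alternative, so the back-reference fails.
(string-match my-unshifted-re "threefourthree")   ; => nil
```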

Am I being too naive here?  Is there something I'm missing?

Well, I also found a non-working case:

--8<---------------cut here---------------start------------->8---
(defun test-syntax-propertize-with-backrefs ()
  (interactive)
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ("\\(one\\)\\(two\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(three\\)\\(four\\)\\(\\1\\)" (1 "|") (2 "_") (3 "|"))
               ("\\(?10:five\\)\\(six\\)\\(\\10\\)" (10 "|") (2 "_") (3 "|"))))
  (setq-local syntax-propertize--done -1)
  (syntax-propertize (point-max)))
--8<---------------cut here---------------end--------------->8---

Syntactically, this seems to do the right thing.  The numbered group
becomes \\(?16:five\\) with back-reference \\(\\16\\).  However, it will
never match.  With a buffer with contents

--8<---------------cut here---------------start------------->8---
onetwoone test bla bla threefourthree bla quux fivesixfive threefourthree.
--8<---------------cut here---------------end--------------->8---

firing up re-builder with the constructed regexp

  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)\\|\\(?16:five\\)\\(six\\)\\(\\16\\)"

will not highlight fivesixfive, and re-search-forward doesn't stop at
it.  So is it true that back-references to explicitly numbered groups
don't work at all?
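
The failure can be reproduced outside of re-builder with plain
`string-match'.  The second form below is only there for comparison and
rests on an assumption I have not verified in this message: as far as I
know, a back-reference in Emacs regexps is a backslash followed by a
single digit, so \\16 would be read as back-reference 1 followed by a
literal 6.  If that is right, an explicitly numbered group with a
one-digit number should still work.  The variable name is made up.

```elisp
;; The constructed regexp from above, copied verbatim.
(defvar my-constructed-re
  "\\(one\\)\\(two\\)\\(\\1\\)\\|\\(three\\)\\(four\\)\\(\\4\\)\\|\\(?16:five\\)\\(six\\)\\(\\16\\)")

;; Does not match, as observed in re-builder.
(string-match my-constructed-re "fivesixfive")                    ; => nil

;; Same rule shape, but with a single-digit explicit group number.
(string-match "\\(?7:five\\)\\(six\\)\\(\\7\\)" "fivesixfive")    ; => 0
```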

Bye,
Tassilo


