From: Tassilo Horn
Newsgroups: gmane.emacs.devel
Subject: Re: Removing no-back-reference restriction from syntax-propertize-rules
Date: Mon, 18 May 2020 23:30:32 +0200
Message-ID: <87r1vh2ao7.fsf@gnu.org>
References: <87wo5cff39.fsf@gnu.org> <87tv0dayv1.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
To: Stefan Monnier
Cc: emacs-devel@gnu.org
In-Reply-To: (Stefan Monnier's message of "Mon, 18 May 2020 15:30:32 -0400")

Stefan Monnier writes:

>> Can you give an example regexp where \N preceded by a non-\ is not a
>> back-reference (and is still valid)?
>
> Of course: "bar\\(foo\\)[\\1-9]".

Oh, right.

>> BTW, am I reading the docs right that there are at most nine
>> back-references, i.e., \10 cannot exist?
>> In that case, we'd have the restriction that at most 9
>> back-references may appear across all syntax rules combined.
>
> Apparently, yes:
>
>     (string-match "\\(?5:[ab]\\)-\\5" "a-a")
>     0 (#o0, #x0, ?\C-@)
>     ELISP> (string-match "\\(?15:[ab]\\)-\\15" "a-a")
>     nil
>
> [ I guess that's another reason to stay away from backreferences. ]

Ah, so my "back-refs to explicitly numbered groups don't work at all"
issue was actually caused by my using a group number greater than 9.

>> I guess in that case we should signal an error, no?
>
> Indeed.

Ok, will do.

>>       (when (save-match-data
>>               ;; With \N, the \ must be in a subregexp context and the
>>               ;; N must not be in a subregexp context.
>>               (and (subregexp-context-p new-re (match-beginning 0))
>>                    (not (subregexp-context-p new-re (match-beginning 1)))))
>
> You don't need/want to test (subregexp-context-p new-re (match-beginning 1)).

Ok.  So all in all, this should give the following patch:

--8<---------------cut here---------------start------------->8---
scratch/syntax-propertize-rules-with-backrefs ba3eee275640d453ffee9f6d9768be1ebd73d51b
Author:     Tassilo Horn
AuthorDate: Sat May 16 10:05:12 2020 +0200
Commit:     Tassilo Horn
CommitDate: Mon May 18 23:14:49 2020 +0200

Parent:     ca7224d5db Add test for recent buffer-local-variables change
Merged:     emacs-27 feature/browse-url-browser-kind master scratch/syntax-propertize-rules-with-backrefs
Contained:  scratch/syntax-propertize-rules-with-backrefs
Follows:    emacs-27.0.91 (945)

Allow back-references in syntax-propertize-rules.

* lisp/emacs-lisp/syntax.el
(syntax-propertize--shift-groups-and-backrefs): Renamed from
syntax-propertize--shift-groups, and also shift back-references.
(syntax-propertize-rules): Adapt docstring and use renamed function.

1 file changed, 25 insertions(+), 10 deletions(-)
lisp/emacs-lisp/syntax.el | 35 +++++++++++++++++++++++++----------

modified   lisp/emacs-lisp/syntax.el
@@ -139,14 +139,28 @@ syntax-propertize-multiline
                   (point-max))))
   (cons beg end))
 
-(defun syntax-propertize--shift-groups (re n)
-  (replace-regexp-in-string
-   "\\\\(\\?\\([0-9]+\\):"
-   (lambda (s)
-     (replace-match
-      (number-to-string (+ n (string-to-number (match-string 1 s))))
-      t t s 1))
-   re t t))
+(defun syntax-propertize--shift-groups-and-backrefs (re n)
+  (let ((new-re (replace-regexp-in-string
+                 "\\\\(\\?\\([0-9]+\\):"
+                 (lambda (s)
+                   (replace-match
+                    (number-to-string
+                     (+ n (string-to-number (match-string 1 s))))
+                    t t s 1))
+                 re t t))
+        (pos 0))
+    (while (string-match "\\\\\\([0-9]+\\)" new-re pos)
+      (setq pos (+ 1 (match-beginning 1)))
+      (when (save-match-data
+              ;; With \N, the \ must be in a subregexp context, i.e.,
+              ;; not in a character class or in a \{\} repetition.
+              (subregexp-context-p new-re (match-beginning 0)))
+        (let ((shifted (+ n (string-to-number (match-string 1 new-re)))))
+          (when (> shifted 9)
+            (error "There may be at most nine back-references"))
+          (setq new-re (replace-match (number-to-string shifted)
+                                      t t new-re 1)))))
+    new-re))
 
 (defmacro syntax-propertize-precompile-rules (&rest rules)
   "Return a precompiled form of RULES to pass to `syntax-propertize-rules'.
@@ -190,7 +204,8 @@ syntax-propertize-rules
 Also SYNTAX is free to move point, in which case RULES may not be applied to
 some parts of the text or may be applied several times to other parts.
 
-Note: back-references in REGEXPs do not work."
+Note: There may be at most nine back-references in the REGEXPs of
+all RULES in total."
   (declare (debug (&rest &or symbolp    ;FIXME: edebug this eval step.
                          (form &rest
                                (numberp
@@ -219,7 +234,7 @@ syntax-propertize-rules
           ;; tell when *this* match 0 has succeeded.
           (cl-incf offset)
           (setq re (concat "\\(" re "\\)")))
-        (setq re (syntax-propertize--shift-groups re offset))
+        (setq re (syntax-propertize--shift-groups-and-backrefs re offset))
         (let ((code '())
               (condition
                (cond
--8<---------------cut here---------------end--------------->8---

Seems to work fine, and it signals an error as soon as a back-reference
would need to be renumbered to \10 or higher.

Good to go?

Bye,
Tassilo
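A small sketch, evaluable in `M-x ielm` or with `emacs --batch`, may help illustrate the two points discussed in this thread: why a rule's back-reference has to be renumbered once its regexp is concatenated after other rules' groups, and how `subregexp-context-p` tells a real `\N` apart from a backslash-digit pair inside a character class. The `my-...` names are hypothetical, and the single "foo" group merely stands in for whatever groups earlier rules contribute; this is an illustration, not the code of `syntax-propertize-rules` itself.

```elisp
;; Hypothetical rule regexp: a doubled `a' or `b', using back-reference \1.
(defconst my-rule-re "\\([ab]\\)\\1")

;; Pretend an earlier rule contributed one group ("foo").  Appending the
;; rule unchanged leaves \1 pointing at that earlier group.
(defconst my-naive-re (concat "\\(foo\\)\\|" my-rule-re))

;; Implicitly numbered groups shift on their own when concatenated, but
;; the back-reference must be renumbered by hand (\1 -> \2).
(defconst my-shifted-re (concat "\\(foo\\)\\|" "\\([ab]\\)\\2"))

(string-match my-rule-re "xaa")    ; => 1
(string-match my-naive-re "xaa")   ; => nil, group 1 ("foo") never matched
(string-match my-shifted-re "xaa") ; => 1

;; Only a \N in a subregexp context is a back-reference; inside a
;; character class it is just a backslash followed by a digit.
(let ((re "bar\\(foo\\)[\\1-9]"))
  (subregexp-context-p re (string-match "\\\\[0-9]" re)))  ; => nil
(let ((re "bar\\(foo\\)\\1"))
  (subregexp-context-p re (string-match "\\\\[0-9]" re)))  ; => t
```

The first three forms show the renumbering problem; the last two locate the first backslash-digit pair and, like the patch, test the position of the backslash with `subregexp-context-p`.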