From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Tue, 11 Feb 2020 13:57:27 +0100
Message-ID: <2F3A70C9-969B-4E86-998E-BF3CC990B769@acm.org>
References: <88571301-3F15-428F-82F9-60A23D817EF8@acm.org>
To: Paul Eggert, Phil Sainty, Eli Zaretskii
Cc: 37659@debbugs.gnu.org

On 22 Oct 2019, at 19:33, Paul Eggert wrote:

> Moreover, if greed is the longstanding tradition for regexp-opt,
> shouldn't plain "or" be greedy, to be consistent with other operators?

Having second thoughts, I've come to believe that Paul may have been
right after all. We might just as well let plain 'or' (alias '|') match
as much as possible when it is able to do so. In particular, we should
guarantee that this will happen when all arguments are strings, as used
to be the case.

Initially I thought it was a bug that (or "a" "ab") was optimised into
"ab?", on the grounds that this made the behaviour unpredictable: when
matching the string "abc", (or "a" "ab") matched "ab", whereas
(or "a" "ab" space) would match "a". However, the current 'fixed' code
isn't necessarily more useful.

Since the change was introduced in Emacs 27, which has not yet been
released, I suggest the attached patch for emacs-27.
It reverts the use of regexp-opt with KEEP-ORDER = t. What do you think?
It would solve the problem without introducing new constructs, and
without running the risk of introducing subtle errors in existing rx
expressions.

(In fact, if we do not do this in Emacs 27, we'd have to add a NEWS
entry to warn users about the change.)

A further improvement would be to ensure that nested all-string 'or'
forms have the same property, and that expansion of user-defined forms
is transparent. In other words, that

  (rx-let ((x (or "abc" "de")))
    (rx (or "a" x (or "ab" "def"))))

would be equivalent to

  (rx (or "abc" "ab" "a" "def" "de"))

I'll prepare a patch for this QoI improvement, but the attached patch
is needed regardless.

Attachment: 0001-rx-Use-longest-match-for-all-string-or-forms-bug-376.patch

From b05c5ace634986f3c32310a4f62bd619e6ac5db9 Mon Sep 17 00:00:00 2001
From: Mattias Engdegård
Date: Tue, 11 Feb 2020 13:23:10 +0100
Subject: [PATCH] rx: Use longest match for all-string 'or' forms (bug#37659)

Revert to the Emacs 26 semantics that always gave the longest match
for rx 'or' forms with only string arguments.  This guarantee was
never well documented, but it is useful and people likely have come to
rely on it.  For example, prior to this change,

 (rx (or ">" ">="))

matched ">" even if the text contained ">=".

* lisp/emacs-lisp/rx.el (rx--translate-or): Don't tell regexp-opt to
preserve the matching order.
* doc/lispref/searching.texi (Rx Constructs): Document the
longest-match guarantee for all-string 'or' forms.
---
 doc/lispref/searching.texi | 5 ++++-
 lisp/emacs-lisp/rx.el      | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 3d7ea93286..5f4509a8b4 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1080,7 +1080,10 @@ Rx Constructs
 @cindex @code{or} in rx
 @itemx @code{(| @var{rx}@dots{})}
 @cindex @code{|} in rx
-Match exactly one of the @var{rx}s, trying from left to right.
+Match exactly one of the @var{rx}s.
+If all arguments are string literals, the longest possible match
+will always be used.  Otherwise, either the longest match or the
+first (in left-to-right order) will be used.
 Without arguments, the expression will not match anything at all.@*
 Corresponding string regexp: @samp{@var{A}\|@var{B}\|@dots{}}.
 
diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index 03af053c91..b4cab5715d 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -290,7 +290,7 @@ rx--translate-or
    ((null (cdr body))              ; Single item.
     (rx--translate (car body)))
    ((rx--every #'stringp body)     ; All strings.
-    (cons (list (regexp-opt body nil t))
+    (cons (list (regexp-opt body nil))
           t))
    ((rx--every #'rx--charset-p body)  ; All charsets.
     (rx--translate-union nil body))
-- 
2.21.1 (Apple Git-122.3)
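
For readers who want to see the difference between the two regexp-opt
modes discussed in this message, the following is a minimal sketch, not
part of the patch. It assumes Emacs 27 or later (where regexp-opt
accepts the KEEP-ORDER argument), and `my/first-match' is a hypothetical
helper introduced only for illustration.

```elisp
;; Sketch only: compares order-preserving and longest-match alternation.
;; `my/first-match' is a hypothetical helper, not part of rx.el.
(require 'rx)

(defun my/first-match (regexp string)
  "Return the text REGEXP matches at the start of STRING, or nil."
  ;; Wrap REGEXP in a shy group so the \\` anchor applies to every
  ;; alternative, not just the first one.
  (when (string-match (concat "\\`\\(?:" regexp "\\)") string)
    (match-string 0 string)))

;; KEEP-ORDER = t: alternatives are tried left to right, so the
;; shorter ">" wins even though the text starts with ">=".
(my/first-match (regexp-opt '(">" ">=") nil t) ">= 1")   ; => ">"

;; KEEP-ORDER omitted: the regexp matches the longest string possible.
(my/first-match (regexp-opt '(">" ">=") nil) ">= 1")     ; => ">="

;; A non-string argument (here `space') means the form is not an
;; all-string 'or', so the alternatives are tried left to right.
(my/first-match (rx (or "a" "ab" space)) "abc")          ; => "a"
```

What (rx (or ">" ">=")) itself matches depends on whether the Emacs in
use includes the patch above, so the sketch calls regexp-opt directly
for the all-string cases.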