From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Newsgroups: gmane.emacs.bugs Subject: bug#58726: 29.0.50; Bug in regexp matching with shy groups Date: Sun, 23 Oct 2022 15:50:41 +0200 Message-ID: <7E80A46A-DB9F-407F-B3F1-33E1DA5689EF@acm.org> References: <87y1t72u62.fsf@web.de> Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Content-Type: multipart/mixed; boundary="Apple-Mail=_DF4B63C1-8050-4E2E-86AF-A9557E4024BB" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="12261"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 58726@debbugs.gnu.org To: Michael Heerdegen Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Oct 24 08:11:37 2022 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1omqgQ-0002zM-Dy for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 24 Oct 2022 08:11:34 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ommWs-0005WU-JF for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 23 Oct 2022 21:45:26 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ombOU-0006ec-GJ for bug-gnu-emacs@gnu.org; Sun, 23 Oct 2022 09:52:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1ombOU-0006dM-8M for bug-gnu-emacs@gnu.org; Sun, 23 Oct 2022 09:52:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1ombOT-0001fu-TU for bug-gnu-emacs@gnu.org; Sun, 23 Oct 2022 09:52:01 -0400 X-Loop: 
help-debbugs@gnu.org In-Reply-To: <87y1t72u62.fsf@web.de> Resent-From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 23 Oct 2022 13:52:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 58726 X-GNU-PR-Package: emacs Original-Received: via spool by 58726-submit@debbugs.gnu.org id=B58726.16665330716377 (code B ref 58726); Sun, 23 Oct 2022 13:52:01 +0000 Original-Received: (at 58726) by debbugs.gnu.org; 23 Oct 2022 13:51:11 +0000 Original-Received: from localhost ([127.0.0.1]:43895 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ombNf-0001en-85 for submit@debbugs.gnu.org; Sun, 23 Oct 2022 09:51:11 -0400 Original-Received: from mail1445c50.megamailservers.eu ([91.136.14.45]:43906 helo=mail265c50.megamailservers.eu) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ombNb-0001eF-1L for 58726@debbugs.gnu.org; Sun, 23 Oct 2022 09:51:09 -0400 X-Authenticated-User: mattiase@bredband.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=megamailservers.eu; s=maildub; t=1666533060; bh=FTwqQbbwqzmUluoJzVunOtDT9Wl3oJQ1cGOBLkNEryE=; h=From:Subject:Date:Cc:To:From; b=JjohXhR14IHbLPEAGR1XZboxL9tZOH4lXG6Cd7mnoljXpbPS77+0WBm/cgu9EUX7C H+r5FDtYXe8MfBKL7G2rinOxcihOTg1vSfBmE5oHf369HRjgNddQLWmjaznuQ1N0xc 21pYKEGmRwr3jyRqn6D7eAepP4HNJ9FLAKdCY37o= Feedback-ID: mattiase@acm.or Original-Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se [188.150.171.209]) (authenticated bits=0) by mail265c50.megamailservers.eu (8.14.9/8.13.1) with ESMTP id 29NDoh8B037048; Sun, 23 Oct 2022 13:50:52 +0000 X-Mailer: Apple Mail (2.3654.120.0.1.13) X-CTCH-RefID: str=0001.0A782F18.635546C4.001E, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-CTCH-VOD: Unknown X-CTCH-Spam: Unknown X-CTCH-Score: 0.000 X-CTCH-Flags: 0 X-CTCH-ScoreCust: 0.000 X-Origin-Country: SE X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:246092 Archived-At:

--Apple-Mail=_DF4B63C1-8050-4E2E-86AF-A9557E4024BB
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

Michael, thank you for finding this amusing bug!

> (string-match-p "\\`\\(?:ab\\)*\\'" "a") ==> 0

With a bit of help from the regexp-disasm package, we see that this compiles to

    0  begbuf
    1  on-failure-jump-smart to 11
    4  exact "ab"
    8  jump to 1
   11  endbuf
   12  succeed

where the on-failure-jump-smart op turns into on-failure-keep-string-jump the first time it's executed.

This gives us a clue about what is wrong: when there is a failure inside an 'exact' string match, the target pointer should be reset to the position where the match of that string ("ab" here) started before jumping to the failure location.

Reading the source, it becomes clear that this is done correctly when there is a mismatch, but not if the target string ends prematurely, because PREFETCH() has no idea that it should reset the target pointer! Easy enough to fix.

Please try the attached patch. (The patch takes care of counted repetitions for good measure although I wasn't able to provoke a failure directly.)

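
To make the failure mode described above concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not the code from src/regex-emacs.c: the function toy_match_repeated and its reset_on_short_input flag are hypothetical names invented for this illustration. The model hard-codes the shape of the pattern \`\(?:LIT\)*\' and mimics a "keep string" failure point, where leaving the loop does not restore the target pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical, not Emacs code) of matching the pattern
   \`\(?:LIT\)*\' against TEXT with a "keep string" failure point:
   when an attempt at LIT fails, control jumps to the end-of-buffer
   check and the target pointer `d' is NOT restored automatically.
   RESET_ON_SHORT_INPUT selects whether `d' is rewound when the input
   runs out in the middle of LIT (true models the patched behaviour).  */
static bool
toy_match_repeated (const char *lit, const char *text,
                    bool reset_on_short_input)
{
  assert (lit != NULL && text != NULL);

  const char *d = text;                      /* target pointer */
  const char *dend = text + strlen (text);   /* end of target */
  size_t litlen = strlen (lit);
  bool failed = false;

  if (litlen == 0)            /* avoid looping forever on an empty LIT */
    return d == dend;

  while (!failed)
    {
      const char *dfail = d;  /* where this attempt at LIT started */

      for (size_t i = 0; i < litlen; i++)
        {
          if (d == dend)
            {
              /* Input ended inside LIT: the case the report is about.  */
              if (reset_on_short_input)
                d = dfail;
              failed = true;
              break;
            }
          if (*d != lit[i])
            {
              d = dfail;      /* a plain mismatch always rewinds */
              failed = true;
              break;
            }
          d++;
        }
    }

  /* Models `endbuf': succeed only if the whole target was consumed.  */
  return d == dend;
}
```

In this model, toy_match_repeated ("ab", "a", false) returns true, mirroring the wrong result in the report: the lone "a" is consumed, the input ends, and the end-of-buffer check then sees the pointer at the end of the target. With the flag set to true the pointer is rewound first and the call returns false.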
--Apple-Mail=_DF4B63C1-8050-4E2E-86AF-A9557E4024BB
Content-Disposition: attachment; filename=0001-Fix-regexp-matching-with-atomic-strings-and-optimise.patch
Content-Type: application/octet-stream; x-unix-mode=0644; name="0001-Fix-regexp-matching-with-atomic-strings-and-optimise.patch"
Content-Transfer-Encoding: 8bit

From a1bc10533625bde326434325bc75cf1934895472 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?=
Date: Sun, 23 Oct 2022 15:40:37 +0200
Subject: [PATCH] Fix regexp matching with atomic strings and optimised
 backtracking

This bug occurs when an atomic pattern is matched at the end of
a string and the on-failure-keep-string-jump optimisation is
in effect, as in:

  (string-match "\\`\\(?:ab\\)*\\'" "a")

which succeeded but clearly should not (bug#58726).

Reported by Michael Heerdegen.

* src/regex-emacs.c (PREFETCH): Add reset parameter.
(re_match_2_internal): Use it for proper atomic pattern treatment.
* test/src/regex-emacs-tests.el (regexp-atomic-failure): New test.
---
 src/regex-emacs.c             | 14 +++++++++-----
 test/src/regex-emacs-tests.el |  5 +++++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 9b2c14c413..626560911f 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -3446,14 +3446,18 @@ #define POINTER_TO_OFFSET(ptr)			\
 
 /* Call before fetching a character with *d.  This switches over to
    string2 if necessary.
+   `reset' is executed before backtracking if there are no more characters.
    Check re_match_2_internal for a discussion of why end_match_2 might
    not be within string2 (but be equal to end_match_1 instead).  */
-#define PREFETCH()							\
+#define PREFETCH(reset)							\
   while (d == dend)							\
     {									\
       /* End of string2 => fail.  */					\
       if (dend == end_match_2)						\
-	goto fail;							\
+        {								\
+	  reset;							\
+	  goto fail;							\
+	}								\
       /* End of string1 => advance to string2.  */			\
       d = string2;							\
       dend = end_match_2;						\
@@ -4252,7 +4256,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp,
 		int pat_charlen, buf_charlen;
 		int pat_ch, buf_ch;
 
-		PREFETCH ();
+		PREFETCH (d = dfail);
 		if (multibyte)
 		  pat_ch = string_char_and_length (p, &pat_charlen);
 		else
@@ -4280,7 +4284,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp,
 		int pat_charlen;
 		int pat_ch, buf_ch;
 
-		PREFETCH ();
+		PREFETCH (d = dfail);
 		if (multibyte)
 		  {
 		    pat_ch = string_char_and_length (p, &pat_charlen);
@@ -4486,7 +4490,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp,
 		if (d2 == dend2) break;
 
 		/* If necessary, advance to next segment in data.  */
-		PREFETCH ();
+		PREFETCH (d = dfail);
 
 		/* How many characters left in this segment to match.  */
 		dcnt = dend - d;
diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el
index ff0d6be3f5..b323f592dc 100644
--- a/test/src/regex-emacs-tests.el
+++ b/test/src/regex-emacs-tests.el
@@ -867,4 +867,9 @@ regexp-eszett
     (should (equal (string-match "[[:lower:]]" "ẞ") 0))
     (should (equal (string-match "[[:upper:]]" "ẞ") 0))))
 
+(ert-deftest regexp-atomic-failure ()
+  "Bug#58726."
+  (should (equal (string-match "\\`\\(?:ab\\)*\\'" "a") nil))
+  (should (equal (string-match "\\`a\\{2\\}*\\'" "a") nil)))
+
 ;;; regex-emacs-tests.el ends here
-- 
2.32.0 (Apple Git-132)

--Apple-Mail=_DF4B63C1-8050-4E2E-86AF-A9557E4024BB--