From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 20:34:42 +0200
Message-ID: <6AA06366-E276-47EA-96A3-506DA8B17D41@gmail.com>
References: <4A303177-384E-4FEF-98F2-FAB89A12ACC9@gmail.com> <83pm5tpdy2.fsf@gnu.org>
To: Stefan Monnier
Cc: Eli Zaretskii, Paul Eggert, 64128@debbugs.gnu.org

On 19 June 2023 at 14:54, Stefan Monnier wrote:

> I wish there was a way to emit warnings about oddball constructs
> (starting with the "* is literal when encountered at the beginning of
> a regexp").

I agree, but I'm more of a static analysis man. (And relint does
complain about all these cases as long as the regexp is detected as
such, so there probably aren't many of them left in the Emacs tree.)

Here is a reduced patch that only fixes the really silly behaviour
reported earlier, by making sure that `laststart` is reset correctly for
all group A assertions. This should be uncontroversial.

Maybe we should change group B assertions so that they work in the same
way.

Attachment: regexp-zero-width-assertion-noquack.diff

diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index fea34df991b..f2da1a2d0db 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -1716,7 +1716,9 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 
   /* Address of start of the most recently finished expression.
      This tells, e.g., postfix * where to find the start of its
-     operand.  Reset at the beginning of groups and alternatives.  */
+     operand.  Reset at the beginning of groups and alternatives,
+     and after zero-width assertions which should not be the target
+     of any postfix repetition operators.  */
   unsigned char *laststart = 0;
 
   /* Address of beginning of regexp, or inside of last group.  */
@@ -1847,12 +1849,14 @@ regex_compile (re_char *pattern, ptrdiff_t size,
         case '^':
           if (! (p == pattern + 1 || at_begline_loc_p (pattern, p)))
             goto normal_char;
+          laststart = 0;
           BUF_PUSH (begline);
           break;
 
         case '$':
           if (! (p == pend || at_endline_loc_p (p, pend)))
             goto normal_char;
+          laststart = 0;
           BUF_PUSH (endline);
           break;
 
@@ -1892,7 +1896,7 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 
             /* Star, etc. applied to an empty pattern is equivalent
                to an empty pattern.  */
-            if (!laststart || laststart == b)
+            if (laststart == b)
               break;
 
             /* Now we know whether or not zero matches is allowed
@@ -2544,18 +2548,22 @@ regex_compile (re_char *pattern, ptrdiff_t size,
               break;
 
             case 'b':
+              laststart = 0;
               BUF_PUSH (wordbound);
               break;
 
             case 'B':
+              laststart = 0;
               BUF_PUSH (notwordbound);
               break;
 
             case '`':
+              laststart = 0;
               BUF_PUSH (begbuf);
               break;
 
             case '\'':
+              laststart = 0;
               BUF_PUSH (endbuf);
               break;
 
diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el
index 52d43775b8e..48a487ffe15 100644
--- a/test/src/regex-emacs-tests.el
+++ b/test/src/regex-emacs-tests.el
@@ -883,4 +883,14 @@ regexp-tests-backtrack-optimization
     (should (looking-at "x*\\(=\\|:\\)*"))
     (should (looking-at "x*=*?"))))
 
+(ert-deftest regexp-tests-zero-width-assertion-repetition ()
+  ;; Check compatibility behaviour with repetition operators after
+  ;; certain zero-width assertions (bug#64128).
+  (should (equal (string-match "^*a" "*a") 0))
+  (should (equal (string-match "\\`*a" "*a") 0))
+  (should (equal (string-match "q\\b*!" "q*!") 0))
+  (should (equal (string-match "q\\b*!" "!") nil))
+  (should (equal (string-match "/\\B*z" "/*z") 0))
+  (should (equal (string-match "/\\B*z" "z") nil)))
+
 ;;; regex-emacs-tests.el ends here
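
The effect of resetting `laststart` after an assertion can be illustrated with a toy model. The sketch below is not the code in src/regex-emacs.c: `toy_compile`, `struct item` and the `OP_*` names are hypothetical, and only literals, a leading ^, \b and postfix * are handled. It assumes that a * with no operand is taken literally, which is what the tests in the patch expect.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes (hypothetical; they only loosely echo the patch).  */
enum op { OP_CHAR, OP_STAR, OP_BEGLINE, OP_WORDBOUND };

struct item { enum op op; char c; };

/* Compile PATTERN into OUT (capacity CAP) and return the item count.
   Only literals, a leading ^, \b and postfix * are understood.  */
static size_t
toy_compile (const char *pattern, struct item *out, size_t cap)
{
  size_t n = 0;
  /* Start of the most recently finished expression, or NULL when
     there is nothing a postfix operator could apply to.  */
  struct item *laststart = NULL;

  assert (pattern != NULL && out != NULL);
  for (const char *p = pattern; *p && n < cap; p++)
    {
      if (*p == '^' && p == pattern)
        {
          laststart = NULL;     /* an assertion is not a repetition target */
          out[n++] = (struct item) { OP_BEGLINE, 0 };
        }
      else if (*p == '\\' && p[1] == 'b')
        {
          p++;                  /* consume the 'b' as well */
          laststart = NULL;     /* same reset as for ^ */
          out[n++] = (struct item) { OP_WORDBOUND, 0 };
        }
      else if (*p == '*' && laststart != NULL)
        {
          /* Turn the previous expression into a repetition in place.  */
          laststart->op = OP_STAR;
        }
      else
        {
          /* Ordinary character, including a * that has no operand.  */
          laststart = &out[n];
          out[n++] = (struct item) { OP_CHAR, *p };
        }
    }
  return n;
}
```

In this model, "a*" compiles to a single starred item, while "^*a" and "q\\b*!" keep the * as a literal character because the preceding assertion cleared `laststart`. Dropping either reset would let the * in "q\\b*!" attach to the earlier `q` instead.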