all messages for Emacs-related lists mirrored at yhetil.org
From: "Mattias Engdegård" <mattias.engdegard@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Eli Zaretskii <eliz@gnu.org>, Paul Eggert <eggert@cs.ucla.edu>,
	64128@debbugs.gnu.org
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 20:34:42 +0200
Message-ID: <6AA06366-E276-47EA-96A3-506DA8B17D41@gmail.com>
In-Reply-To: <jwvmt0vwr3s.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 717 bytes --]

On 19 June 2023 at 14:54, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
> I wish there was a way to emit warnings about oddball constructs
> (starting with the "* is literal when encountered at the beginning of
> a regexp").

I agree, but I'm more of a static analysis man. (And relint does complain about all these cases as long as the regexp is detected as such, so there probably aren't many of them left in the Emacs tree.)

Here is a reduced patch that only fixes the really silly behaviour reported earlier, by making sure that `laststart` is reset correctly for all group A assertions. This should be uncontroversial.
Maybe we should change group B assertions so that they work in the same way.
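
To make the intended behaviour concrete, here is a toy model in C. It is only a sketch under simplifying assumptions, not the code in regex-emacs.c: `count_literal_stars` and `have_operand` are made-up names, `have_operand` stands in for `laststart` being non-null, `^` is treated as an assertion only at the very start of the pattern, and groups, alternatives, `$` and the other postfix operators are ignored. It counts the `*` characters that end up literal because no operand precedes them, which is what the tests in the patch expect after `^`, `` \` ``, `\b` and `\B`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT the real regex_compile: count the '*' characters in
   PATTERN that are taken literally because no operand precedes them.  */
static int
count_literal_stars (const char *pattern)
{
  assert (pattern != NULL);

  /* Hypothetical stand-in for "laststart != 0" in the real parser.  */
  bool have_operand = false;
  int literal_stars = 0;
  size_t len = strlen (pattern);

  for (size_t i = 0; i < len; i++)
    {
      char c = pattern[i];

      if (c == '^' && i == 0)
        /* Zero-width assertion: nothing for a postfix operator to bind to.  */
        have_operand = false;
      else if (c == '\\' && i + 1 < len)
        {
          char e = pattern[++i];
          /* \b, \B, \` and \' are zero-width assertions and reset the
             operand; this toy treats every other escape as an operand.  */
          have_operand = !(e == 'b' || e == 'B' || e == '`' || e == '\'');
        }
      else if (c == '*')
        {
          if (!have_operand)
            {
              /* No operand: '*' is an ordinary character, and as such
                 it becomes an operand for whatever follows.  */
              literal_stars++;
              have_operand = true;
            }
          /* Otherwise '*' is a repetition operator and the operand stays.  */
        }
      else
        have_operand = true;
    }

  return literal_stars;
}
```

With this model, `"^*a"` and `"q\\b*!"` each contain one literal star, while `"a*"` contains none.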


[-- Attachment #2: regexp-zero-width-assertion-noquack.diff --]
[-- Type: application/octet-stream, Size: 2669 bytes --]

diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index fea34df991b..f2da1a2d0db 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -1716,7 +1716,9 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 
   /* Address of start of the most recently finished expression.
      This tells, e.g., postfix * where to find the start of its
-     operand.  Reset at the beginning of groups and alternatives.  */
+     operand.  Reset at the beginning of groups and alternatives,
+     and after zero-width assertions which should not be the target
+     of any postfix repetition operators.  */
   unsigned char *laststart = 0;
 
   /* Address of beginning of regexp, or inside of last group.  */
@@ -1847,12 +1849,14 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 	case '^':
 	  if (! (p == pattern + 1 || at_begline_loc_p (pattern, p)))
 	    goto normal_char;
+	  laststart = 0;
 	  BUF_PUSH (begline);
 	  break;
 
 	case '$':
 	  if (! (p == pend || at_endline_loc_p (p, pend)))
 	    goto normal_char;
+	  laststart = 0;
 	  BUF_PUSH (endline);
 	  break;
 
@@ -1892,7 +1896,7 @@ regex_compile (re_char *pattern, ptrdiff_t size,
 
 	    /* Star, etc. applied to an empty pattern is equivalent
 	       to an empty pattern.  */
-	    if (!laststart || laststart == b)
+	    if (laststart == b)
 	      break;
 
 	    /* Now we know whether or not zero matches is allowed
@@ -2544,18 +2548,22 @@ regex_compile (re_char *pattern, ptrdiff_t size,
               break;
 
 	    case 'b':
+	      laststart = 0;
 	      BUF_PUSH (wordbound);
 	      break;
 
 	    case 'B':
+	      laststart = 0;
 	      BUF_PUSH (notwordbound);
 	      break;
 
 	    case '`':
+	      laststart = 0;
 	      BUF_PUSH (begbuf);
 	      break;
 
 	    case '\'':
+	      laststart = 0;
 	      BUF_PUSH (endbuf);
 	      break;
 
diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el
index 52d43775b8e..48a487ffe15 100644
--- a/test/src/regex-emacs-tests.el
+++ b/test/src/regex-emacs-tests.el
@@ -883,4 +883,14 @@ regexp-tests-backtrack-optimization
     (should (looking-at "x*\\(=\\|:\\)*"))
     (should (looking-at "x*=*?"))))
 
+(ert-deftest regexp-tests-zero-width-assertion-repetition ()
+  ;; Check compatibility behaviour with repetition operators after
+  ;; certain zero-width assertions (bug#64128).
+  (should (equal (string-match "^*a" "*a") 0))
+  (should (equal (string-match "\\`*a" "*a") 0))
+  (should (equal (string-match "q\\b*!" "q*!") 0))
+  (should (equal (string-match "q\\b*!" "!") nil))
+  (should (equal (string-match "/\\B*z" "/*z") 0))
+  (should (equal (string-match "/\\B*z" "z") nil)))
+
 ;;; regex-emacs-tests.el ends here
