* bug#58726: 29.0.50; Bug in regexp matching with shy groups @ 2022-10-23 1:41 Michael Heerdegen 2022-10-23 13:50 ` Mattias Engdegård 0 siblings, 1 reply; 6+ messages in thread From: Michael Heerdegen @ 2022-10-23 1:41 UTC (permalink / raw) To: 58726 Hello, (string-match-p "\\`\\(?:ab\\)*\\'" "a") ==> 0 That's wrong, the expected result is nil. The language matched by that regexp is {"", "ab", "abab", "ababab", ...}. Changing to a non-shy group doesn't exploit the issue: (string-match-p "\\`\\(ab\\)*\\'" "a") ==> nil as expected. I've been told (emacs-help, Bruno Barbier) that the problem exists at least in emacs 27, emacs 28 and emacs 29. TIA, Michael. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#58726: 29.0.50; Bug in regexp matching with shy groups 2022-10-23 1:41 bug#58726: 29.0.50; Bug in regexp matching with shy groups Michael Heerdegen @ 2022-10-23 13:50 ` Mattias Engdegård 2022-10-24 2:38 ` Michael Heerdegen 0 siblings, 1 reply; 6+ messages in thread From: Mattias Engdegård @ 2022-10-23 13:50 UTC (permalink / raw) To: Michael Heerdegen; +Cc: 58726 [-- Attachment #1: Type: text/plain, Size: 1015 bytes --] Michael, thank you for finding this amusing bug! > (string-match-p "\\`\\(?:ab\\)*\\'" "a") ==> 0 With a bit of help from the regexp-disasm package, we see that this compiles to 0 begbuf 1 on-failure-jump-smart to 11 4 exact "ab" 8 jump to 1 11 endbuf 12 succeed where the on-failure-jump-smart op turns into on-failure-keep-string-jump the first time it's executed. This gives us a clue about what is wrong: when there is a failure inside an 'exact' string match, the target pointer should be reset to the start of that string ("ab" here) before jumping to the failure location. Reading the source it becomes clear that this is done correctly when there is a mismatch, but not if the target string ends prematurely because PREFETCH() has no idea that it should reset the target pointer! Easy enough to fix. Please try the attached patch. (The patch takes care of counted repetitions for good measure although I wasn't able to provoke a failure directly.) [-- Attachment #2: 0001-Fix-regexp-matching-with-atomic-strings-and-optimise.patch --] [-- Type: application/octet-stream, Size: 3240 bytes --] From a1bc10533625bde326434325bc75cf1934895472 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org> Date: Sun, 23 Oct 2022 15:40:37 +0200 Subject: [PATCH] Fix regexp matching with atomic strings and optimised backtracking This bug occurs when an atomic pattern is matched at the end of a string and the on-failure-keep-string-jump optimisation is in effect, as in: (string-match "\\'\\(?:ab\\)*\\'" "a") which succeeded but clearly should not (bug#58726). Reported by Michael Heerdegen. * src/regex-emacs.c (PREFETCH): Add reset parameter. (re_match_2_internal): Use it for proper atomic pattern treatment. * test/src/regex-emacs-tests.el (regexp-atomic-failure): New test. --- src/regex-emacs.c | 14 +++++++++----- test/src/regex-emacs-tests.el | 5 +++++ 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/src/regex-emacs.c b/src/regex-emacs.c index 9b2c14c413..626560911f 100644 --- a/src/regex-emacs.c +++ b/src/regex-emacs.c @@ -3446,14 +3446,18 @@ #define POINTER_TO_OFFSET(ptr) \ /* Call before fetching a character with *d. This switches over to string2 if necessary. + `reset' is executed before backtracking if there are no more characters. Check re_match_2_internal for a discussion of why end_match_2 might not be within string2 (but be equal to end_match_1 instead). */ -#define PREFETCH() \ +#define PREFETCH(reset) \ while (d == dend) \ { \ /* End of string2 => fail. */ \ if (dend == end_match_2) \ - goto fail; \ + { \ + reset; \ + goto fail; \ + } \ /* End of string1 => advance to string2. */ \ d = string2; \ dend = end_match_2; \ @@ -4252,7 +4256,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp, int pat_charlen, buf_charlen; int pat_ch, buf_ch; - PREFETCH (); + PREFETCH (d = dfail); if (multibyte) pat_ch = string_char_and_length (p, &pat_charlen); else @@ -4280,7 +4284,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp, int pat_charlen; int pat_ch, buf_ch; - PREFETCH (); + PREFETCH (d = dfail); if (multibyte) { pat_ch = string_char_and_length (p, &pat_charlen); @@ -4486,7 +4490,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp, if (d2 == dend2) break; /* If necessary, advance to next segment in data. */ - PREFETCH (); + PREFETCH (d = dfail); /* How many characters left in this segment to match. */ dcnt = dend - d; diff --git a/test/src/regex-emacs-tests.el b/test/src/regex-emacs-tests.el index ff0d6be3f5..b323f592dc 100644 --- a/test/src/regex-emacs-tests.el +++ b/test/src/regex-emacs-tests.el @@ -867,4 +867,9 @@ regexp-eszett (should (equal (string-match "[[:lower:]]" "ẞ") 0)) (should (equal (string-match "[[:upper:]]" "ẞ") 0)))) +(ert-deftest regexp-atomic-failure () + "Bug#58726." + (should (equal (string-match "\\`\\(?:ab\\)*\\'" "a") nil)) + (should (equal (string-match "\\`a\\{2\\}*\\'" "a") nil))) + ;;; regex-emacs-tests.el ends here -- 2.32.0 (Apple Git-132) ^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#58726: 29.0.50; Bug in regexp matching with shy groups 2022-10-23 13:50 ` Mattias Engdegård @ 2022-10-24 2:38 ` Michael Heerdegen 2022-10-24 10:55 ` Mattias Engdegård 0 siblings, 1 reply; 6+ messages in thread From: Michael Heerdegen @ 2022-10-24 2:38 UTC (permalink / raw) To: Mattias Engdegård; +Cc: 58726 Mattias Engdegård <mattiase@acm.org> writes: > Please try the attached patch. (The patch takes care of counted > repetitions for good measure although I wasn't able to provoke a > failure directly.) Yes, works for me, thanks. Unfortunately I can't estimate whether your fix is correct and the right thing to do, so all I have to offer is that I will run Emacs with your patch installed and watch for any problems that it may have introduced. Thanks again, Michael. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#58726: 29.0.50; Bug in regexp matching with shy groups 2022-10-24 2:38 ` Michael Heerdegen @ 2022-10-24 10:55 ` Mattias Engdegård 2022-10-24 11:17 ` Michael Heerdegen 0 siblings, 1 reply; 6+ messages in thread From: Mattias Engdegård @ 2022-10-24 10:55 UTC (permalink / raw) To: Michael Heerdegen; +Cc: 58726 24 okt. 2022 kl. 04.38 skrev Michael Heerdegen <michael_heerdegen@web.de>: > Unfortunately I can't estimate whether your fix is correct and the right > thing to do, so all I have to offer is that I will run Emacs with your > patch installed and watch for any problems that it may have introduced. Thanks for testing! I'm fairly certain of its correctness, but there could be other places with a similar bug that I didn't find. Nevertheless this should do it for now -- pushed to master. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#58726: 29.0.50; Bug in regexp matching with shy groups 2022-10-24 10:55 ` Mattias Engdegård @ 2022-10-24 11:17 ` Michael Heerdegen 2022-10-24 11:28 ` Mattias Engdegård 0 siblings, 1 reply; 6+ messages in thread From: Michael Heerdegen @ 2022-10-24 11:17 UTC (permalink / raw) To: Mattias Engdegård; +Cc: 58726 Mattias Engdegård <mattiase@acm.org> writes: > Thanks for testing! I'm fairly certain of its correctness, but there > could be other places with a similar bug that I didn't find. > Nevertheless this should do it for now -- pushed to master. Ok, thanks. Can we close this one? Side note: I'm sorry to tell you, but you messed up the example in the commit message (that one evals to 1). Michael. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#58726: 29.0.50; Bug in regexp matching with shy groups 2022-10-24 11:17 ` Michael Heerdegen @ 2022-10-24 11:28 ` Mattias Engdegård 0 siblings, 0 replies; 6+ messages in thread From: Mattias Engdegård @ 2022-10-24 11:28 UTC (permalink / raw) To: Michael Heerdegen; +Cc: 58726-done 24 okt. 2022 kl. 13.17 skrev Michael Heerdegen <michael_heerdegen@web.de>: > Ok, thanks. Can we close this one? Of course, done. > I'm sorry to tell you, but you messed up the example in the > commit message (that one evals to 1). Oh dear, sorry about that! At least there is a reference to this bug in case someone wonders. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-10-24 11:28 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-23 1:41 bug#58726: 29.0.50; Bug in regexp matching with shy groups Michael Heerdegen 2022-10-23 13:50 ` Mattias Engdegård 2022-10-24 2:38 ` Michael Heerdegen 2022-10-24 10:55 ` Mattias Engdegård 2022-10-24 11:17 ` Michael Heerdegen 2022-10-24 11:28 ` Mattias Engdegård
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.