* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
@ 2022-02-01 7:37 Christian Johansson
2022-02-01 11:00 ` Lars Ingebrigtsen
2022-02-01 12:56 ` Mattias Engdegård
0 siblings, 2 replies; 9+ messages in thread
From: Christian Johansson @ 2022-02-01 7:37 UTC (permalink / raw)
To: 53680
Hello
Some context first, since I needed a multi-line regexp matcher that
could tell the difference of end-of-string vs newline ($ vs \n) in a
string I replace occurrences of \n with \r in a string.
In a peculiar case a endless loop happens and I can reproduce this bug
in Emacs 27.2 and 28.0.50 (have not tested any other versions), ok try
to eval following snippet:
(string-match-p·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
or
(string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
--
Hälsningar / Best Regards
Christian
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 7:37 bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50 Christian Johansson
@ 2022-02-01 11:00 ` Lars Ingebrigtsen
2022-02-01 11:08 ` Lars Ingebrigtsen
2022-02-01 11:15 ` Andreas Schwab
2022-02-01 12:56 ` Mattias Engdegård
1 sibling, 2 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-02-01 11:00 UTC (permalink / raw)
To: Christian Johansson; +Cc: 53680
Christian Johansson <christian@cvj.se> writes:
> In a peculiar case a endless loop happens and I can reproduce this bug
> in Emacs 27.2 and 28.0.50 (have not tested any other versions), ok try
> to eval following snippet:
>
> (string-match-p·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
> or
I guess the · are supposed to be spaces, so it's:
(string-match
"[\r\t ]*implements[\r\t ]+\\([\r\t ]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t ]*{$"
"ariable implements \\Magento\\Framework\\Event\\OberserverInterface\r{\r public function __construct()\r ")
And I can reproduce this on master, too. Is it a problem with excessive
backtracking, perhaps? I let it run for a minute, and it didn't finish
in that time.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 11:00 ` Lars Ingebrigtsen
@ 2022-02-01 11:08 ` Lars Ingebrigtsen
2022-02-01 11:25 ` Andreas Schwab
2022-02-01 11:15 ` Andreas Schwab
1 sibling, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-02-01 11:08 UTC (permalink / raw)
To: Christian Johansson; +Cc: 53680
Lars Ingebrigtsen <larsi@gnus.org> writes:
> And I can reproduce this on master, too. Is it a problem with excessive
> backtracking, perhaps? I let it run for a minute, and it didn't finish
> in that time.
This simplified version also infloops:
(string-match
"implements\\([\r\t ]*[\\a-zA-Z0-9_]+,?\\)+{$"
"ariable implements \\Magento\\Framework\\Event\\OberserverInterface\r{\r public function __construct()\r ")
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 11:08 ` Lars Ingebrigtsen
@ 2022-02-01 11:25 ` Andreas Schwab
0 siblings, 0 replies; 9+ messages in thread
From: Andreas Schwab @ 2022-02-01 11:25 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Christian Johansson, 53680
On Feb 01 2022, Lars Ingebrigtsen wrote:
> This simplified version also infloops:
>
> (string-match
> "implements\\([\r\t ]*[\\a-zA-Z0-9_]+,?\\)+{$"
This has the same search space explosion.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 11:00 ` Lars Ingebrigtsen
2022-02-01 11:08 ` Lars Ingebrigtsen
@ 2022-02-01 11:15 ` Andreas Schwab
2022-02-01 12:16 ` Lars Ingebrigtsen
1 sibling, 1 reply; 9+ messages in thread
From: Andreas Schwab @ 2022-02-01 11:15 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Christian Johansson, 53680
On Feb 01 2022, Lars Ingebrigtsen wrote:
> I guess the · are supposed to be spaces, so it's:
>
> (string-match
> "[\r\t ]*implements[\r\t ]+\\([\r\t ]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t ]*{$"
> "ariable implements \\Magento\\Framework\\Event\\OberserverInterface\r{\r public function __construct()\r ")
>
> And I can reproduce this on master, too. Is it a problem with excessive
> backtracking, perhaps?
It sure is. The nesting of the + operator without proper anchoring
makes the search space explode.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 11:15 ` Andreas Schwab
@ 2022-02-01 12:16 ` Lars Ingebrigtsen
0 siblings, 0 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-02-01 12:16 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Christian Johansson, 53680
Andreas Schwab <schwab@linux-m68k.org> writes:
> It sure is. The nesting of the + operator without proper anchoring
> makes the search space explode.
Yup. So something like
(string-match
"[\r\t ]*implements\\([\r\t ]+[\\a-zA-Z_0-9_]+,?\\)+[\r\t ]*{$"
"ariable implements \\Magento\\Framework\\Event\\OberserverInterface\r{\r public function __construct()\r ")
won't have the same explosion (and indeed finished immediately).
So I don't think this is an Emacs bug, just a very tough regexp to
match. There's been some talk about replacing Emacs' regexp motor with
something that has better backtracking characteristics, but I don't
recall if anybody has actually made any moves to make that happen.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 7:37 bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50 Christian Johansson
2022-02-01 11:00 ` Lars Ingebrigtsen
@ 2022-02-01 12:56 ` Mattias Engdegård
2022-02-01 13:12 ` Christian Johansson
1 sibling, 1 reply; 9+ messages in thread
From: Mattias Engdegård @ 2022-02-01 12:56 UTC (permalink / raw)
To: Christian Johansson; +Cc: Lars Ingebrigtsen, Andreas Schwab, 53680
> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
The diagnostics by Lars and Andreas is correct. Let's look at it more closely, first translating the regexp to rx for ease of reasoning, and see if we can make it work:
(rx (* (in "\t\r "))
"implements"
(+ (in "\t\r "))
(+ (group
(* (in "\t\r "))
(+ (in "0-9A-Za-z" "\\_"))
(? ",")))
(* (in "\t\r "))
"{" eol)
The first line is meaningless since it can match the empty string, but you probably want to anchor the start of "implements" so that it doesn't match "house_implements". Let's also drop the capture group, and we get:
(rx symbol-start "implements"
(+ (in "\t\r "))
(+ (* (in "\t\r "))
(+ (in "0-9A-Za-z" "\\_"))
(? ","))
(* (in "\t\r "))
"{" eol)
You clearly want to match a non-empty sequence of 'words' separated with whitespace and/or commas, but the pattern is ambiguous -- all inter-word separators are optional. Let's make them mandatory:
(rx symbol-start "implements"
;; mandatory whitespace
(+ (in "\t\r "))
;; then a word
(+ (in "0-9A-Za-z" "\\_"))
;; then maybe more words, each prefixed by spaces or comma
(* (+ (in "\t\r ,")) ; fast and loose
(+ (in "0-9A-Za-z" "\\_")))
;; finally whitespace before the curly bracket
(* (in "\t\r "))
"{" eol)
which is reasonably efficient, since all ambiguity is now gone: the regexp can (almost) only match in one way.
Note the "fast and loose" pattern where we accept any number of spaces or commas. Here it depends on your grammar but if you want exactly one comma separating each word, that subexpression would be something like
(* (in "\t\r "))
","
(* (in "\t\r "))
instead.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50
2022-02-01 12:56 ` Mattias Engdegård
@ 2022-02-01 13:12 ` Christian Johansson
[not found] ` <AB866492-4CC0-418A-8C9E-CAAB2C522CDA@acm.org>
0 siblings, 1 reply; 9+ messages in thread
From: Christian Johansson @ 2022-02-01 13:12 UTC (permalink / raw)
To: Mattias Engdegård; +Cc: Lars Ingebrigtsen, Andreas Schwab, 53680
Alright, thanks for helping me find the correct regexp, this issue only appeared on some peculiar strings
> 1 feb. 2022 kl. 13:56 skrev Mattias Engdegård <mattiase@acm.org>:
>
>
>>
>> (string-match·"[\r\t·]*implements[\r\t·]+\\([\r\t·]*[\\a-zA-Z_0-9_]+,?\\)+[\r\t·]*{$"·"ariable·implements·\\Magento\\Framework\\Event\\OberserverInterface\r{\r····public·function·__construct()\r·")
>
> The diagnostics by Lars and Andreas is correct. Let's look at it more closely, first translating the regexp to rx for ease of reasoning, and see if we can make it work:
>
> (rx (* (in "\t\r "))
> "implements"
> (+ (in "\t\r "))
> (+ (group
> (* (in "\t\r "))
> (+ (in "0-9A-Za-z" "\\_"))
> (? ",")))
> (* (in "\t\r "))
> "{" eol)
>
> The first line is meaningless since it can match the empty string, but you probably want to anchor the start of "implements" so that it doesn't match "house_implements". Let's also drop the capture group, and we get:
>
> (rx symbol-start "implements"
> (+ (in "\t\r "))
> (+ (* (in "\t\r "))
> (+ (in "0-9A-Za-z" "\\_"))
> (? ","))
> (* (in "\t\r "))
> "{" eol)
>
> You clearly want to match a non-empty sequence of 'words' separated with whitespace and/or commas, but the pattern is ambiguous -- all inter-word separators are optional. Let's make them mandatory:
>
> (rx symbol-start "implements"
> ;; mandatory whitespace
> (+ (in "\t\r "))
> ;; then a word
> (+ (in "0-9A-Za-z" "\\_"))
> ;; then maybe more words, each prefixed by spaces or comma
> (* (+ (in "\t\r ,")) ; fast and loose
> (+ (in "0-9A-Za-z" "\\_")))
> ;; finally whitespace before the curly bracket
> (* (in "\t\r "))
> "{" eol)
>
> which is reasonably efficient, since all ambiguity is now gone: the regexp can (almost) only match in one way.
>
> Note the "fast and loose" pattern where we accept any number of spaces or commas. Here it depends on your grammar but if you want exactly one comma separating each word, that subexpression would be something like
>
> (* (in "\t\r "))
> ","
> (* (in "\t\r "))
>
> instead.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-02-01 16:23 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-02-01 7:37 bug#53680: Endless loop in peculiar case of string-match and string-match-p 27.02 and 28.0.50 Christian Johansson
2022-02-01 11:00 ` Lars Ingebrigtsen
2022-02-01 11:08 ` Lars Ingebrigtsen
2022-02-01 11:25 ` Andreas Schwab
2022-02-01 11:15 ` Andreas Schwab
2022-02-01 12:16 ` Lars Ingebrigtsen
2022-02-01 12:56 ` Mattias Engdegård
2022-02-01 13:12 ` Christian Johansson
[not found] ` <AB866492-4CC0-418A-8C9E-CAAB2C522CDA@acm.org>
2022-02-01 16:23 ` Christian Johansson
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).