* performance/hang bug in regex.c
@ 2008-09-02 5:53 Ami Fischman
2008-09-02 20:26 ` Stefan Monnier
0 siblings, 1 reply; 4+ messages in thread
From: Ami Fischman @ 2008-09-02 5:53 UTC (permalink / raw)
To: emacs-devel
In a buffer containing the single line:
aaba-----------------------------------------
with point at the beginning of the line, executing this:
M-:(re-search-forward "\\([^ab]+\\)+bb") RET
sends emacs into a tizzy: 100% CPU is consumed and emacs appears hung;
C-g unhangs it. Shortening the line (reducing the number of dashes)
changes behavior to a delay followed by the no-match error being
triggered (correctly). This repros reliably with 22.1.1 and with CVS
HEAD as of 2008/08/10. I do not have a fix or a workaround other than
avoiding \\(...+\\)+ patterns.
Cheers,
-a
P.S. This recipe is a much-simplified version of a regexp that occurs
in erin-mode (http://www.neilvandyke.org/erin-twiki-emacs/). Its
occurrence there is in a font-lock pattern and causes emacs to hang
after the bug is triggered; not even C-g can make emacs stop trying to
match the pattern. A workaround is to remove the lone "+" at
erin.el:382.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: performance/hang bug in regex.c
2008-09-02 5:53 performance/hang bug in regex.c Ami Fischman
@ 2008-09-02 20:26 ` Stefan Monnier
2008-09-03 2:41 ` Richard M. Stallman
0 siblings, 1 reply; 4+ messages in thread
From: Stefan Monnier @ 2008-09-02 20:26 UTC (permalink / raw)
To: Ami Fischman; +Cc: emacs-devel
> In a buffer containing the single line:
> aaba-----------------------------------------
> with point at the beginning of the line, executing this:
> M-:(re-search-forward "\\([^ab]+\\)+bb") RET
> sends emacs into a tizzy: 100% CPU is consumed and emacs appears hung;
> C-g unhangs it. Shortening the line (reducing the number of dashes)
> changes behavior to a delay followed by the no-match error being
> triggered (correctly). This repros reliably with 22.1.1 and with CVS
> HEAD as of 2008/08/10. I do not have a fix or a workaround other than
> avoiding \\(...+\\)+ patterns.
Regexps can be implemented in mainly 2 ways: backtracking or not.
If you don't do backtracking, the above problem doesn't occur.
But many/most regexp implementations use a backtracking algorithm
because it's almost indispensable in order to handle backreferences.
Emacs provides backreferences and doesn't bother to provide 2 regexp
implementations (a backtracking one for regexps with backrefs and
a non-backtracking one for all the others), so you get pathological
behaviors for regexps such as the one above. It's too bad, and I hope
we can fix it at some point, but don't hold your breath,
Stefan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: performance/hang bug in regex.c
2008-09-02 20:26 ` Stefan Monnier
@ 2008-09-03 2:41 ` Richard M. Stallman
2008-09-03 3:48 ` Thomas Lord
0 siblings, 1 reply; 4+ messages in thread
From: Richard M. Stallman @ 2008-09-03 2:41 UTC (permalink / raw)
To: Stefan Monnier; +Cc: ami, emacs-devel
Regexps can be implemented in mainly 2 ways: backtracking or not.
If you don't do backtracking, the above problem doesn't occur.
But many/most regexp implementations use a backtracking algorithm
because it's almost indispensable in order to handle backreferences.
Emacs provides backreferences and doesn't bother to provide 2 regexp
implementations (a backtracking one for regexps with backrefs and
a non-backtracking one for all the others), so you get pathological
behaviors for regexps such as the one above. It's too bad, and I hope
we can fix it at some point, but don't hold your breath,
This comes up often enough that maybe it should be explained in one
of the manuals.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: performance/hang bug in regex.c
2008-09-03 2:41 ` Richard M. Stallman
@ 2008-09-03 3:48 ` Thomas Lord
0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lord @ 2008-09-03 3:48 UTC (permalink / raw)
To: rms; +Cc: ami, Stefan Monnier, emacs-devel
Richard M. Stallman wrote:
> Regexps can be implemented in mainly 2 ways: backtracking or not.
> If you don't do backtracking, the above problem doesn't occur.
> But many/most regexp implementations use a backtracking algorithm
> because it's almost indispensable in order to handle backreferences.
>
> Emacs provides backreferences and doesn't bother to provide 2 regexp
> implementations (a backtracking one for regexps with backrefs and
> a non-backtracking one for all the others), so you get pathological
> behaviors for regexps such as the one above. It's too bad, and I hope
> we can fix it at some point, but don't hold your breath,
>
> This comes up often enough that maybe it should be explained in one
> of the manuals.
>
You can probably clean up Rx (found in GNU Arch) well
enough to fix all of those problems although the work of
doing so will require the worker to understand regexp
stuff deeply (not much (but some) beyond undergrad courses
about DFAs, NFAs and all that but you have to be really
into it -- it's a fairly involved application of that stuff).
-t
>
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2008-09-03 3:48 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-02 5:53 performance/hang bug in regex.c Ami Fischman
2008-09-02 20:26 ` Stefan Monnier
2008-09-03 2:41 ` Richard M. Stallman
2008-09-03 3:48 ` Thomas Lord
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.