unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* performance/hang bug in regex.c
@ 2008-09-02  5:53 Ami Fischman
  2008-09-02 20:26 ` Stefan Monnier
  0 siblings, 1 reply; 4+ messages in thread
From: Ami Fischman @ 2008-09-02  5:53 UTC (permalink / raw)
  To: emacs-devel

In a buffer containing the single line:
aaba-----------------------------------------
with point at the beginning of the line, executing this:
M-:(re-search-forward "\\([^ab]+\\)+bb") RET
sends emacs into a tizzy: 100% CPU is consumed and emacs appears hung;
C-g unhangs it.  Shortening the line (reducing the number of dashes)
changes behavior to a delay followed by the no-match error being
triggered (correctly).  This repros reliably with 22.1.1 and with CVS
HEAD as of 2008/08/10.  I do not have a fix or a workaround other than
avoiding \\(...+\\)+ patterns.

Cheers,
-a

P.S. This recipe is a much-simplified version of a regexp that occurs
in erin-mode (http://www.neilvandyke.org/erin-twiki-emacs/).  Its
occurrence there is in a font-lock pattern and causes emacs to hang
after the bug is triggered; not even C-g can make emacs stop trying to
match the pattern.  A workaround is to remove the lone "+" at
erin.el:382.




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: performance/hang bug in regex.c
  2008-09-02  5:53 performance/hang bug in regex.c Ami Fischman
@ 2008-09-02 20:26 ` Stefan Monnier
  2008-09-03  2:41   ` Richard M. Stallman
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Monnier @ 2008-09-02 20:26 UTC (permalink / raw)
  To: Ami Fischman; +Cc: emacs-devel

> In a buffer containing the single line:
> aaba-----------------------------------------
> with point at the beginning of the line, executing this:
> M-:(re-search-forward "\\([^ab]+\\)+bb") RET
> sends emacs into a tizzy: 100% CPU is consumed and emacs appears hung;
> C-g unhangs it.  Shortening the line (reducing the number of dashes)
> changes behavior to a delay followed by the no-match error being
> triggered (correctly).  This repros reliably with 22.1.1 and with CVS
> HEAD as of 2008/08/10.  I do not have a fix or a workaround other than
> avoiding \\(...+\\)+ patterns.

Regexps can be implemented in mainly 2 ways: backtracking or not.
If you don't do backtracking, the above problem doesn't occur.
But many/most regexp implementations use a backtracking algorithm
because it's almost indispensable in order to handle backreferences.

Emacs provides backreferences and doesn't bother to provide 2 regexp
implementations (a backtracking one for regexps with backrefs and
a non-backtracking one for all the others), so you get pathological
behaviors for regexps such as the one above.  It's too bad, and I hope
we can fix it at some point, but don't hold your breath,


        Stefan





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: performance/hang bug in regex.c
  2008-09-02 20:26 ` Stefan Monnier
@ 2008-09-03  2:41   ` Richard M. Stallman
  2008-09-03  3:48     ` Thomas Lord
  0 siblings, 1 reply; 4+ messages in thread
From: Richard M. Stallman @ 2008-09-03  2:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ami, emacs-devel

    Regexps can be implemented in mainly 2 ways: backtracking or not.
    If you don't do backtracking, the above problem doesn't occur.
    But many/most regexp implementations use a backtracking algorithm
    because it's almost indispensable in order to handle backreferences.

    Emacs provides backreferences and doesn't bother to provide 2 regexp
    implementations (a backtracking one for regexps with backrefs and
    a non-backtracking one for all the others), so you get pathological
    behaviors for regexps such as the one above.  It's too bad, and I hope
    we can fix it at some point, but don't hold your breath,

This comes up often enough that maybe it should be explained in one
of the manuals.




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: performance/hang bug in regex.c
  2008-09-03  2:41   ` Richard M. Stallman
@ 2008-09-03  3:48     ` Thomas Lord
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lord @ 2008-09-03  3:48 UTC (permalink / raw)
  To: rms; +Cc: ami, Stefan Monnier, emacs-devel

Richard M. Stallman wrote:
>     Regexps can be implemented in mainly 2 ways: backtracking or not.
>     If you don't do backtracking, the above problem doesn't occur.
>     But many/most regexp implementations use a backtracking algorithm
>     because it's almost indispensable in order to handle backreferences.
>
>     Emacs provides backreferences and doesn't bother to provide 2 regexp
>     implementations (a backtracking one for regexps with backrefs and
>     a non-backtracking one for all the others), so you get pathological
>     behaviors for regexps such as the one above.  It's too bad, and I hope
>     we can fix it at some point, but don't hold your breath,
>
> This comes up often enough that maybe it should be explained in one
> of the manuals.
>   

You can probably clean up Rx (found in GNU Arch) well
enough to fix all of those problems although the work of
doing so will require the worker to understand regexp
stuff deeply  (not much (but some) beyond undergrad courses
about DFAs, NFAs and all that but you have to be really
into it -- it's a fairly involved application of that stuff).

-t






>
>
>   





^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-09-03  3:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-02  5:53 performance/hang bug in regex.c Ami Fischman
2008-09-02 20:26 ` Stefan Monnier
2008-09-03  2:41   ` Richard M. Stallman
2008-09-03  3:48     ` Thomas Lord

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).