From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: performance/hang bug in regex.c Date: Tue, 02 Sep 2008 16:26:23 -0400 Message-ID: References: <9aa0cfde0809012253j18394460s9954a0651eefbd5b@mail.gmail.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1220387954 10691 80.91.229.12 (2 Sep 2008 20:39:14 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 2 Sep 2008 20:39:14 +0000 (UTC) Cc: emacs-devel@gnu.org To: "Ami Fischman" Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Sep 02 22:40:08 2008 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1KacfU-0007bD-6u for ged-emacs-devel@m.gmane.org; Tue, 02 Sep 2008 22:40:04 +0200 Original-Received: from localhost ([127.0.0.1]:57137 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KaceU-0008La-PS for ged-emacs-devel@m.gmane.org; Tue, 02 Sep 2008 16:39:02 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1KaceM-0008G1-Gt for emacs-devel@gnu.org; Tue, 02 Sep 2008 16:38:54 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1KaceK-0008CY-I5 for emacs-devel@gnu.org; Tue, 02 Sep 2008 16:38:53 -0400 Original-Received: from [199.232.76.173] (port=41454 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KaceK-0008CG-BH for emacs-devel@gnu.org; Tue, 02 Sep 2008 16:38:52 -0400 Original-Received: from chene.dit.umontreal.ca ([132.204.246.20]:59550) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1KaceJ-0000Qz-P2 for emacs-devel@gnu.org; Tue, 02 Sep 2008 16:38:52 -0400 Original-Received: from alfajor.home (x-132-204-254-63.xtpr.umontreal.ca [132.204.254.63]) by chene.dit.umontreal.ca (8.14.1/8.14.1) with SMTP id m82KcbnQ017973; Tue, 2 Sep 2008 16:38:38 -0400 Original-Received: by alfajor.home (Postfix, from userid 20848) id 23B521C706; Tue, 2 Sep 2008 16:26:23 -0400 (EDT) In-Reply-To: <9aa0cfde0809012253j18394460s9954a0651eefbd5b@mail.gmail.com> (Ami Fischman's message of "Mon, 1 Sep 2008 22:53:38 -0700") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux) X-NAI-Spam-Score: 0 X-NAI-Spam-Rules: 1 Rules triggered RV3095=0 X-detected-kernel: by monty-python.gnu.org: Linux 2.6 (newer, 3) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:103446 Archived-At: > In a buffer containing the single line: > aaba----------------------------------------- > with point at the beginning of the line, executing this: > M-:(re-search-forward "\\([^ab]+\\)+bb") RET > sends emacs into a tizzy: 100% CPU is consumed and emacs appears hung; > C-g unhangs it. Shortening the line (reducing the number of dashes) > changes behavior to a delay followed by the no-match error being > triggered (correctly). This repros reliably with 22.1.1 and with CVS > HEAD as of 2008/08/10. I do not have a fix or a workaround other than > avoiding \\(...+\\)+ patterns. Regexps can be implemented in mainly 2 ways: backtracking or not. If you don't do backtracking, the above problem doesn't occur. But many/most regexp implementations use a backtracking algorithm because it's almost indispensable in order to handle backreferences. Emacs provides backreferences and doesn't bother to provide 2 regexp implementations (a backtracking one for regexps with backrefs and a non-backtracking one for all the others), so you get pathological behaviors for regexps such as the one above. It's too bad, and I hope we can fix it at some point, but don't hold your breath, Stefan