From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.help Subject: Re: Stack overflow in regexp matcher Date: Mon, 07 Feb 2011 15:24:14 -0500 Organization: A noiseless patient Spider Message-ID: References: NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: dough.gmane.org 1297118628 7671 80.91.229.12 (7 Feb 2011 22:43:48 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Mon, 7 Feb 2011 22:43:48 +0000 (UTC) To: help-gnu-emacs@gnu.org Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Mon Feb 07 23:43:43 2011 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PmZo3-00079h-9e for geh-help-gnu-emacs@m.gmane.org; Mon, 07 Feb 2011 23:43:39 +0100 Original-Received: from localhost ([127.0.0.1]:33710 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PmZo4-0005lt-1Q for geh-help-gnu-emacs@m.gmane.org; Mon, 07 Feb 2011 17:43:40 -0500 Original-Path: usenet.stanford.edu!news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!news2.euro.net!news.mixmin.net!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail Original-Newsgroups: gnu.emacs.help Original-Lines: 35 Injection-Info: mx02.eternal-september.org; posting-host="amKqFoJRJ7Hda9JgZt0dlw"; logging-data="3085"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/LZY5KEgBDJkKLFHr5tGiB" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux) Cancel-Lock: sha1:ORfo4WelWIgH+4oTDfhTur8TMoM= sha1:KbjQwbh11cKpsHqQF+xJDX8JrpM= Original-Xref: usenet.stanford.edu gnu.emacs.help:184845 X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.help:79017 Archived-At: > The following fails with "Stack overflow in regexp matcher" in emacs 23 > and 24: > (string-match > "^\\[.+\\]$" > (concat > "[" > (mapconcat (lambda (i) "x") (number-sequence 1 33500) "") > "]")) > This surprised me; I assumed that the ^ and $ anchors, and the simple > ".+" requirement in the middle would result in a simple, efficient > regexp. The problem is that Emacs's regexp engine is not very clever. Basically, each time . succeeds we first recurse, trying to match the rest against ".*\\]$", and if that fails then we try to match "\\]$" instead. The fact that at each step we have this "fork" of two different possible ways to match, means that at each step we have to push something on the stack, so the longer the string against which we match, the deeper the stack we need. You can fix it by using a regexp that does not need this backtracking. E.g. "^\\[[^]\n]+\\]$". If you do that, then Emacs's regexp engine will notice that when [^]\n] matches, then "\\]$" can't match, so there's no point pushing something on the stack to try the second path when the first fails. I.e. it will do the whole match with a constant amount of stack space. This relates also to the recent discussion with Ilya in the thread titled "will we ever have zero width assertions in regexps?". Stefan