From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Dan Davison Newsgroups: gmane.emacs.help Subject: Re: Stack overflow in regexp matcher Date: Tue, 08 Feb 2011 22:58:15 +0000 Message-ID: References: NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: dough.gmane.org 1297206143 31007 80.91.229.12 (8 Feb 2011 23:02:23 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Tue, 8 Feb 2011 23:02:23 +0000 (UTC) Cc: help-gnu-emacs@gnu.org To: Stefan Monnier Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Wed Feb 09 00:02:19 2011 Return-path: Envelope-to: geh-help-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PmwZa-0001Iy-DX for geh-help-gnu-emacs@m.gmane.org; Wed, 09 Feb 2011 00:02:14 +0100 Original-Received: from localhost ([127.0.0.1]:53378 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PmwWC-0005Sm-C5 for geh-help-gnu-emacs@m.gmane.org; Tue, 08 Feb 2011 17:58:44 -0500 Original-Received: from [140.186.70.92] (port=52179 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PmwVr-0005Ro-V3 for help-gnu-emacs@gnu.org; Tue, 08 Feb 2011 17:58:24 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PmwVq-0003Eq-Nm for help-gnu-emacs@gnu.org; Tue, 08 Feb 2011 17:58:23 -0500 Original-Received: from mail-ww0-f49.google.com ([74.125.82.49]:37261) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PmwVq-0003El-JJ for help-gnu-emacs@gnu.org; Tue, 08 Feb 2011 17:58:22 -0500 Original-Received: by wwb17 with SMTP id 17so6443305wwb.30 for ; Tue, 08 Feb 2011 14:58:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version:content-type; bh=ED5KsOMjY9hfo4mnd4AatDJfbWmeQVW37WJeEP1P8Hk=; b=kZV1JJIuwTID/h1K/xfA8N5pgKK/9z0p5swEBhlnbVZpidfaAF4bTLAV5M3WTOt2s2 rWDLeLyYoPurzWZgelJTPiPCtQOdAZoBO3ZRIDW52jAUeGvR6k7Qv4XqtF5WS8umIy/r NFJjOX6YtabhHJnMLC5LcO8KaZh6gtHN2jnQA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-type; b=UBbqKHjX1wRJBlCvt/Mu1qRowr3iXNVctYxjX7IlDjyXVbzWKwqEc6FQwgAP/NLmnS aS9L9eYCEnLCvlkayDxFwsrA48iKJfWBLAHXattYI3EMrnrj9/oOkzIarMt6ZfIyEytM lih5Gxx1qHNGdL4SIMT2+YSiJTubQLF6k3fx8= Original-Received: by 10.227.134.5 with SMTP id h5mr1304056wbt.26.1297205901506; Tue, 08 Feb 2011 14:58:21 -0800 (PST) Original-Received: from 94.196.76.76.threembb.co.uk (94.196.76.76.threembb.co.uk [94.196.76.76]) by mx.google.com with ESMTPS id y29sm4879874wbd.16.2011.02.08.14.58.19 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 08 Feb 2011 14:58:20 -0800 (PST) In-Reply-To: (Stefan Monnier's message of "Mon, 07 Feb 2011 15:24:14 -0500") User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (darwin) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 74.125.82.49 X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Users list for the GNU Emacs text editor List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.help:79037 Archived-At: Stefan Monnier writes: >> The following fails with "Stack overflow in regexp matcher" in emacs 23 >> and 24: > >> (string-match >> "^\\[.+\\]$" >> (concat >> "[" >> (mapconcat (lambda (i) "x") (number-sequence 1 33500) "") >> "]")) > >> This surprised me; I assumed that the ^ and $ anchors, and the simple >> ".+" requirement in the middle would result in a simple, efficient >> regexp. > > The problem is that Emacs's regexp engine is not very clever. > > Basically, each time . succeeds we first recurse, trying to match the > rest against ".*\\]$", and if that fails then we try to match > "\\]$" instead. The fact that at each step we have this "fork" of two > different possible ways to match, means that at each step we have to > push something on the stack, so the longer the string against which we > match, the deeper the stack we need. > > You can fix it by using a regexp that does not need this backtracking. > E.g. "^\\[[^]\n]+\\]$". If you do that, then Emacs's regexp engine will > notice that when [^]\n] matches, then "\\]$" can't match, so there's no > point pushing something on the stack to try the second path when the > first fails. I.e. it will do the whole match with a constant amount of > stack space. > > This relates also to the recent discussion with Ilya in the thread > titled "will we ever have zero width assertions in regexps?". Hi Stefan, Thanks for the explanation. Dan > > > Stefan