From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= Newsgroups: gmane.emacs.devel Subject: Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace Date: Sun, 15 Mar 2020 21:06:46 +0100 Message-ID: References: <20200315103922.GA4928@ACM> <858A7BE9-9170-477F-908B-3C2383F5A727@acm.org> <20200315165715.GD4928@ACM> Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="94340"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org To: Alan Mackenzie Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Mar 15 22:15:19 2020 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jDabO-000ORH-UW for ged-emacs-devel@m.gmane-mx.org; Sun, 15 Mar 2020 22:15:18 +0100 Original-Received: from localhost ([::1]:58924 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jDabN-00007Q-NJ for ged-emacs-devel@m.gmane-mx.org; Sun, 15 Mar 2020 17:15:18 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:38664) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jDaWa-0006nj-St for emacs-devel@gnu.org; Sun, 15 Mar 2020 17:10:22 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jDaWZ-00046H-KW for emacs-devel@gnu.org; Sun, 15 Mar 2020 17:10:20 -0400 Original-Received: from mail1457c50.megamailservers.eu ([91.136.14.57]:55876 helo=mail267c50.megamailservers.eu) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1jDaWY-0003u5-Gp for emacs-devel@gnu.org; Sun, 15 Mar 2020 17:10:19 -0400 X-Authenticated-User: mattiase@bredband.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=megamailservers.eu; s=maildub; t=1584302808; bh=X3cASUHDpu6TCgpX1i2boDkK/8pDNv+5nIYaoSL5LX0=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=b4jif7hjlED45nnJ+rPnWsmHQbDcTeYi5g6xEzBkOqf99Klwj/ateFO2xW1bFFyjd QlLOwwt2rKNqXeDzAu+6hXOQTVloYU8Q8uCMbiiU9sys1bC89vXGi2lz54Rt72Gfj7 CdEU+3hj4VflftXGb9Olh62OqleY6mucCnAWT2Bs= Feedback-ID: mattiase@acm.or Original-Received: from stanniol.lan (c-6f4fe655.032-75-73746f71.bbcust.telenor.se [85.230.79.111]) (authenticated bits=0) by mail267c50.megamailservers.eu (8.14.9/8.13.1) with ESMTP id 02FK6kBB015580; Sun, 15 Mar 2020 20:06:48 +0000 In-Reply-To: <20200315165715.GD4928@ACM> X-Mailer: Apple Mail (2.3445.104.11) X-CTCH-RefID: str=0001.0A782F28.5E6E8ACB.0067, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-CTCH-VOD: Unknown X-CTCH-Spam: Unknown X-CTCH-Score: 0.000 X-CTCH-Flags: 0 X-CTCH-ScoreCust: 0.000 X-CSC: 0 X-CHA: v=2.3 cv=Cf92G4jl c=1 sm=1 tr=0 a=fHaj9vQUQVKQ4sUldAaXuQ==:117 a=fHaj9vQUQVKQ4sUldAaXuQ==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=M51BFTxLslgA:10 a=WVymBPeJTYyGkQ_YoSIA:9 a=CjuIK1q_8ugA:10 a=Z5ABNNGmrOfJ6cZ5bIyy:22 a=jd6J4Gguk5HxikPWLKER:22 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-Received-From: 91.136.14.57 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:245547 Archived-At: 15 mars 2020 kl. 17.57 skrev Alan Mackenzie : > I agree (having tried "\\(?:" in place of "\\("), but why? What is > causing the recursion here? Each of the two groups need only remember > the latest string matching it. Surely? I'd like some insight into > this, so as to avoid it happening again. Each time a capture group is entered, the old extent of that group is = pushed onto the stack so that it can be reloaded if a match failure = occurs and the engine needs to backtrack. This means that removing the = unnecessary group is not an asymptotic improvement in space here, but = reduces the constant factor so that you can match larger strings. It's = still O(N) space, N being the string size. You could probably rewrite the regexp to improve the constant further by = taking advantage of the fact that the [^\\\n\r] branch matches much more = often than the other. Something like (rx (* (seq "\\" anything)) (* (+ (not (any "\n\r\\"))) (seq "\\" anything)) (* (not (any "\n\r\\")))) which could perhaps be trimmed a bit. In any case, removing unnecessary = capture groups everywhere is a cheap and easy improvement to start with. = (Another benefit of rx, by the way -- there tends to be a lot fewer of = those.)