From: Mattias Engdegård
Newsgroups: gmane.emacs.devel
Subject: Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace
Date: Sun, 15 Mar 2020 13:22:20 +0100
Message-ID: <858A7BE9-9170-477F-908B-3C2383F5A727@acm.org>
References: <20200315103922.GA4928@ACM>
In-Reply-To: <20200315103922.GA4928@ACM>
To: Alan Mackenzie
Cc: emacs-devel@gnu.org

On 15 March 2020 at 11:39, Alan Mackenzie wrote:

> Hello, Emacs.

Hello Alan. Thanks for the nice example!

> First of all, note the regexp, "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*"
>                                                             ^^^
> In the source, the "\15" is "\r".  Why is this substitution being made
> for the backtrace?  Is it intentional (in which case, why not do the
> same to the "\n"?), or is it a bug?  To me, it is more like a bug.

I agree; there are some ad-hoc switches like print-escape-newlines
(which only works on \n and \f) and print-escape-control-characters
(which produces octal), but nothing that gives human-friendly escapes
for other known control characters.

> More importantly, why is there a stack overflow here at all?  Even
> though the regexp matcher has a long, long piece of buffer to scan over,
> the regexp is a simple linear search, without any nesting to speak of.

Let's ask xr for help:

(xr-pp "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*")
=>
(zero-or-more
 (group
  (or (seq "\\" (group anything))
      (not (any "\n\r\\")))))

(note that xr pretty-prints \r properly)

There are two capture groups here, neither of which is actually used.
Remove them (the outer one in particular) and the regexp no longer
overflows. Navigating the file also becomes noticeably faster. Like
this:

(rx (zero-or-more
     (or (seq "\\" anything)
         (not (any "\n\r\\")))))

(rx will use a slightly more efficient rendition of 'anything', but that
isn't actually important in this case.)
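
A minimal sketch for checking that the two forms agree, assuming an Emacs where rx and regexp-opt-depth are available. The names my-grouped-re, my-ungrouped-re and my-compare-regexps are made up for illustration and are not part of any library. The short test string only compares what the two regexps match; it is far too small to reproduce the stack overflow itself.

```elisp
;; Illustrative sketch only; all `my-' names are hypothetical.
(require 'rx)

;; The regexp from the backtrace, with its two capture groups.
(defvar my-grouped-re "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\r]\\)*")

;; The same pattern written with rx, which emits no capture groups.
(defvar my-ungrouped-re
  (rx (zero-or-more
       (or (seq "\\" anything)
           (not (any "\n\r\\"))))))

(defun my-compare-regexps (string)
  "Match both regexps against STRING from position 0.
Return a list (GROUPED-END UNGROUPED-END GROUPED-DEPTH UNGROUPED-DEPTH),
where the depths are the number of capturing groups in each regexp."
  (list (and (string-match my-grouped-re string) (match-end 0))
        (and (string-match my-ungrouped-re string) (match-end 0))
        (regexp-opt-depth my-grouped-re)
        (regexp-opt-depth my-ungrouped-re)))

;; A line with an escaped newline, followed by an unescaped one.
(my-compare-regexps "abc\\\ndef\nghi")
```

If the two end positions differ for some input, the rewrite has changed what the regexp matches and should not be used as a drop-in replacement.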