From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: (error "Stack overflow in regexp matcher") and (?)wrong display of regexp in backtrace
Date: Sun, 15 Mar 2020 16:57:15 +0000
Message-ID: <20200315165715.GD4928@ACM>
References: <20200315103922.GA4928@ACM> <858A7BE9-9170-477F-908B-3C2383F5A727@acm.org>
In-Reply-To: <858A7BE9-9170-477F-908B-3C2383F5A727@acm.org>
To: Mattias Engdegård
Cc: emacs-devel@gnu.org

Hello, Mattias.

On Sun, Mar 15, 2020 at 13:22:20 +0100, Mattias Engdegård wrote:
> On 15 March 2020 at 11:39, Alan Mackenzie wrote:

> Hello Alan. Thanks for the nice example!

> > First of all, note the regexp, "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*"

> > In the source, the "\15" is "\r". Why is this substitution being made
> > for the backtrace? Is it intentional (in which case, why not do the
> > same to the "\n"?), or is it a bug? To me, it is more like a bug.

> I agree; there are some ad-hoc switches like print-escape-newlines
> (which only works on \n and \f) and print-escape-control-characters
> (which produces octal), but nothing that gives human-friendly escapes
> for other known control characters.

OK.

> > More importantly, why is there a stack overflow here at all? Even
> > though the regexp matcher has a long, long piece of buffer to scan over,
> > the regexp is a simple linear search, without any nesting to speak of.

> Let's ask xr for help:

> (xr-pp "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\15]\\)*")
> =>
> (zero-or-more
>  (group
>   (or (seq "\\"
>            (group anything))
>       (not (any "\n\r\\")))))

> (note that xr pretty-prints \r properly)

> There are two capture groups here, neither of which are actually used.
> Remove them (the outer one in particular) and the regexp no longer
> overflows.

I agree (having tried "\\(?:" in place of "\\("), but why? What is
causing the recursion here? Each of the two groups need only remember
the latest string matching it. Surely? I'd like some insight into this,
so as to avoid it happening again.

[ .... ]

I actually changed the regexp to one which searches for what I'm looking
for (a non-escaped newline or EOB) from the regexp here (which matches
everything which I'm not looking for). I might even time the two
approaches and see which is faster.

-- 
Alan Mackenzie (Nuremberg, Germany).
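
For reference, the substitution discussed above ("\\(?:" in place of
"\\(") can be sketched as follows. Only the two regexps come from the
thread; the names demo-capturing-re, demo-shy-re and demo-match-end are
hypothetical and exist purely for illustration. The helper turns any
error signalled during matching (such as the stack overflow in the
subject line) into the symbol `overflow', so the two forms can be
compared on the same text. Whether the capturing form actually
overflows presumably depends on the Emacs version and on how much text
the match has to cover, so no particular outcome is claimed for large
inputs.

```elisp
;; Hypothetical names; only the regexps themselves come from the thread.
(defconst demo-capturing-re
  "\\(\\\\\\(.\\|\n\\)\\|[^\\\n\r]\\)*"
  "The regexp from the backtrace, with its two capture groups.")

(defconst demo-shy-re
  "\\(?:\\\\\\(?:.\\|\n\\)\\|[^\\\n\r]\\)*"
  "The same regexp with both groups made non-capturing (shy).")

(defun demo-match-end (regexp text)
  "Match REGEXP at the start of TEXT in a temporary buffer.
Return the buffer position where the match ends, nil if there is
no match, or the symbol `overflow' if matching signals an error."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (condition-case nil
        (and (looking-at regexp) (match-end 0))
      (error 'overflow))))

;; Both forms should stop at the same place on a short text: just
;; before the first newline that is not preceded by a backslash.
(list (demo-match-end demo-capturing-re "abc\\\ndef\nghi")
      (demo-match-end demo-shy-re "abc\\\ndef\nghi"))
```

To try it on a large input, pass a long string (for example one built
with `make-string') as TEXT and compare what the two regexps return.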