From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#19873: Ill-formed regular expression is constructed in forward-paragraph.
Date: Thu, 2 Dec 2021 20:45:17 +0000
To: Lars Ingebrigtsen
Cc: Marcin Borkowski, 19873@debbugs.gnu.org
In-Reply-To: <87y25347ew.fsf@gnus.org>
References: <20150215103122.GA3282@acm.fritz.box> <87o9xodhq4.fsf@jane> <20170309210445.GB4046@acm> <87y25347ew.fsf@gnus.org>

Hello, Lars.

On Thu, Dec 02, 2021 at 11:39:51 +0100, Lars Ingebrigtsen wrote:
> Alan Mackenzie writes:

> > I think this idea is workable, but you'll have to check for one or both
> > of paragraph-s{tart,eparate} starting with "[ \t]+". A good strategy
> > here might be to begin the target regexp with "^[ \t]*", then begin one
> > or both components with "[ \t]" (without the "*").

> > There may be other gotchas which I haven't thought about yet.

> > One needs a twisted mind to do this sort of thing properly, so I offer my
> > services to review your upcoming patch. ;-)

> The problem seems rather intractable to me. Is there really any way to
> examine a regexp to determine "does this in practice match [ \t]*"?

Back when the bug was new, I started writing a library to analyse a
regular expression and convert it into an equivalent well formed regular
expression. It's actually working, but is incomplete. It's currently
2757 lines long, including pretty complete unit testing. I actually
looked at it again at the start of November.

> I wonder whether instead of trying to construct a better overall regexp
> could rewrite the loop. That is, instead of searching for sp-parstart,
> search for parstart "\\|" parsep, and then check whether
> (match-beginning 0) of that comes after "^[ \t]*". Or something along
> those lines.

> But I don't know whether that'd be any faster in practice.

It strikes me as one of these things which needs to be done
systematically, which, as I said, I've already tried (and not yet given
up). The question presents itself, would the effort be better spent
improving Emacs's regexp engine?

> Do you have a test case that demonstrates the slowness? In that case I
> could try to see whether there's any alternate approach here that's
> faster.

Martin Rudalics had the original testcase. The slowness was exponential
with the number of spaces typed, I think.

> --
> (domestic pets only, the antidote for overdose, milk.)
> bloggy blog: http://lars.ingebrigtsen.no

--
Alan Mackenzie (Nuremberg, Germany).
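
For readers following the thread, the loop rewrite Lars suggests in the quoted text (search for parstart or parsep alone, then check separately that only blanks precede the match on its line) can be sketched roughly as below. This is an illustration in Python, not Emacs Lisp and not the code in paragraphs.el: the function name `find_boundaries` and the sample patterns are hypothetical, Python's `re` syntax differs from Emacs regexps, and the sketch says nothing about whether the approach would be faster in practice.

```python
import re

BLANKS = " \t"


def find_boundaries(text, parstart, parsep):
    """Return start offsets of PARSTART-or-PARSEP matches that have
    nothing but spaces and tabs before them on the same line.

    Hypothetical helper: PARSTART and PARSEP are Python patterns standing
    in for the values of paragraph-start and paragraph-separate.
    """
    # Search for the two components alone, without prepending "^[ \t]*".
    combined = re.compile("(?:%s)|(?:%s)" % (parstart, parsep), re.MULTILINE)
    hits = []
    for m in combined.finditer(text):
        beg = m.start()
        # Start of the line containing the match (0 if no newline precedes).
        line_start = text.rfind("\n", 0, beg) + 1
        # Accept the match only if the prefix of the line is all blanks,
        # i.e. the match "comes after ^[ \t]*".
        if text[line_start:beg].strip(BLANKS) == "":
            hits.append(beg)
    return hits


if __name__ == "__main__":
    sample = "one\n  * item\nfoo * bar\n\nlast"
    # Assumed patterns: a bullet starts a paragraph, a blank line separates.
    print(find_boundaries(sample, r"\* ", r"[ \t]*$"))  # → [6, 23]
```

In the sample, the indented bullet and the empty line are accepted, while the `* ` in the middle of "foo * bar" is rejected because non-blank text precedes it on its line.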