From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#19873: Ill-formed regular expression is constructed in forward-paragraph.
Date: Thu, 9 Mar 2017 21:04:45 +0000
Message-ID: <20170309210445.GB4046@acm>
References: <20150215103122.GA3282@acm.fritz.box> <87o9xodhq4.fsf@jane>
In-Reply-To: <87o9xodhq4.fsf@jane>
To: Marcin Borkowski
Cc: 19873@debbugs.gnu.org
X-GNU-PR-Message: followup 19873
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:130393

Hello, Marcin.

On Sun, Feb 26, 2017 at 17:44:51 +0100, Marcin Borkowski wrote:

> On 2015-02-15, at 10:31, Alan Mackenzie wrote:

> > Hello, Emacs!

> > In forward-paragraph, L37, a regular expression is constructed as
> > follows:

> >     (let* ...
> >       (sp-parstart (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)"))
> >       ...)

> > .  Here parstart and parsep are, more or less,
> > paragraph-{start,separate}.

> > The problem is that parstart and parsep themselves are likely to begin
> > with "[ \t]*" (the default values certainly do), so we have two
> > consecutive matchers for an arbitrary amount of whitespace.  This causes
> > the regexp engine to run very slowly when a line starts with lots of WS
> > but doesn't match.

> > This problem seems to be the cause of bug # 19846 (where holding down the
> > spacebar inside a C comment causes Emacs to seize up when auto-fill mode
> > is enabled).

> Hi Alan, hi all,

> I put this bug on my todo-list some time ago and decided now to revisit
> it.

> I'm wondering what could be done about it.  First of all, my Emacs has
> this as paragraph-start:

> " \\|[ ]*$"

> and this as paragraph-separate:

> "[ ]*$"

> and frankly speaking, I'm not sure why they differ at all (by default).
> Also, even though forward-paragraph checks for "^" at their beginning,
> they actually don't begin with that character (again, by default).

> My first thought is to add a check whether paragraph-start and
> paragraph-sep match something like

> "^\\^?\\[[[:space:]]+\\][+*]?"

> and if yes, make parstart/parsep equal to them, but without the matching
> part.

> WDYT?

My first reaction is "This is a good idea, but be very careful!".  For
example, if paragraph-start and/or paragraph-separate begin with
"[ \t]+" (i.e. the paragraph start requires space at BOL), you will miss
it by removing matches of "^\\^?\\[[[:space:]]+\\][+*]?" from them.
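To make that pitfall concrete, here is a minimal sketch in Emacs Lisp.
It is only an illustration under stated assumptions: the helper name
my-strip-ws-prefix is hypothetical (nothing in paragraphs.el is called
that), and "[ \t]+foo" is a made-up stand-in for a paragraph-start value
that requires whitespace at BOL.  It applies the regexp proposed above
and compares what matches before and after stripping.

```elisp
;; Hypothetical helper for illustration only; not part of paragraphs.el.
(defun my-strip-ws-prefix (re)
  "Return RE minus a leading bracketed whitespace matcher, if it has one.
Uses the regexp proposed above to recognise prefixes such as \"[ \\t]*\"."
  (if (string-match "^\\^?\\[[[:space:]]+\\][+*]?" re)
      (substring re (match-end 0))
    re))

(let* ((orig "[ \t]+foo")                    ; made-up value: needs WS at BOL
       (stripped (my-strip-ws-prefix orig))  ; the "+" requirement is gone
       (combined (concat "^[ \t]*\\(?:" stripped "\\)")))
  (list stripped
        (string-match-p (concat "^" orig) "foo")   ; original: no match
        (string-match-p combined "foo")))          ; stripped: now matches
;; => ("foo" nil 0)
```

The line "foo" has no leading whitespace, so the original regexp rejects
it, yet the combined regexp built from the stripped value accepts it.
That is the change in behaviour the check would have to guard against.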

I think this idea is workable, but you'll have to check for one or both
of paragraph-s{tart,eparate} starting with "[ \t]+".  A good strategy
here might be to begin the target regexp with "^[ \t]*", then begin one
or both components with "[ \t]" (without the "*").

There may be other gotchas which I haven't thought about yet.  One needs
a twisted mind to do this sort of thing properly, so I offer my services
to review your upcoming patch.  ;-)

> -- 
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University

-- 
Alan Mackenzie (Nuremberg, Germany).