From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Documentation on debugging regexp performance
Date: Thu, 21 Jan 2016 15:27:42 +0000
Message-ID: <20160121152742.GA1795@acm.fritz.box>
References: <56A06CD6.2090707@gmail.com>
In-Reply-To: <56A06CD6.2090707@gmail.com>
To: Clément Pit--Claudel
Cc: Emacs developers

Hello, Clément.

On Thu, Jan 21, 2016 at 12:29:58AM -0500, Clément Pit--Claudel wrote:
> Hi emacs-devel,

> I'm running into a surprising regular expressions issue. I have
> attached a file (~50k) in which (re-search-forward " +[^:=]+ +:=?")
> seems to be extremely slow. (I killed it after 30 seconds). Truncating
> the file to its first 20 lines reduces the time for re-search-forward
> to about a second, which is still extremely slow.

> Are there good resources on how to rewrite regexps to make them
> Emacs-friendly? I didn't find such documentation, and I'm puzzled as to
> what could make the regexp above hard to re-search-forward for.

> Cheers,
> Clément.

" +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in a
non-match, the Emacs regexp engine will try all possible ways of matching
these spaces before giving up. You have three concatenated
sub-expressions, all of which match any number of spaces, namely:

    " +[^:=]+ +"
     1122222233

I would suggest reformulating it thus:

    " +[^:= ][^:=]+ "
     112222223333334

Subexpression 1 matches ALL the leading spaces.
Subexp 2 is exactly one character which can't be a space.
Subexp 3 matches almost anything, including spaces, and subexp 4 matches
a single space at the end (to make sure there is at least one space
there).

All the best with your regexp!

-- 
Alan Mackenzie (Nuremberg, Germany).
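
The point about the three overlapping sub-expressions can be checked with a small sketch. It is written in Python rather than Emacs Lisp purely so that it is easy to run; Python's `re` is also a backtracking matcher, and these particular patterns read the same in both syntaxes. The names `ways_to_split`, `span` and `VARIANT` are illustrative and do not come from the thread, the trailing `:=?` on the reformulated pattern is assumed to be the intent, and the count uses a simplified model in which the three pieces exactly cover a run of n spaces.

```python
import re

# The slow pattern from the question.
ORIGINAL = re.compile(r" +[^:=]+ +:=?")
# The suggested reformulation, with the trailing ":=?" put back (assumed intent).
REWRITTEN = re.compile(r" +[^:= ][^:=]+ :=?")
# Hypothetical variant, not from the message: "*" lets a one-character name match.
VARIANT = re.compile(r" +[^:= ][^:=]* :=?")


def ways_to_split(n: int) -> int:
    """Count how " +", "[^:=]+" and " +" can share a run of n spaces.

    Simplified model: each piece takes at least one space and together
    they cover the run exactly.
    """
    return sum(
        1
        for a in range(1, n + 1)        # spaces taken by the first " +"
        for b in range(1, n - a + 1)    # spaces taken by "[^:=]+"
        if n - a - b >= 1               # at least one left for the last " +"
    )


def span(pattern, text):
    """Return the (start, end) of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.span() if m else None


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, ways_to_split(n))
    for text in ("  foo bar := 1", " a :"):
        print(repr(text), span(ORIGINAL, text), span(REWRITTEN, text), span(VARIANT, text))
```

Under that model the count grows roughly as n²/2, and a search may repeat the work from each starting position it tries. In the reformulated pattern the leading run can only be taken whole, since giving back a space leaves a space where subexp 2 needs a non-space. The sample strings also show one behavioural difference: because subexp 3 uses `+`, the reformulated pattern needs at least two characters before the final space, so " a :" no longer matches; `[^:=]*` would keep that case, if it matters.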