From: Clément Pit--Claudel
Newsgroups: gmane.emacs.devel
Subject: Re: Documentation on debugging regexp performance
Date: Thu, 21 Jan 2016 11:37:48 -0500
Message-ID: <56A1095C.2070107@gmail.com>
To: Alan Mackenzie
Cc: Emacs developers
On 01/21/2016 10:27 AM, Alan Mackenzie wrote:
> Hello, Clément.

Hi Alan!

> " +[^:=]+ +:=?" is an ill-formed regexp - if you get lots of spaces in
> a non-match, the Emacs regexp engine will try all possible ways of
> matching these spaces before giving up. You have three concatenated
> sub-expressions, all of which match any number of spaces, namely:
>
>     " +[^:=]+ +"
>      1122222233
>
> I would suggest reformulating it thus:
>
>     " +[^:= ][^:=]+ "
>      112222223333334

I think this has different semantics: my original regexp requires at
least three spaces. But I think prepending spaces to yours fixes that.

> Subexpression 1 matches ALL the leading spaces. Subexp 2 is exactly one
> character which can't be a space. Subexp 3 matches almost anything,
> including spaces, and subexp 4 matches a single space at the end (to make
> sure there is at least one space there).

This is helpful, thanks! I realize, however, that maybe I oversimplified.
The issue is that what I really want is something like this:

    " +\\([^:=]+\\) +:=?"

IOW, I want to capture that first group.

> All the best with your regexp!

Thanks. Your points about backtracking were helpful as well. Do you know
if there are technical reasons why Emacs chooses a backtracking
implementation for this regexp (instead of compiling it to a linear-time
matcher)?

Clément.
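A minimal sketch of the backtracking cost Alan describes, written in Python rather than Emacs Lisp. The engine below is a hypothetical toy: it supports only concatenations of greedy single-character-class repetitions, it is not Emacs's regex.c, and it attempts a single anchored match at position 0 (as `looking-at` does), whereas `re-search-forward` would repeat the attempt at each starting position. Alan's reformulation is assumed here to be followed by ":=?". On a run of n spaces with no colon, the original pattern makes 1 + n + C(n+1, 3) recursive calls (cubic growth), while the reformulation makes only 1 + n.

```python
from typing import Callable, List, Tuple

# Each item is (character-class predicate, min repetitions, max repetitions).
# Hypothetical toy engine: greedy single-character-class repetitions only,
# anchored at position 0. It is NOT a model of Emacs's actual regex.c.
Item = Tuple[Callable[[str], bool], int, float]
INF = float("inf")


def backtrack_match(items: List[Item], s: str) -> Tuple[bool, int]:
    """Match ITEMS anchored at the start of S; return (matched, call count)."""
    steps = 0

    def go(i: int, pos: int) -> bool:
        nonlocal steps
        steps += 1
        if i == len(items):
            return True
        pred, lo, hi = items[i]
        # Greedy: consume as many characters of the class as allowed...
        end = pos
        while end < len(s) and end - pos < hi and pred(s[end]):
            end += 1
        # ...then give them back one at a time until the rest matches.
        for stop in range(end, pos + lo - 1, -1):
            if go(i + 1, stop):
                return True
        return False

    return go(0, 0), steps


space = lambda c: c == " "
not_colon_eq = lambda c: c not in ":="
not_colon_eq_space = lambda c: c not in ":= "
colon = lambda c: c == ":"
equals = lambda c: c == "="

# " +[^:=]+ +:=?": three overlapping ways to consume a space.
ORIGINAL = [(space, 1, INF), (not_colon_eq, 1, INF), (space, 1, INF),
            (colon, 1, 1), (equals, 0, 1)]
# " +[^:= ][^:=]+ :=?": Alan's reformulation, with ":=?" assumed appended.
SUGGESTED = [(space, 1, INF), (not_colon_eq_space, 1, 1), (not_colon_eq, 1, INF),
             (space, 1, 1), (colon, 1, 1), (equals, 0, 1)]

for n in (10, 20, 40):
    line = " " * n  # a run of spaces with no ":", i.e. a non-match
    print(n, backtrack_match(ORIGINAL, line)[1], backtrack_match(SUGGESTED, line)[1])
# Prints:
# 10 176 11
# 20 1351 21
# 40 10701 41
```

Doubling n multiplies the original pattern's work by roughly eight, which is what three overlapping space-consuming subexpressions predict; the reformulation fails almost at once because subexp 2 cannot match a space.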
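For the capturing version, one possible approach (a hypothetical variant, not something proposed in the thread) is to require the captured text to begin and end with a non-space character, so the spaces on either side can only belong to the surrounding " +" parts. The sketch uses Python's `re`, which is also a backtracking engine; in Emacs Lisp the inner optional group would be written as a shy group, "\\(?:...\\)". It changes behavior: trailing spaces are no longer captured, and a run consisting only of spaces before the colon no longer matches.

```python
import re

# Clément's pattern, translated from an Emacs Lisp string to Python syntax:
# " +\\([^:=]+\\) +:=?"  ->  r" +([^:=]+) +:=?"
ORIGINAL = re.compile(r" +([^:=]+) +:=?")

# Hypothetical variant: the group must start and end with a non-space
# character, so leading and trailing spaces can only go to the " +" parts.
TRIMMED = re.compile(r" +([^:= ](?:[^:=]*[^:= ])?) +:=?")

line = "  foo bar  :="
print(repr(ORIGINAL.match(line).group(1)))  # 'foo bar ' (keeps a trailing space)
print(repr(TRIMMED.match(line).group(1)))   # 'foo bar'
```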