From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string'
Date: Thu, 26 Nov 2020 13:57:59 +0100
Message-ID: <83EC926B-DE9E-48BC-8FD2-C7CB3617AD50@acm.org>
References: <6F768DED-2E1B-4D06-A776-FFA162AC32AD@acm.org> <97535AF5-D542-4267-A5A9-1483C32A61AC@acm.org>
To: Stefan Kangas
Cc: 44861@debbugs.gnu.org, Shigeru Fukaya
X-GNU-PR-Message: followup 44861
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch confirmed
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Apple-Mail=_FB3BA8E5-4848-4CB2-BF78-96ACA7A98898"

--Apple-Mail=_FB3BA8E5-4848-4CB2-BF78-96ACA7A98898
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

On 25 Nov 2020, at 22:39, Stefan Kangas wrote:

> I personally worry about the performance here. Since we use regexps
> heavily all over, it is not clear (to me) that 10 % overall performance
> drop with subexpressions is worth it to work correctly in these rare
> edge-cases. I suppose we do have to fix the bug here, but is it
> feasible to solve this in a way that has less performance impact?

We can't really let it remain buggy, especially as the consequence can
be an error or silently wrong results. Also remember that one man's edge
case is another's reasonable use.

However, unlike Boris we can eat our cake and have it! The attached
patch performs the match-data translation in a C function, which
obviously is much faster and indeed speeds up replace-regexp-in-string
in all cases (as long as there is any match at all). The new primitive
is a bit ad-hoc, but does one well-defined thing and isn't intended for
use by the general public anyway.
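
As a rough illustration of what "match-data translation" means here, the following standalone C sketch shifts every matched register by a fixed offset, so that positions recorded against the full string become valid for the substring that starts at the match. It is not the Emacs source: `struct regs`, `translate_regs` and `clamp_at_zero` are hypothetical names made up for this example, and the convention that -1 marks an unmatched group is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a set of match registers (not Emacs's own
   data structure).  start[i] < 0 means group I did not match.  */
struct regs
{
  size_t num_regs;
  ptrdiff_t *start;
  ptrdiff_t *end;
};

/* Clamp negative positions to 0 so a shifted register never points
   before the beginning of the string.  */
static ptrdiff_t
clamp_at_zero (ptrdiff_t x)
{
  return x < 0 ? 0 : x;
}

/* Add DELTA to the start and end of every matched register in R.
   Unmatched registers (start < 0) are left untouched.  */
static void
translate_regs (struct regs *r, ptrdiff_t delta)
{
  assert (r != NULL);
  for (size_t i = 0; i < r->num_regs; i++)
    if (r->start[i] >= 0)
      {
        r->start[i] = clamp_at_zero (r->start[i] + delta);
        r->end[i] = clamp_at_zero (r->end[i] + delta);
      }
}
```

For a match spanning positions 4 to 7 with group 1 at 5 to 6, calling `translate_regs` with a delta of -4 would leave the whole match at 0 to 3 and group 1 at 1 to 2, which is the layout a replacement function expects when it is handed only the matched substring.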

--Apple-Mail=_FB3BA8E5-4848-4CB2-BF78-96ACA7A98898
Content-Disposition: attachment; filename=0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch
Content-Type: application/octet-stream; x-unix-mode=0644; name="0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch"
Content-Transfer-Encoding: 7bit

From 88d5a8d847045e23c2ab39786dc6e5a9a5412a32 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?=
Date: Wed, 25 Nov 2020 15:32:08 +0100
Subject: [PATCH] Fix replace-regexp-in-string substring match data translation

For certain patterns, re-matching the same regexp on the matched
substring does not produce correctly translated match data
(bug#15107 and bug#44861).

Using a new builtin function also improves performance since the
number of calls to string-match is halved.

Reported by Kevin Ryde and Shigeru Fukaya.

* lisp/subr.el (replace-regexp-in-string): Translate the match data
using match-data--translate instead of trusting a call to string-match
on the matched string to do the job.
* test/lisp/subr-tests.el (subr-replace-regexp-in-string):
Add test cases.
* src/search.c (Fmatch_data__translate): New internal function.
(syms_of_search): Register it as a subroutine.
---
 lisp/subr.el            |  7 +++----
 src/search.c            | 18 ++++++++++++++++++
 test/lisp/subr-tests.el |  6 +++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 1fb0f9ab7e..e009dcc2b9 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4546,10 +4546,9 @@ replace-regexp-in-string
 	(when (= me mb) (setq me (min l (1+ mb))))
 	;; Generate a replacement for the matched substring.
 	;; Operate on only the substring to minimize string consing.
-	;; Set up match data for the substring for replacement;
-	;; presumably this is likely to be faster than munging the
-	;; match data directly in Lisp.
-	(string-match regexp (setq str (substring string mb me)))
+        ;; Translate the match data so that it applies to the matched substring.
+        (match-data--translate (- mb))
+        (setq str (substring string mb me))
 	(setq matches
 	      (cons (replace-match (if (stringp rep)
 				       rep
diff --git a/src/search.c b/src/search.c
index e7f9094946..4eb634a3c0 100644
--- a/src/search.c
+++ b/src/search.c
@@ -3031,6 +3031,23 @@ DEFUN ("set-match-data", Fset_match_data, Sset_match_data, 1, 2, 0,
   return Qnil;
 }
 
+DEFUN ("match-data--translate", Fmatch_data__translate, Smatch_data__translate,
+       1, 1, 0,
+       doc: /* Add N to all string positions in the match data.  Internal.  */)
+  (Lisp_Object n)
+{
+  CHECK_FIXNUM (n);
+  EMACS_INT delta = XFIXNUM (n);
+  if (EQ (last_thing_searched, Qt))   /* String match data only.  */
+    for (ptrdiff_t i = 0; i < search_regs.num_regs; i++)
+      if (search_regs.start[i] >= 0)
+        {
+          search_regs.start[i] = max (0, search_regs.start[i] + delta);
+          search_regs.end[i] = max (0, search_regs.end[i] + delta);
+        }
+  return Qnil;
+}
+
 /* Called from Flooking_at, Fstring_match, search_buffer, Fstore_match_data
    if asynchronous code (filter or sentinel) is running. */
 static void
@@ -3388,6 +3405,7 @@ syms_of_search (void)
   defsubr (&Smatch_end);
   defsubr (&Smatch_data);
   defsubr (&Sset_match_data);
+  defsubr (&Smatch_data__translate);
   defsubr (&Sregexp_quote);
   defsubr (&Snewline_cache_check);
 
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index c77be511dc..67f7fc9749 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -545,7 +545,11 @@ subr-replace-regexp-in-string
                             (match-beginning 1) (match-end 1)))
                   "babbcaacabc")
                  "ba"))
-  )
+  ;; anchors (bug#15107, bug#44861)
+  (should (equal (replace-regexp-in-string "a\\B" "b" "a aaaa")
+                 "a bbba"))
+  (should (equal (replace-regexp-in-string "\\`\\|x" "z" "--xx--")
+                 "z--zz--")))
 
 (provide 'subr-tests)
 ;;; subr-tests.el ends here
-- 
2.21.1 (Apple Git-122.3)


--Apple-Mail=_FB3BA8E5-4848-4CB2-BF78-96ACA7A98898--