From: Paul Eggert
Newsgroups: gmane.emacs.bugs
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 13:40:06 -0700
Organization: UCLA Computer Science Department
Cc: Eli Zaretskii, Stefan Monnier, 64128@debbugs.gnu.org
To: Mattias Engdegård

On 2023-06-19 12:52, Mattias Engdegård wrote:
> Sure, we can turn \b and \B into group B assertions, but the patch was more conservative in nature.

OK, but we still need to fix this, as \b and \B should not be a special
case for following "*".

> I think we have to preserve \`* meaning \`\* for compatibility, historical or not, because it's something we keep sighting in the wild.

That makes some sense, in that \` is like ^, and ^ is already a special
case (this is true even in POSIX BREs).
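For illustration only, here is a minimal Python sketch of the rule under discussion: a repetition operator right after \` is read as a literal character, just as it is after ^, while after \b or \B it is an ordinary repetition operator rather than a special case. This is not Emacs's C parser. The names `star_meaning`, `ORDINARY_AFTER` and `GROUP_B` are hypothetical; the sets follow the Group A/Group B split proposed in the next paragraph plus the openers \(, \(?: and \| from the documentation patch, and the sketch assumes any preceding ^ is in anchor position.

```python
# Hypothetical sketch, not Emacs code: how a repetition operator ("*",
# "+", "?" or "\{...\}") is read right after a given regexp token under
# the grouping proposed in bug#64128.  Assumes a preceding "^" is in
# anchor position (start of regexp, or after \(, \(?: or \|).

# After these the operator is an ordinary character: "" stands for the
# start of the regexp, "^" and "\`" are Group A, and the rest are the
# openers listed in the proposed searching.texi text.
ORDINARY_AFTER = frozenset({"", "^", "\\`", "\\(", "\\(?:", "\\|"})

# Group B zero-width assertions: the operator repeats the assertion.
GROUP_B = frozenset(
    {"$", "\\'", "\\b", "\\B", "\\<", "\\>", "\\_<", "\\_>", "\\="}
)


def star_meaning(prev: str) -> str:
    """Describe what a "*" means when it immediately follows PREV."""
    if prev in ORDINARY_AFTER:
        return "literal"            # e.g. \`* is read as \`\*
    if prev in GROUP_B:
        return "repeats assertion"  # e.g. \b* is no longer a special case
    return "repeats atom"           # e.g. a* has its usual meaning


if __name__ == "__main__":
    for prev in ["", "^", "\\`", "\\b", "\\B", "a"]:
        print(f"{prev + '*':6} -> {star_meaning(prev)}")
```

Keeping the two sets explicit makes the proposal easy to compare with the current list: moving $, \', \b and \B from Group A to Group B only changes set membership.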
In other words, how about if we change the groups from your list:

Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=

to this:

Group A: ^ \`
Group B: $ \' \b \B \< \> \_< \_> \=

where "*" is ordinary after Group A, and special after Group B and there
is no other squirrelly behavior. And similarly for the other repetition
operators.

Attached is a proposed doc change for this, which I have not installed.
Of course the code and etc/NEWS would need changing too.

[Attachment: 0001-Document-proposed-regex-fix-bug-64128.patch]

From 18f6e0c85a7313d221da868e6bf55af32828112b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 19 Jun 2023 13:35:48 -0700
Subject: [PATCH] Document proposed regex fix (bug#64128)

* doc/lispref/searching.texi (Regexp Special):
Say that repetition operators are not special after \`,
and that they work as expected after other backslash escapes.
---
 doc/lispref/searching.texi | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 28230cea64..7c9893054d 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -546,15 +546,11 @@ Regexp Special
 
 For historical compatibility, a repetition operator is treated as ordinary
 if it appears at the start of a regular expression
-or after @samp{^}, @samp{\(}, @samp{\(?:} or @samp{\|}.
+or after @samp{^}, @samp{\`}, @samp{\(}, @samp{\(?:} or @samp{\|}.
 For example, @samp{*foo} is treated as @samp{\*foo}, and
 @samp{two\|^\@{2\@}} is treated as @samp{two\|^@{2@}}.
 It is poor practice to depend on this behavior; use proper backslash
 escaping anyway, regardless of where the repetition operator appears.
-Also, a repetition operator should not immediately follow a backslash escape
-that matches only empty strings, as Emacs has bugs in this area.
-For example, it is unwise to use @samp{\b*}, which can be omitted
-without changing the documented meaning of the regular expression.
 
 As a @samp{\} is not special inside a bracket expression, it can
 never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
-- 
2.39.2
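The two examples in the proposed searching.texi text ("*foo" is treated as "\*foo", and "two\|^\{2\}" is treated as "two\|^{2}") can be checked with a small Python sketch that rewrites each ordinary repetition operator into its explicitly escaped form, which is the "proper backslash escaping" the text recommends. The function `escape_ordinary_repetitions` is hypothetical and not part of Emacs. It assumes regexps without bracket expressions or numbered groups (\(?N:), and it treats ^ as an anchor only at the start or after \(, \(?: or \|.

```python
import re

# Hypothetical helper, not Emacs code.  Simplifying assumption: no
# bracket expressions and no explicitly numbered groups \(?N:.
_TOKEN = re.compile(
    r"""
      \\\(\?:       # shy group opener \(?:
    | \\_[<>]       # symbol boundaries \_< and \_>
    | \\\{.*?\\\}   # interval operator \{M,N\}
    | \\.           # any other backslash construct
    | .             # any single character
    """,
    re.VERBOSE | re.DOTALL,
)

_OPENERS = {"\\(", "\\(?:", "\\|"}


def _is_repetition(tok: str) -> bool:
    return tok in ("*", "+", "?") or (
        tok.startswith("\\{") and tok.endswith("\\}") and len(tok) >= 4
    )


def escape_ordinary_repetitions(rx: str) -> str:
    """Return RX with every ordinary repetition operator spelled as a literal.

    A repetition operator is ordinary at the start of RX or right after
    ^ (in anchor position), \\`, \\(, \\(?: or \\|.
    """
    out = []
    caret_is_anchor = True  # "^" is an anchor at the start of RX
    rep_is_ordinary = True  # and a repetition operator is ordinary here
    for tok in _TOKEN.findall(rx):
        if _is_repetition(tok) and rep_is_ordinary:
            # "*" -> "\*" (likewise "+" and "?"); "\{2\}" -> "{2}".
            out.append(tok.replace("\\", "") if tok.startswith("\\") else "\\" + tok)
            # A literal character was emitted; a later operator repeats it.
            caret_is_anchor = rep_is_ordinary = False
        else:
            out.append(tok)
            if tok in _OPENERS:
                caret_is_anchor = rep_is_ordinary = True
            elif tok == "\\`" or (tok == "^" and caret_is_anchor):
                caret_is_anchor, rep_is_ordinary = False, True  # Group A
            else:
                caret_is_anchor = rep_is_ordinary = False
    return "".join(out)


if __name__ == "__main__":
    print(escape_ordinary_repetitions("*foo"))
    print(escape_ordinary_repetitions("two\\|^\\{2\\}"))
```

Because only the preceding token matters, a single left-to-right pass with two flags is enough; under the proposal "\b*" passes through unchanged, since \b is in Group B.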