all messages for Emacs-related lists mirrored at yhetil.org
From: Paul Eggert <eggert@cs.ucla.edu>
To: "Mattias Engdegård" <mattias.engdegard@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	64128@debbugs.gnu.org
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 13:40:06 -0700
Message-ID: <bbf8b2ee-086d-f6fe-971c-26f578b3289a@cs.ucla.edu>
In-Reply-To: <48D53EC3-4335-4E88-98C1-4A74423E6ACB@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]

On 2023-06-19 12:52, Mattias Engdegård wrote:

> Sure, we can turn \b and \B into group B assertions, but the patch was more conservative in nature.

OK, but we still need to fix this, as \b and \B should not be special 
cases when followed by "*".

> I think we have to preserve \`* meaning \`\* for compatibility, historical or not, because it's something we keep sighting in the wild.

That makes some sense, in that \` is like ^, and ^ is already a special 
case (this is true even in POSIX BREs).

In other words, how about if we change the groups from your list:

Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=

to this:

Group A: ^ \`
Group B: $ \' \b \B \< \> \_< \_> \=

where "*" is ordinary after Group A and special after Group B, and 
there is no other squirrelly behavior. The same goes for the other 
repetition operators.
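To make the proposed grouping concrete, here is a minimal Python sketch that labels each "*", "+" or "?" in an Emacs-style regexp as either an ordinary character or a repetition operator. It is ordinary at the start of the regexp, after \(, \(?: or \|, and after the Group A assertions ^ and \`. Everywhere else it is an operator, including after every Group B assertion. The helper names `tokenize` and `classify_repetitions` are hypothetical. This models the proposal only; it is not Emacs's parser, and it ignores bracket expressions, \{...\} intervals and explicitly numbered groups.

```python
# Hypothetical model of the proposed rule; NOT the parser in Emacs's
# src/regex-emacs.c. Simplifications (assumptions): bracket expressions,
# \{...\} intervals and \(?N: groups are not handled.

REPEAT_OPS = {"*", "+", "?"}
# Places where '^' is an anchor rather than an ordinary character:
# the start of the regexp (None), \(, \(?: and \|.
GROUP_START = {None, "\\(", "\\(?:", "\\|"}
# After these, a repetition operator is an ordinary character:
# the group-start contexts above plus the proposed Group A (^ and \`).
LITERAL_CONTEXTS = GROUP_START | {"^", "\\`"}


def tokenize(regexp):
    """Split REGEXP into single characters and backslash escapes."""
    tokens = []
    i = 0
    while i < len(regexp):
        if regexp.startswith("\\(?:", i):
            tok = "\\(?:"
        elif regexp.startswith(("\\_<", "\\_>"), i):
            tok = regexp[i:i + 3]
        elif regexp[i] == "\\" and i + 1 < len(regexp):
            tok = regexp[i:i + 2]
        else:
            tok = regexp[i]
        tokens.append(tok)
        i += len(tok)
    return tokens


def classify_repetitions(regexp):
    """Return (token, kind) pairs; kind is "literal" or "operator" for
    *, + and ?, and None for every other token."""
    result = []
    prev = None  # None stands for "start of regexp"
    for tok in tokenize(regexp):
        if tok in REPEAT_OPS:
            kind = "literal" if prev in LITERAL_CONTEXTS else "operator"
            result.append((tok, kind))
            # A literal '*' is an ordinary character, so a following
            # repetition operator applies to it.
            prev = "char" if kind == "literal" else tok
        elif tok == "^" and prev not in GROUP_START:
            # Outside group-start contexts, '^' is an ordinary character.
            result.append((tok, None))
            prev = "char"
        else:
            result.append((tok, None))
            prev = tok
    return result
```

Under this model \`* keeps meaning \`\*, while \b*, \_>+ and $* are genuine repetitions. A ^ that is not at the start of a (sub)expression is an ordinary character, so the "*" after it is an operator too.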

Attached is a proposed doc change for this, which I have not installed. 
Of course the code and etc/NEWS would need changing too.

[-- Attachment #2: 0001-Document-proposed-regex-fix-bug-64128.patch --]
[-- Type: text/x-patch, Size: 1609 bytes --]

From 18f6e0c85a7313d221da868e6bf55af32828112b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 19 Jun 2023 13:35:48 -0700
Subject: [PATCH] Document proposed regex fix (bug#64128)

* doc/lispref/searching.texi (Regexp Special):
Say that repetition operators are not special after \`,
and that they work as expected after other backslash escapes.
---
 doc/lispref/searching.texi | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 28230cea64..7c9893054d 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -546,15 +546,11 @@ Regexp Special
 
 For historical compatibility, a repetition operator is treated as ordinary
 if it appears at the start of a regular expression
-or after @samp{^}, @samp{\(}, @samp{\(?:} or @samp{\|}.
+or after @samp{^}, @samp{\`}, @samp{\(}, @samp{\(?:} or @samp{\|}.
 For example, @samp{*foo} is treated as @samp{\*foo}, and
 @samp{two\|^\@{2\@}} is treated as @samp{two\|^@{2@}}.
 It is poor practice to depend on this behavior; use proper backslash
 escaping anyway, regardless of where the repetition operator appears.
-Also, a repetition operator should not immediately follow a backslash escape
-that matches only empty strings, as Emacs has bugs in this area.
-For example, it is unwise to use @samp{\b*}, which can be omitted
-without changing the documented meaning of the regular expression.
 
 As a @samp{\} is not special inside a bracket expression, it can
 never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
-- 
2.39.2

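The doc text this patch deletes says that \b* can be omitted without changing the documented meaning. If "*" applies to the assertion, as it would after a Group B assertion, that follows from the assertion being zero-width. Every iteration ends where it started, so A* can only match the empty string at the current position, and A+ behaves exactly like A. The Python sketch below checks this with a toy matcher that returns sets of end positions. It is not how Emacs's backtracking engine works, and approximating Emacs word syntax with `str.isalnum()` is an assumption.

```python
# Toy set-of-end-positions model of matching; NOT how Emacs's
# backtracking matcher is implemented. A matcher maps (text, pos) to the
# set of positions where a match starting at pos can end.


def is_word_char(ch):
    # Assumption: approximate Emacs word syntax with str.isalnum();
    # real Emacs consults the current syntax table.
    return ch.isalnum()


def word_boundary(text, pos):
    """Model of \\b as a matcher: zero-width, so it ends where it starts.
    Like Emacs's \\b, it also matches at the beginning and end of text."""
    if pos in (0, len(text)):
        return {pos}
    if is_word_char(text[pos - 1]) != is_word_char(text[pos]):
        return {pos}
    return set()


def literal(ch):
    """Matcher for a single ordinary character CH."""
    def match(text, pos):
        return {pos + 1} if pos < len(text) and text[pos] == ch else set()
    return match


def star(matcher, text, pos):
    """End positions of MATCHER* starting at POS (Kleene closure)."""
    reached = {pos}
    frontier = [pos]
    while frontier:
        p = frontier.pop()
        for q in matcher(text, p):
            if q not in reached:
                reached.add(q)
                frontier.append(q)
    return reached


def plus(matcher, text, pos):
    """End positions of MATCHER+: one match, then MATCHER*."""
    ends = set()
    for q in matcher(text, pos):
        ends |= star(matcher, text, q)
    return ends
```

With a character-consuming matcher such as `literal("o")`, the same `star` collects several end positions. With `word_boundary` it never gets past the starting position, which is why dropping \b* leaves the set of matches unchanged.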
