From: Paul Eggert <eggert@cs.ucla.edu>
To: "Mattias Engdegård" <mattias.engdegard@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>,
Stefan Monnier <monnier@iro.umontreal.ca>,
64128@debbugs.gnu.org
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Mon, 19 Jun 2023 13:40:06 -0700
Message-ID: <bbf8b2ee-086d-f6fe-971c-26f578b3289a@cs.ucla.edu>
In-Reply-To: <48D53EC3-4335-4E88-98C1-4A74423E6ACB@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]
On 2023-06-19 12:52, Mattias Engdegård wrote:
> Sure, we can turn \b and \B into group B assertions, but the patch was more conservative in nature.
OK, but we still need to fix this, as \b and \B should not be a special
case when followed by "*".
> I think we have to preserve \`* meaning \`\* for compatibility, historical or not, because it's something we keep sighting in the wild.
That makes some sense, in that \` is like ^, and ^ is already a special
case (this is true even in POSIX BREs).
In other words, how about if we change the groups from your list:
Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=
to this:
Group A: ^ \`
Group B: $ \' \b \B \< \> \_< \_> \=
where "*" is ordinary after Group A and special after Group B, and there
is no other squirrelly behavior. The same would apply to the other
repetition operators.
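To make the proposed rule concrete, here is a minimal Python sketch of the
classification, assuming the two groups exactly as listed above. It is only
a model of the proposal, not the Emacs regexp parser; the names GROUP_A,
GROUP_B and star_is_special are hypothetical and exist only for this
illustration. It looks at the single token preceding "*" and ignores other
context, such as whether "$" is itself special at that position.

```python
# Hypothetical model of the proposed grouping; not Emacs's actual parser.
# Group A: "*" immediately after these tokens is an ordinary character.
GROUP_A = {"^", r"\`"}

# Group B: "*" immediately after these tokens is a repetition operator.
GROUP_B = {"$", r"\'", r"\b", r"\B", r"\<", r"\>", r"\_<", r"\_>", r"\="}


def star_is_special(prev_token):
    """Return True if "*" right after prev_token repeats it (proposed rule).

    prev_token is the regexp token preceding "*", or None when "*" is the
    first thing in the regexp.
    """
    if prev_token is None:
        # At the start of a regexp "*" is ordinary, as the manual documents.
        return False
    if prev_token in GROUP_A:
        return False
    # Group B assertions and any other preceding token: "*" is special.
    return True


if __name__ == "__main__":
    for tok in [None, "^", r"\`", "$", r"\b", r"\<", "a"]:
        kind = "special" if star_is_special(tok) else "ordinary"
        print(f"{tok!r:8} then '*': {kind}")
```

Under this model \`* would keep meaning \`\*, while \b* would be parsed as
a repetition of \b, like any Group B assertion.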
Attached is a proposed doc change for this, which I have not installed.
Of course the code and etc/NEWS would need changing too.
[-- Attachment #2: 0001-Document-proposed-regex-fix-bug-64128.patch --]
[-- Type: text/x-patch, Size: 1609 bytes --]
From 18f6e0c85a7313d221da868e6bf55af32828112b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 19 Jun 2023 13:35:48 -0700
Subject: [PATCH] Document proposed regex fix (bug#64128)
* doc/lispref/searching.texi (Regexp Special):
Say that repetition operators are not special after \`,
and that they work as expected after other backslash escapes.
---
doc/lispref/searching.texi | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 28230cea64..7c9893054d 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -546,15 +546,11 @@ Regexp Special
For historical compatibility, a repetition operator is treated as ordinary
if it appears at the start of a regular expression
-or after @samp{^}, @samp{\(}, @samp{\(?:} or @samp{\|}.
+or after @samp{^}, @samp{\`}, @samp{\(}, @samp{\(?:} or @samp{\|}.
For example, @samp{*foo} is treated as @samp{\*foo}, and
@samp{two\|^\@{2\@}} is treated as @samp{two\|^@{2@}}.
It is poor practice to depend on this behavior; use proper backslash
escaping anyway, regardless of where the repetition operator appears.
-Also, a repetition operator should not immediately follow a backslash escape
-that matches only empty strings, as Emacs has bugs in this area.
-For example, it is unwise to use @samp{\b*}, which can be omitted
-without changing the documented meaning of the regular expression.
As a @samp{\} is not special inside a bracket expression, it can
never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
--
2.39.2