unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Mattias Engdegård" <mattias.engdegard@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>,
	monnier@iro.umontreal.ca, 64128@debbugs.gnu.org
Subject: bug#64128: regexp parser zero-width assertion bugs
Date: Sun, 18 Jun 2023 22:26:28 +0200
Message-ID: <E90E9171-C4A5-47E2-9539-E46C17C4F917@gmail.com>
In-Reply-To: <83pm5tpdy2.fsf@gnu.org>

On 18 June 2023 at 06:55, Eli Zaretskii <eliz@gnu.org> wrote:

> My comment is that since this was a documented feature, I'm not
> interested in making it an error.

Yes, it would be unwise to raise an error for "^*" or the like; it's in active use.
The manual is a bit hazy about what we actually promise, though.
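
To illustrate the usage in question, here is a minimal sketch that can be evaluated in Emacs, for instance in the *scratch* buffer. It uses only the built-in `string-match-p`; the sample strings are made up for illustration, and the results in the comments assume the historical behaviour discussed here, where an operator directly after `^` has nothing to act on and is taken as a literal.

```elisp
;; "^*" is parsed as a beginning-of-line assertion followed by a
;; literal `*', since the `*' operator has nothing to act on.
(string-match-p "^*" "*scratch*")    ; => 0, literal `*' at line start
(string-match-p "^*" "scratch*")     ; => nil, line does not start with `*'

;; The same literal star written unambiguously, for comparison.
(string-match-p "^\\*" "*scratch*")  ; => 0
```

Writing the star as `\\*` gives the same match without depending on how the parser treats an operator after an assertion.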

As Paul notes, we must be able to document it, and that might not be easy, so perhaps we shouldn't even try to change or document it?

To make everything clear, we have two groups of zero-width assertions:

Group A: ^ $ \` \' \b \B
Group B: \< \> \_< \_> \=

Group B assertions work like ordinary elements, syntactically and semantically. Simple, predictable, but also useless.

Group A assertions are more interesting: either there is nothing before a train of such assertions, such as

   "^\\`\\b\\`*?"

which turns the first character of the operator into a literal (and a second character, if present, then becomes an operator acting on that literal).
Or there is something before them, and the operator acts on the last element preceding the assertions, except that multiple literal characters coalesce into a single element. The exception to that is when one of the literal chars is an out-of-place `^`, which splits a sequence of literals into separate segments, but not exactly where you think it would.
For example,

  "abc^def\\B\\B+?"

means, I think,

  (seq "ab" (+? "c^def" not-word-boundary not-word-boundary))
