unofficial mirror of help-gnu-emacs@gnu.org
* Regular expressions and user-escaped characters
@ 2024-12-02 22:04 Christopher Howard
  2024-12-02 22:32 ` Joost Kremers
  2024-12-03 14:01 ` Stefan Monnier via Users list for the GNU Emacs text editor
  0 siblings, 2 replies; 5+ messages in thread
From: Christopher Howard @ 2024-12-02 22:04 UTC
  To: Help Gnu Emacs Mailing List

Hi, what do you do in a regular expression if you want to match a character, but not the same character when it has been escaped by the user? E.g., if I want my regular expression to look for ?\[ (ASCII 91), matching the strings "[" and "a[a" but not "\\[" or "a\\[a", if you follow me. Is this possible with just a regular expression?

If not, what is a good workaround? I was wondering about, say, replacing all the escaped characters first with some uncommon character (like a control code) and then converting back afterwards. But then I suppose I would need to do a check for that uncommon character first.

-- 
📛 Christopher Howard
🚀 gemini://gem.librehacker.com
🌐 http://gem.librehacker.com

In the beginning God created the heavens and the earth

* Re: Regular expressions and user-escaped characters
From: Joost Kremers @ 2024-12-02 22:32 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

On Mon, Dec 02 2024, Christopher Howard wrote:
> Hi, what do you do in a regular expression if you want to match a
> character, but not the same character when it has been escaped by the user?
> E.g., if I want my regular expression to look for ?\[ (ASCII 91), matching
> string "[" and "a[a" but not string "\\[" or "a\\[a", if you follow me. Is
> this possible with just a regular expression?

You may get away with something like "[^\\][[]", though keep in mind that
that does not match a ?[ not preceded by a backslash, but rather a ?[
preceded by a character that is not a backslash. Depending on your use
case, that might suffice, though, esp. if you use a capturing group:

```
(let ((str "a[a"))
  (when (string-match "[^\\]\\([[]\\)" str)
    (match-string 1 str)))

=> "["
```

vs.:

```
(let ((str "a\\[a"))
  (when (string-match "[^\\]\\([[]\\)" str)
    (match-string 1 str)))
=> nil
```

The "proper" way to do this would be to use a negative lookbehind,
`"(?<!\\)[[]"`, but Emacs' regexp engine does not support that.

> If not, what is a good workaround? I was wondering about, say, replacing
> all the escaped characters first with some uncommon character (like a
> control code) and then converting back afterwards. But then I suppose I
> would need to do a check for that uncommon character first.

That would probably work.
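A minimal sketch of that masking idea, for illustration only: the name `my-unescaped-positions` and the use of ?\0 as the mask character are hypothetical choices, not anything from the thread. Instead of converting back afterwards, it replaces each two-character escape with two mask characters, so indices in the masked copy line up with the original string and only the copy is searched. Because it searches the copy for CHAR rather than for the mask character, a mask character that already occurs in the text does no harm unless it is CHAR itself. Escapes are consumed left to right, so in two backslashes followed by [ the [ counts as unescaped.

```elisp
;; Hypothetical helper: indices of CHAR in STRING that are not
;; escaped with a backslash, using the "mask the escapes" idea.
(defun my-unescaped-positions (char string)
  "Return a list of indices of CHAR in STRING not escaped by a backslash."
  (let ((masked
         ;; Replace each escape (a backslash plus the following
         ;; non-newline character) with two NUL characters.  The
         ;; replacement has the same length, so indices in MASKED
         ;; match indices in STRING.
         (replace-regexp-in-string "\\\\." (make-string 2 ?\0) string t t))
        (start 0)
        (positions nil))
    ;; Every CHAR still present in MASKED was not part of an escape.
    (while (string-match (regexp-quote (string char)) masked start)
      (push (match-beginning 0) positions)
      (setq start (match-end 0)))
    (nreverse positions)))

;; (my-unescaped-positions ?\[ "a[a")    ; => (1)
;; (my-unescaped-positions ?\[ "a\\[a")  ; => nil
;; (my-unescaped-positions ?\[ "\\\\[")  ; => (2)
```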

-- 
Joost Kremers
Life has its moments

* Re: Regular expressions and user-escaped characters
From: Joost Kremers @ 2024-12-02 22:50 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

On Mon, Dec 02 2024, Joost Kremers wrote:
> You may get away with something like "[^\\][[]", though keep in mind that
> that does not match a ?[ not preceded by a backslash, but rather a ?[
> preceded by a character that is not a backslash.

Mind you, what I forgot to mention: this means that a ?[ at the start of a
string won't be found. A possible solution to that might be to prepend some
character to the string before matching.
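A minimal sketch of that padding trick, assuming a single space is an acceptable filler (any character other than a backslash would do); `my-find-unescaped-bracket` is a hypothetical name. The match index is shifted back by one to undo the padding. Like the regexp it wraps, it only looks at the one preceding character, so two backslashes followed by [ are still treated as escaping the [.

```elisp
(defun my-find-unescaped-bracket (str)
  "Return the index of the first [ in STR not preceded by a backslash, or nil."
  ;; Pad with a space so that a [ at index 0 still has a
  ;; non-backslash character in front of it for [^\\] to match.
  (let ((padded (concat " " str)))
    (when (string-match "[^\\]\\([[]\\)" padded)
      ;; Subtract 1 to undo the shift introduced by the padding.
      (1- (match-beginning 1)))))

;; (my-find-unescaped-bracket "[")      ; => 0
;; (my-find-unescaped-bracket "a[a")    ; => 1
;; (my-find-unescaped-bracket "a\\[a")  ; => nil
```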


-- 
Joost Kremers
Life has its moments

* Re: Regular expressions and user-escaped characters
From: Joost Kremers @ 2024-12-02 23:09 UTC
  To: Christopher Howard; +Cc: Help Gnu Emacs Mailing List

On 2 December 2024 23:51:49 Joost Kremers <joostkremers@fastmail.fm> wrote:

> On Mon, Dec 02 2024, Joost Kremers wrote:
>> You may get away with something like "[^\\][[]", though keep in mind that
>> that does not match a ?[ not preceded by a backslash, but rather a ?[
>> preceded by a character that is not a backslash.
>
> Mind you, what I forgot to mention: this means that a ?[ at the start of a
> string won't be found. A possible solution to that might be to prepend some
> character to the string before matching.

Or, try using \\| to match either a ?[ at the start of a line (which includes
the start of the string) or a ?[ preceded by a character other than a backslash...

\\(?:^[[]\\|[^\\][[]\\)

Whew...
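Wrapped in a small helper (the name `my-bracket-index` is made up for illustration), the alternation can also report where the [ is: in both branches the [ is the last character of the match, so its index is one less than `match-end`. Note that in Emacs `^` matches at the start of the string and also right after any newline inside it; "\\`" is the anchor for the start of the string only.

```elisp
(defun my-bracket-index (str)
  "Return the index of the first [ in STR not preceded by a backslash, or nil."
  ;; Either a [ at the start of a line, or a non-backslash followed by [.
  (when (string-match "\\(?:^[[]\\|[^\\][[]\\)" str)
    ;; In both branches the [ is the last character of the match.
    (1- (match-end 0))))

;; (my-bracket-index "[")      ; => 0
;; (my-bracket-index "a[a")    ; => 1
;; (my-bracket-index "a\\[a")  ; => nil
```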

> --
> Joost Kremers
> Life has its moments

* Re: Regular expressions and user-escaped characters
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2024-12-03 14:01 UTC
  To: help-gnu-emacs

> Hi, what do you do in a regular expression if you want to match a character,
> but not the same character when it has been escaped by the user? E.g., if
> I want my regular expression to look for ?\[ (ASCII 91), matching string "["
> and "a[a" but not string "\\[" or "a\\[a", if you follow me. Is this
> possible with just a regular expression?

The "usual" way we do that is with the godawful:

    "\\(?:^\\|[^\\]\\)\\(?:\\\\\\\\\\)*\\["

This is careful to match the [ if it's preceded by an even number
of backslashes (including zero).  But beware that it matches more than
the actual [, so if you start the search from a point that's looking at
a [, it won't find it (except if it's at the beginning of the line).
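To try it out, here is that kind of regexp wrapped in a small function (the names are hypothetical). In this version the group matching pairs of backslashes follows both alternatives, so a [ preceded only by escaped backslashes at the start of a line is also found. Because the [ is always the last character of the match, its index is one less than `match-end`.

```elisp
(defconst my-unescaped-bracket-re
  ;; Start of line or a non-backslash, then zero or more pairs of
  ;; backslashes (escaped backslashes), then the [ itself.
  "\\(?:^\\|[^\\]\\)\\(?:\\\\\\\\\\)*\\["
  "Regexp matching a [ preceded by an even number of backslashes.")

(defun my-unescaped-bracket-index (str)
  "Return the index of the first unescaped [ in STR, or nil."
  (when (string-match my-unescaped-bracket-re str)
    ;; The match can include preceding characters; the [ is its last one.
    (1- (match-end 0))))

;; (my-unescaped-bracket-index "a\\[a")    ; => nil  (one backslash)
;; (my-unescaped-bracket-index "a\\\\[a")  ; => 3    (two backslashes)
;; (my-unescaped-bracket-index "\\\\[")    ; => 2
```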

> If not, what is a good workaround?

Just use a regexp which matches all [ (regardless of any previous
backslashes) and then check afterwards, in ELisp, whether it's preceded
by an odd number of backslashes, e.g. with something like

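    (save-excursion
      (goto-char <FOO>)   ; <FOO>: position of the [ that was found
      ;; Non-nil if the [ is preceded by an even number of backslashes.
      (zerop (% (skip-chars-backward "\\\\") 2)))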
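A fuller sketch of that approach, collecting the positions of all unescaped [ characters in the current buffer; the function name is hypothetical. `skip-chars-backward` treats a backslash in its argument as a quoting character, so the Lisp string used to skip backslashes is written "\\\\". Since every [ is found by a plain search and only checked afterwards, adjacent brackets such as [[ are both reported, which sidesteps the search-start caveat of the big regexp.

```elisp
(defun my-unescaped-bracket-positions ()
  "Return the buffer positions of every [ not escaped by a backslash."
  (let ((positions nil))
    (save-excursion
      (goto-char (point-min))
      ;; Plain search: stop at every [, escaped or not.
      (while (search-forward "[" nil t)
        (let ((pos (match-beginning 0)))
          (when (save-excursion
                  (goto-char pos)
                  ;; skip-chars-backward returns minus the number of
                  ;; backslashes skipped; an even count (including 0)
                  ;; means this [ is not escaped.
                  (zerop (% (skip-chars-backward "\\\\") 2)))
            (push pos positions)))))
    (nreverse positions)))

;; (with-temp-buffer
;;   (insert "[a\\[b\\\\[[")
;;   (my-unescaped-bracket-positions))  ; => (1 8 9)
```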


- Stefan