How to get a concatenation of the negations with rx (ex: [^a][^b])?

unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed

* How to get a concatenation of the negations with rx (ex: [^a][^b])?
@ 2023-11-11 20:17 Edgar Lux
  2023-11-11 21:00 ` Emanuel Berg
  2023-11-12  7:03 ` Michael Heerdegen
  0 siblings, 2 replies; 11+ messages in thread
From: Edgar Lux @ 2023-11-11 20:17 UTC (permalink / raw)
  To: Help Gnu Emacs

Hello. I am trying to get this regular expression:

    "[^a][^b]"

in an easier way. I thought that I could do 

    (rx (not (seq "a" "b")))

but that got me 

    Debugger entered--Lisp error: (error "Illegal argument to rx ‘not’: (seq \"a\" \"b\")")

The error is very clear, but I would like to know if there is a smart way of achieving the same without having to type:

    (rx (seq (not "a") (not "b")))

which produces

    "[^a][^b]"

Thank you very much.

-- 
Sent with https://mailfence.com  
Secure and private email



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux
@ 2023-11-11 21:00 ` Emanuel Berg
  2023-11-13 19:26   ` tomas
  2023-11-12  7:03 ` Michael Heerdegen
  1 sibling, 1 reply; 11+ messages in thread
From: Emanuel Berg @ 2023-11-11 21:00 UTC (permalink / raw)
  To: help-gnu-emacs

Edgar Lux wrote:

> Hello. I am trying to get this regular expression:
>
>     "[^a][^b]"
>
> in an easier way. I thought that I could do
>
>     (rx (not (seq "a" "b")))
>
> but that got me
>
>     Debugger entered--Lisp error: (error "Illegal argument
>     to rx ‘not’: (seq \"a\" \"b\")")
>
> The error is very clear, but I would like to know if there
> is a smart way of achieving the same without having to type:
>
>     (rx (seq (not "a") (not "b")))
>
> which produces
>
>     "[^a][^b]"

(rx (not (any "a" "b")))

"[^ab]"

?

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux
  2023-11-11 21:00 ` Emanuel Berg
@ 2023-11-12  7:03 ` Michael Heerdegen
  2023-11-12  7:26   ` tomas
  1 sibling, 1 reply; 11+ messages in thread
From: Michael Heerdegen @ 2023-11-12  7:03 UTC (permalink / raw)
  To: help-gnu-emacs

Edgar Lux <edgarlux@mailfence.com> writes:

> Hello. I am trying to get this regular expression:
>
>     "[^a][^b]"
>
> in an easier way. I thought that I could do 
>
>     (rx (not (seq "a" "b")))

No,

> but that got me 
>
>     Debugger entered--Lisp error: (error "Illegal argument to rx
> ‘not’: (seq \"a\" \"b\")")

this would also express something different: `not' is not distributive
over `seq' concatenation of regexps -- a string starting with a character
different from 'a' followed by a character different from 'b' is something
different than a string not starting with "ab".

> The error is very clear,

I think `match a string not starting with "ab"' is not even expressible
using a standalone standard regexp.  That would require regexps to be
able to backtrack when matching, and it would be something more general
than regexps.

> but I would like to know if there is a smart way of achieving the same
> without having to type:
>
>     (rx (seq (not "a") (not "b")))

Unless you actually wanted to express something different: the `seq'
wrapper is redundant, but with

  (rx (not "a") (not "b"))

a local maximum of smartness is reached.

Michael.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-12  7:03 ` Michael Heerdegen
@ 2023-11-12  7:26   ` tomas
  2023-11-12  8:28     ` Yuri Khan
  0 siblings, 1 reply; 11+ messages in thread
From: tomas @ 2023-11-12  7:26 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

On Sun, Nov 12, 2023 at 08:03:18AM +0100, Michael Heerdegen wrote:
> Edgar Lux <edgarlux@mailfence.com> writes:
> 
> > Hello. I am trying to get this regular expression:
> >
> >     "[^a][^b]"
> >
> > in an easier way. I thought that I could do 
> >
> >     (rx (not (seq "a" "b")))
> 
> No,

[...]

> I think `match a string not starting with "ab"' is not even expressible
> using a standalone standard regexp.  That would require regexps to be
> able to backtrack when matching, and it would be something more general
> than regexps.

Actually... the complement of a regular language is also a regular
language, so there should be a regexp for that, too. But regexps
don't seem to have a negation operator very often, do they? (I
know of negated lookarounds, e.g. in PCRE, but...)

Off the bat I don't know of a general way to construct a regexp
matching the complement of a language.

In the above, case, you might just do something like "[^a].|.[^b]",
if I'm not making an off-by-one error (yes, I'm missing "^.$" ;-)

It reminds one of De Morgan...

So in rx, (not (seq "a" "b")) might rather correspond to something
like (or (seq (not "a") anychar) (seq anychar (not "b"))) modulo
corner cases like an "one char sequence", which also wouldn't be
"ab".

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-12  7:26   ` tomas
@ 2023-11-12  8:28     ` Yuri Khan
  2023-11-12 10:38       ` tomas
  0 siblings, 1 reply; 11+ messages in thread
From: Yuri Khan @ 2023-11-12  8:28 UTC (permalink / raw)
  To: tomas; +Cc: Michael Heerdegen, help-gnu-emacs

On Sun, 12 Nov 2023 at 14:27, <tomas@tuxteam.de> wrote:

> Actually... the complement of a regular language is also a regular
> language, so there should be a regexp for that, too.

In all(?) the courses that taught me the theory of regular languages,
the construction of a negation of a regexp would be suitable as the
practical part of an exam question sheet.

As the first step, it is relatively straightforward to build a
non-deterministic finite state machine from the original regexp; then
we complement that NDFA’s set of accepting states; and then it’s a
tedious, error-prone job of building a regexp equivalent to that
complemented NDFA.

The regexp thus obtained is often not a pretty sight, either.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-12  8:28     ` Yuri Khan
@ 2023-11-12 10:38       ` tomas
  2023-11-12 11:53         ` Michael Heerdegen
  2023-11-13  8:46         ` Anders Munch
  0 siblings, 2 replies; 11+ messages in thread
From: tomas @ 2023-11-12 10:38 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Michael Heerdegen, help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

On Sun, Nov 12, 2023 at 03:28:21PM +0700, Yuri Khan wrote:
> On Sun, 12 Nov 2023 at 14:27, <tomas@tuxteam.de> wrote:
> 
> > Actually... the complement of a regular language is also a regular
> > language, so there should be a regexp for that, too.
> 
> In all(?) the courses that taught me the theory of regular languages,
> the construction of a negation of a regexp would be suitable as the
> practical part of an exam question sheet.
> 
> As the first step, it is relatively straightforward to build a
> non-deterministic finite state machine from the original regexp; then
> we complement that NDFA’s set of accepting states; and then it’s a
> tedious, error-prone job of building a regexp equivalent to that
> complemented NDFA.

OK -- this was roughly my train of thought: build the NFA, then
invert that... OMG. Then I decided this is better left as an
exercise to the reader.

But since a regexp library is building the NFA anyway, I think
it's surprising that it doesn't support negation. But I haven't
ever tried to implement that; perhaps I'd know the answer then :)

> The regexp thus obtained is often not a pretty sight, either.

I believe you right away on that!

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-12 10:38       ` tomas
@ 2023-11-12 11:53         ` Michael Heerdegen
  2023-11-13  8:46         ` Anders Munch
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Heerdegen @ 2023-11-12 11:53 UTC (permalink / raw)
  To: help-gnu-emacs

tomas@tuxteam.de writes:

> > The regexp thus obtained is often not a pretty sight, either.
>
> I believe you right away on that!

I guess the result could involve huge `or' specs that simply list all
possibilities of a finite but huge number of remaining character
combinations not fulfilling a certain requirement.  At least that's what
I would guess how the conversion result of the negated automaton would
look like.

Michael.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-12 10:38       ` tomas
  2023-11-12 11:53         ` Michael Heerdegen
@ 2023-11-13  8:46         ` Anders Munch
  2023-11-13  9:24           ` tomas
  1 sibling, 1 reply; 11+ messages in thread
From: Anders Munch @ 2023-11-13  8:46 UTC (permalink / raw)
  To: help-gnu-emacs

tomas@tuxteam.de wrote:
> OK -- this was roughly my train of thought: build the NFA, then invert that... OMG. Then I decided this is better left as an exercise to the reader.

At the DFA level it's easy.  So you can just convert the NFA to a DFA and work from there.  I did that exercise once upon a time, let me see if I can remember it.

First convert to DFA. For every node, add an outgoing edge to the acceptance state for every character that doesn't already have an outgoing node. Remove all edges to the acceptance state that were in the original DFA.

The main problem is not implementation.  It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for.  It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself.

regards,
Anders

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-13  8:46         ` Anders Munch
@ 2023-11-13  9:24           ` tomas
  2023-12-24 11:54             ` tomas
  0 siblings, 1 reply; 11+ messages in thread
From: tomas @ 2023-11-13  9:24 UTC (permalink / raw)
  To: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On Mon, Nov 13, 2023 at 08:46:15AM +0000, Anders Munch wrote:
> tomas@tuxteam.de wrote:
> > OK -- this was roughly my train of thought: build the NFA, then invert that... OMG. Then I decided this is better left as an exercise to the reader.
> 
> At the DFA level it's easy.  So you can just convert the NFA to a DFA and work from there.  I did that exercise once upon a time, let me see if I can remember it.
> 
> First convert to DFA.

Which already involves a power set. Uh, oh ;-)

>                       For every node, add an outgoing edge to the acceptance state for every character that doesn't already have an outgoing node. Remove all edges to the acceptance state that were in the original DFA.
> 
> The main problem is not implementation.  It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for.  It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself.

Yes, I guess this is more or less what I hand-waved away with
my "modulo corner cases" (ain't natural language wonderful? ;)

For your example, one would have to append .* to all non-end
anchored (i.e. those not ending with $) regexps to better match
usual expectations. But who knows whether that's all.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-11 21:00 ` Emanuel Berg
@ 2023-11-13 19:26   ` tomas
  0 siblings, 0 replies; 11+ messages in thread
From: tomas @ 2023-11-13 19:26 UTC (permalink / raw)
  To: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

On Sat, Nov 11, 2023 at 10:00:12PM +0100, Emanuel Berg wrote:
> Edgar Lux wrote:
> 
> > Hello. I am trying to get this regular expression:
> >
> >     "[^a][^b]"
> >
> > in an easier way. I thought that I could do
> >
> >     (rx (not (seq "a" "b")))
> >
> > but that got me
> >
> >     Debugger entered--Lisp error: (error "Illegal argument
> >     to rx ‘not’: (seq \"a\" \"b\")")
> >
> > The error is very clear, but I would like to know if there
> > is a smart way of achieving the same without having to type:
> >
> >     (rx (seq (not "a") (not "b")))
> >
> > which produces
> >
> >     "[^a][^b]"
> 
> (rx (not (any "a" "b")))
> 
> "[^ab]"

(string-match "[^a][^b]" "ba")
  => 0 ; note: 0 means it found a match at pos 0

(string-match "[^ab]" "ba")
  => nil

No.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])?
  2023-11-13  9:24           ` tomas
@ 2023-12-24 11:54             ` tomas
  0 siblings, 0 replies; 11+ messages in thread
From: tomas @ 2023-12-24 11:54 UTC (permalink / raw)
  To: help-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1639 bytes --]

[...]

Late in this thread, but I fell into some
rabbit hole :)

> > The main problem is not implementation.  It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for.  It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself.
> 
> Yes, I guess this is more or less what I hand-waved away with
> my "modulo corner cases" (ain't natural language wonderful? ;)
> 
> For your example, one would have to append .* to all non-end
> anchored (i.e. those not ending with $) regexps to better match
> usual expectations. But who knows whether that's all.

Turns out that there is another, less known way to look at
at regular expressions. Here are two nice links i[1] [2] (the
second also shows that there are practical applications).

This "other way" actually has complement as one basic element
of the regexp language.

The regular languages defined there are equivalent to the
traditional ones, and at the end, finite automata are constructed
and all that, but boy, I'd have killed for "complement" some
time or other.

I have the hunch that complement is somehow related to (positive
and negative) lookahead and lookbehind as known from Perl,
but I'm not quite sure yet.

Enjoy

  Regular expressions and Brzozowski derivatives
  [1] https://en.wikipedia.org/wiki/Brzozowski_derivative
  [2] https://www.khoury.northeastern.edu/home/turon/re-deriv.pdf
-- 
tomás

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-12-24 11:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux
2023-11-11 21:00 ` Emanuel Berg
2023-11-13 19:26   ` tomas
2023-11-12  7:03 ` Michael Heerdegen
2023-11-12  7:26   ` tomas
2023-11-12  8:28     ` Yuri Khan
2023-11-12 10:38       ` tomas
2023-11-12 11:53         ` Michael Heerdegen
2023-11-13  8:46         ` Anders Munch
2023-11-13  9:24           ` tomas
2023-12-24 11:54             ` tomas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).