* How to get a concatenation of the negations with rx (ex: [^a][^b])? @ 2023-11-11 20:17 Edgar Lux 2023-11-11 21:00 ` Emanuel Berg 2023-11-12 7:03 ` Michael Heerdegen 0 siblings, 2 replies; 11+ messages in thread From: Edgar Lux @ 2023-11-11 20:17 UTC (permalink / raw) To: Help Gnu Emacs Hello. I am trying to get this regular expression: "[^a][^b]" in an easier way. I thought that I could do (rx (not (seq "a" "b"))) but that got me Debugger entered--Lisp error: (error "Illegal argument to rx ‘not’: (seq \"a\" \"b\")") The error is very clear, but I would like to know if there is a smart way of achieving the same without having to type: (rx (seq (not "a") (not "b"))) which produces "[^a][^b]" Thank you very much. -- Sent with https://mailfence.com Secure and private email ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux @ 2023-11-11 21:00 ` Emanuel Berg 2023-11-13 19:26 ` tomas 2023-11-12 7:03 ` Michael Heerdegen 1 sibling, 1 reply; 11+ messages in thread From: Emanuel Berg @ 2023-11-11 21:00 UTC (permalink / raw) To: help-gnu-emacs Edgar Lux wrote: > Hello. I am trying to get this regular expression: > > "[^a][^b]" > > in an easier way. I thought that I could do > > (rx (not (seq "a" "b"))) > > but that got me > > Debugger entered--Lisp error: (error "Illegal argument > to rx ‘not’: (seq \"a\" \"b\")") > > The error is very clear, but I would like to know if there > is a smart way of achieving the same without having to type: > > (rx (seq (not "a") (not "b"))) > > which produces > > "[^a][^b]" (rx (not (any "a" "b"))) "[^ab]" ? -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-11 21:00 ` Emanuel Berg @ 2023-11-13 19:26 ` tomas 0 siblings, 0 replies; 11+ messages in thread From: tomas @ 2023-11-13 19:26 UTC (permalink / raw) To: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 829 bytes --] On Sat, Nov 11, 2023 at 10:00:12PM +0100, Emanuel Berg wrote: > Edgar Lux wrote: > > > Hello. I am trying to get this regular expression: > > > > "[^a][^b]" > > > > in an easier way. I thought that I could do > > > > (rx (not (seq "a" "b"))) > > > > but that got me > > > > Debugger entered--Lisp error: (error "Illegal argument > > to rx ‘not’: (seq \"a\" \"b\")") > > > > The error is very clear, but I would like to know if there > > is a smart way of achieving the same without having to type: > > > > (rx (seq (not "a") (not "b"))) > > > > which produces > > > > "[^a][^b]" > > (rx (not (any "a" "b"))) > > "[^ab]" (string-match "[^a][^b]" "ba") => 0 ; note: 0 means it found a match at pos 0 (string-match "[^ab]" "ba") => nil No. Cheers -- t [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux 2023-11-11 21:00 ` Emanuel Berg @ 2023-11-12 7:03 ` Michael Heerdegen 2023-11-12 7:26 ` tomas 1 sibling, 1 reply; 11+ messages in thread From: Michael Heerdegen @ 2023-11-12 7:03 UTC (permalink / raw) To: help-gnu-emacs Edgar Lux <edgarlux@mailfence.com> writes: > Hello. I am trying to get this regular expression: > > "[^a][^b]" > > in an easier way. I thought that I could do > > (rx (not (seq "a" "b"))) No, > but that got me > > Debugger entered--Lisp error: (error "Illegal argument to rx > ‘not’: (seq \"a\" \"b\")") this would also express something different: `not' is not distributive over `seq' concatenation of regexps -- a string starting with a character different from 'a' followed by a character different from 'b' is something different than a string not starting with "ab". > The error is very clear, I think `match a string not starting with "ab"' is not even expressible using a standalone standard regexp. That would require regexps to be able to backtrack when matching, and it would be something more general than regexps. > but I would like to know if there is a smart way of achieving the same > without having to type: > > (rx (seq (not "a") (not "b"))) Unless you actually wanted to express something different: the `seq' wrapper is redundant, but with (rx (not "a") (not "b")) a local maximum of smartness is reached. Michael. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-12 7:03 ` Michael Heerdegen @ 2023-11-12 7:26 ` tomas 2023-11-12 8:28 ` Yuri Khan 0 siblings, 1 reply; 11+ messages in thread From: tomas @ 2023-11-12 7:26 UTC (permalink / raw) To: Michael Heerdegen; +Cc: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 1313 bytes --] On Sun, Nov 12, 2023 at 08:03:18AM +0100, Michael Heerdegen wrote: > Edgar Lux <edgarlux@mailfence.com> writes: > > > Hello. I am trying to get this regular expression: > > > > "[^a][^b]" > > > > in an easier way. I thought that I could do > > > > (rx (not (seq "a" "b"))) > > No, [...] > I think `match a string not starting with "ab"' is not even expressible > using a standalone standard regexp. That would require regexps to be > able to backtrack when matching, and it would be something more general > than regexps. Actually... the complement of a regular language is also a regular language, so there should be a regexp for that, too. But regexps don't seem to have a negation operator very often, do they? (I know of negated lookarounds, e.g. in PCRE, but...) Off the bat I don't know of a general way to construct a regexp matching the complement of a language. In the above, case, you might just do something like "[^a].|.[^b]", if I'm not making an off-by-one error (yes, I'm missing "^.$" ;-) It reminds one of De Morgan... So in rx, (not (seq "a" "b")) might rather correspond to something like (or (seq (not "a") anychar) (seq anychar (not "b"))) modulo corner cases like an "one char sequence", which also wouldn't be "ab". Cheers -- t [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-12 7:26 ` tomas @ 2023-11-12 8:28 ` Yuri Khan 2023-11-12 10:38 ` tomas 0 siblings, 1 reply; 11+ messages in thread From: Yuri Khan @ 2023-11-12 8:28 UTC (permalink / raw) To: tomas; +Cc: Michael Heerdegen, help-gnu-emacs On Sun, 12 Nov 2023 at 14:27, <tomas@tuxteam.de> wrote: > Actually... the complement of a regular language is also a regular > language, so there should be a regexp for that, too. In all(?) the courses that taught me the theory of regular languages, the construction of a negation of a regexp would be suitable as the practical part of an exam question sheet. As the first step, it is relatively straightforward to build a non-deterministic finite state machine from the original regexp; then we complement that NDFA’s set of accepting states; and then it’s a tedious, error-prone job of building a regexp equivalent to that complemented NDFA. The regexp thus obtained is often not a pretty sight, either. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-12 8:28 ` Yuri Khan @ 2023-11-12 10:38 ` tomas 2023-11-12 11:53 ` Michael Heerdegen 2023-11-13 8:46 ` Anders Munch 0 siblings, 2 replies; 11+ messages in thread From: tomas @ 2023-11-12 10:38 UTC (permalink / raw) To: Yuri Khan; +Cc: Michael Heerdegen, help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 1227 bytes --] On Sun, Nov 12, 2023 at 03:28:21PM +0700, Yuri Khan wrote: > On Sun, 12 Nov 2023 at 14:27, <tomas@tuxteam.de> wrote: > > > Actually... the complement of a regular language is also a regular > > language, so there should be a regexp for that, too. > > In all(?) the courses that taught me the theory of regular languages, > the construction of a negation of a regexp would be suitable as the > practical part of an exam question sheet. > > As the first step, it is relatively straightforward to build a > non-deterministic finite state machine from the original regexp; then > we complement that NDFA’s set of accepting states; and then it’s a > tedious, error-prone job of building a regexp equivalent to that > complemented NDFA. OK -- this was roughly my train of thought: build the NFA, then invert that... OMG. Then I decided this is better left as an exercise to the reader. But since a regexp library is building the NFA anyway, I think it's surprising that it doesn't support negation. But I haven't ever tried to implement that; perhaps I'd know the answer then :) > The regexp thus obtained is often not a pretty sight, either. I believe you right away on that! Cheers -- t [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-12 10:38 ` tomas @ 2023-11-12 11:53 ` Michael Heerdegen 2023-11-13 8:46 ` Anders Munch 1 sibling, 0 replies; 11+ messages in thread From: Michael Heerdegen @ 2023-11-12 11:53 UTC (permalink / raw) To: help-gnu-emacs tomas@tuxteam.de writes: > > The regexp thus obtained is often not a pretty sight, either. > > I believe you right away on that! I guess the result could involve huge `or' specs that simply list all possibilities of a finite but huge number of remaining character combinations not fulfilling a certain requirement. At least that's what I would guess how the conversion result of the negated automaton would look like. Michael. ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-12 10:38 ` tomas 2023-11-12 11:53 ` Michael Heerdegen @ 2023-11-13 8:46 ` Anders Munch 2023-11-13 9:24 ` tomas 1 sibling, 1 reply; 11+ messages in thread From: Anders Munch @ 2023-11-13 8:46 UTC (permalink / raw) To: help-gnu-emacs tomas@tuxteam.de wrote: > OK -- this was roughly my train of thought: build the NFA, then invert that... OMG. Then I decided this is better left as an exercise to the reader. At the DFA level it's easy. So you can just convert the NFA to a DFA and work from there. I did that exercise once upon a time, let me see if I can remember it. First convert to DFA. For every node, add an outgoing edge to the acceptance state for every character that doesn't already have an outgoing node. Remove all edges to the acceptance state that were in the original DFA. The main problem is not implementation. It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for. It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself. regards, Anders ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-13 8:46 ` Anders Munch @ 2023-11-13 9:24 ` tomas 2023-12-24 11:54 ` tomas 0 siblings, 1 reply; 11+ messages in thread From: tomas @ 2023-11-13 9:24 UTC (permalink / raw) To: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 1455 bytes --] On Mon, Nov 13, 2023 at 08:46:15AM +0000, Anders Munch wrote: > tomas@tuxteam.de wrote: > > OK -- this was roughly my train of thought: build the NFA, then invert that... OMG. Then I decided this is better left as an exercise to the reader. > > At the DFA level it's easy. So you can just convert the NFA to a DFA and work from there. I did that exercise once upon a time, let me see if I can remember it. > > First convert to DFA. Which already involves a power set. Uh, oh ;-) > For every node, add an outgoing edge to the acceptance state for every character that doesn't already have an outgoing node. Remove all edges to the acceptance state that were in the original DFA. > > The main problem is not implementation. It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for. It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself. Yes, I guess this is more or less what I hand-waved away with my "modulo corner cases" (ain't natural language wonderful? ;) For your example, one would have to append .* to all non-end anchored (i.e. those not ending with $) regexps to better match usual expectations. But who knows whether that's all. Cheers -- t [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to get a concatenation of the negations with rx (ex: [^a][^b])? 2023-11-13 9:24 ` tomas @ 2023-12-24 11:54 ` tomas 0 siblings, 0 replies; 11+ messages in thread From: tomas @ 2023-12-24 11:54 UTC (permalink / raw) To: help-gnu-emacs [-- Attachment #1: Type: text/plain, Size: 1639 bytes --] [...] Late in this thread, but I fell into some rabbit hole :) > > The main problem is not implementation. It's that it's not obvious what to use them for in the variable-length searches that regexes are typically used for. It's just confusing that the string "abz" is a match for the regular expression "not ab", and if you were looking for a two-character string that is not "ab", then a general negation operator isn't going to help you, at least not by itself. > > Yes, I guess this is more or less what I hand-waved away with > my "modulo corner cases" (ain't natural language wonderful? ;) > > For your example, one would have to append .* to all non-end > anchored (i.e. those not ending with $) regexps to better match > usual expectations. But who knows whether that's all. Turns out that there is another, less known way to look at at regular expressions. Here are two nice links i[1] [2] (the second also shows that there are practical applications). This "other way" actually has complement as one basic element of the regexp language. The regular languages defined there are equivalent to the traditional ones, and at the end, finite automata are constructed and all that, but boy, I'd have killed for "complement" some time or other. I have the hunch that complement is somehow related to (positive and negative) lookahead and lookbehind as known from Perl, but I'm not quite sure yet. Enjoy Regular expressions and Brzozowski derivatives [1] https://en.wikipedia.org/wiki/Brzozowski_derivative [2] https://www.khoury.northeastern.edu/home/turon/re-deriv.pdf -- tomás [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-12-24 11:54 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-11 20:17 How to get a concatenation of the negations with rx (ex: [^a][^b])? Edgar Lux 2023-11-11 21:00 ` Emanuel Berg 2023-11-13 19:26 ` tomas 2023-11-12 7:03 ` Michael Heerdegen 2023-11-12 7:26 ` tomas 2023-11-12 8:28 ` Yuri Khan 2023-11-12 10:38 ` tomas 2023-11-12 11:53 ` Michael Heerdegen 2023-11-13 8:46 ` Anders Munch 2023-11-13 9:24 ` tomas 2023-12-24 11:54 ` tomas
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.