unofficial mirror of help-gnu-emacs@gnu.org
* Shy groups and * ...eh, what?
@ 2022-10-22  4:24 Michael Heerdegen
  2022-10-22  6:11 ` Heime
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-22  4:24 UTC (permalink / raw)
  To: Emacs mailing list

Hello,

I wanted to be sure I correctly understood that if you give multiple RX
arguments to the `rx' `*' operator, they are implicitly interpreted as a
sequence (AFAIU, that's the case.  An implicit `or' would also make
sense, that's why I wondered).

Anyway, here is what I tried:

(string-match-p
 (rx bos (* "a" "b") eos)
 "a")
==> 0

(string-match-p
 (rx bos (* "a" "b") eos)
 "b")
==> nil

Eh - what?  With evaluated `rx' forms this is

(string-match-p
 "\\`\\(?:ab\\)*\\'"
 "a")
==> 0

(string-match-p
 "\\`\\(?:ab\\)*\\'"
 "b")
==> nil

Makes no sense to me.  When I change the wrapping shy groups to normal
groups the result makes more sense to me:

(string-match-p
 "\\`\\(ab\\)*\\'"
 "a")
==> nil

(string-match-p
 "\\`\\(ab\\)*\\'"
 "b")
==> nil

Am I missing something, or is it just a bug?


TIA,

Michael.



* Re: Shy groups and * ...eh, what?
  2022-10-22  4:24 Shy groups and * ...eh, what? Michael Heerdegen
@ 2022-10-22  6:11 ` Heime
  2022-10-22  6:39 ` tomas
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Heime @ 2022-10-22  6:11 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: Emacs mailing list

------- Original Message -------
On Saturday, October 22nd, 2022 at 4:24 AM, Michael Heerdegen <michael_heerdegen@web.de> wrote:


> Hello,
> 
> I wanted to be sure I correctly understood that if you give multiple RX
> arguments to the `rx' `*' operator, they are implicitly interpreted as a
> sequence (AFAIU, that's the case. An implicit `or' would also make
> sense, that's why I wondered).

You are correct, they should be interpreted as a sequence.
 
> Anyway, here is what I tried:
> 
> (string-match-p
> (rx bos (* "a" "b") eos)
> "a")
> ==> 0
> 
> 
> (string-match-p
> (rx bos (* "a" "b") eos)
> "b")
> ==> nil
> 
> 
> Eh - what? With evaluated `rx' forms this is
> 
> (string-match-p
> "\\`\\(?:ab\\)*\\'"
> "a")
> ==> 0
> 
> 
> (string-match-p
> "\\`\\(?:ab\\)*\\'"
> "b")
> ==> nil
> 
> 
> Makes no sense to me. When I change the wrapping shy groups to normal
> groups the result makes more sense to me:
> 
> (string-match-p
> "\\`\\(ab\\)*\\'"
> "a")
> ==> nil
> 
> 
> (string-match-p
> "\\`\\(ab\\)*\\'"
> "b")
> ==> nil
> 
> 
> Am I missing something, or is it just a bug?
> 
> 
> TIA,
> 
> Michael.



* Re: Shy groups and * ...eh, what?
  2022-10-22  4:24 Shy groups and * ...eh, what? Michael Heerdegen
  2022-10-22  6:11 ` Heime
@ 2022-10-22  6:39 ` tomas
  2022-10-23  2:47   ` Michael Heerdegen
  2022-10-22  6:49 ` Heime
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: tomas @ 2022-10-22  6:39 UTC (permalink / raw)
  To: help-gnu-emacs

On Sat, Oct 22, 2022 at 06:24:39AM +0200, Michael Heerdegen wrote:
> Hello,
> 
> I wanted to be sure I correctly understood that if you give multiple RX
> arguments to the `rx' `*' operator, they are implicitly interpreted as a
> sequence (AFAIU, that's the case.  An implicit `or' would also make
> sense, that's why I wondered).

I didn't try to repeat your examples, but you are right: there is at least
a big smoking hole in the docs: all of the repetition operators seem to
take zero (one?) or more arguments according to the syntax shorthand, but
the text refers to just one term, like here:

  ‘(zero-or-more RX...)’
  ‘(0+ RX...)’
       Match the RXs zero or more times.  Greedy by default.
       Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy)

...what is this `A' the text is referring to? Your experiments suggest
that it is the sequence of the `RX...' in the syntax shorthand (i.e.
(seq RX...)), but the docs just leave it open :)
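
A small sketch of what the experiments above suggest, namely that the
repetition operators treat several RX arguments as an implicit `seq'.
The expansion shown is the one reported earlier in this thread; the
`or' form is included only for contrast and is not something the docs
quoted here spell out.

```elisp
;; Multiple arguments to `*' behave like one explicit `seq'.
(rx bos (* "a" "b") eos)
;; ==> "\\`\\(?:ab\\)*\\'"

;; Writing the `seq' out by hand gives the same string regexp.
(equal (rx bos (* "a" "b") eos)
       (rx bos (* (seq "a" "b")) eos))
;; ==> t

;; An alternative has to be requested explicitly with `or'.
(string-match-p (rx bos (* (or "a" "b")) eos) "b")
;; ==> 0
```

Spelling out `seq' or `or' inside the repetition avoids relying on the
implicit reading.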

Cheers
-- 
t

* Re: Shy groups and * ...eh, what?
  2022-10-22  4:24 Shy groups and * ...eh, what? Michael Heerdegen
  2022-10-22  6:11 ` Heime
  2022-10-22  6:39 ` tomas
@ 2022-10-22  6:49 ` Heime
  2022-10-22  7:37 ` Michael Heerdegen
  2022-10-22 19:54 ` [External] : " Drew Adams
  4 siblings, 0 replies; 17+ messages in thread
From: Heime @ 2022-10-22  6:49 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: Emacs mailing list






Sent with Proton Mail secure email.

------- Original Message -------
On Saturday, October 22nd, 2022 at 4:24 AM, Michael Heerdegen <michael_heerdegen@web.de> wrote:


> Hello,
> 
> I wanted to be sure I correctly understood that if you give multiple RX
> arguments to the `rx' `*' operator, they are implicitly interpreted as a
> sequence (AFAIU, that's the case. An implicit `or' would also make
> sense, that's why I wondered).
> 
> Anyway, here is what I tried:
> 
> (string-match-p
> (rx bos (* "a" "b") eos)
> "a")
> ==> 0
> 
> 
> (string-match-p
> (rx bos (* "a" "b") eos)
> "b")
> ==> nil
 
I usually do

(string-match-p (rx (seq bos (or "aa" "bb" "cc") (zero-or-more any) eos)) "bboeuoeu")

 
> Eh - what? With evaluated `rx' forms this is
> 
> (string-match-p
> "\\`\\(?:ab\\)*\\'"
> "a")
> ==> 0
> 
> 
> (string-match-p
> "\\`\\(?:ab\\)*\\'"
> "b")
> ==> nil
> 
> 
> Makes no sense to me. When I change the wrapping shy groups to normal
> groups the result makes more sense to me:
> 
> (string-match-p
> "\\`\\(ab\\)*\\'"
> "a")
> ==> nil
> 
> 
> (string-match-p
> "\\`\\(ab\\)*\\'"
> "b")
> ==> nil
> 
> 
> Am I missing something, or is it just a bug?
> 
> 
> TIA,
> 
> Michael.



* Re: Shy groups and * ...eh, what?
  2022-10-22  4:24 Shy groups and * ...eh, what? Michael Heerdegen
                   ` (2 preceding siblings ...)
  2022-10-22  6:49 ` Heime
@ 2022-10-22  7:37 ` Michael Heerdegen
  2022-10-22  8:34   ` tomas
  2022-10-22 11:12   ` Bruno Barbier
  2022-10-22 19:54 ` [External] : " Drew Adams
  4 siblings, 2 replies; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-22  7:37 UTC (permalink / raw)
  To: help-gnu-emacs

Hello again,

thanks for answers so far.  We may want to improve that aspect in the
docstring of `rx' - but note that the more important question was about
why "\\`\\(?:ab\\)*\\'" matches "a" - that looks like a bug to me.

Michael.




* Re: Shy groups and * ...eh, what?
  2022-10-22  7:37 ` Michael Heerdegen
@ 2022-10-22  8:34   ` tomas
  2022-10-22  8:41     ` Michael Heerdegen
  2022-10-22 11:12   ` Bruno Barbier
  1 sibling, 1 reply; 17+ messages in thread
From: tomas @ 2022-10-22  8:34 UTC (permalink / raw)
  To: help-gnu-emacs

On Sat, Oct 22, 2022 at 09:37:40AM +0200, Michael Heerdegen wrote:
> Hello again,
> 
> thanks for answers so far.  We may want to improve that aspect in the
> docstring of `rx' - but note that the more important question was about
> why "\\`\\(?:ab\\)*\\'" matches "a" - that looks like a bug to me.

At least an inconsistency. Since the docs don't say what a repeat operator
with more (or less?) than one argument is supposed to mean...

Personally, I'd disallow repeat operators with argument counts different
from one.

Usually, I'm for extending interfaces as far as it gets, but in this case
there doesn't seem to be an obvious and compelling extension (sequence?
alternative?), so I'd feel that there is a footgun for little gain.

Cheers
-- 
t

* Re: Shy groups and * ...eh, what?
  2022-10-22  8:34   ` tomas
@ 2022-10-22  8:41     ` Michael Heerdegen
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-22  8:41 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> > thanks for answers so far.  We may want to improve that aspect in the
> > docstring of `rx' - but note that the more important question was about
> > why "\\`\\(?:ab\\)*\\'" matches "a" - that looks like a bug to me.
>
> At least an inconsistency. Since the docs don't say what a repeat
> operator with more (or less?) than one argument is supposed to mean...

For the stringish regexp case there is only one argument IMO for the *
postfix operator, and that is the regexp "\\(?:ab\\)" in this case.
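
To illustrate that point (a sketch only, not taken from the bug
report): since the `*' applies to the whole group, a shy group and a
normal group should accept exactly the same strings.  The only intended
difference is that the normal group records a submatch.

```elisp
;; Both forms accept a whole number of "ab" repetitions ...
(string-match-p "\\`\\(?:ab\\)*\\'" "abab")   ; ==> 0
(string-match-p "\\`\\(ab\\)*\\'" "abab")     ; ==> 0

;; ... and both reject a lone "b".
(string-match-p "\\`\\(?:ab\\)*\\'" "b")      ; ==> nil
(string-match-p "\\`\\(ab\\)*\\'" "b")        ; ==> nil

;; Only the normal group records a submatch (the last repetition).
(let ((s "abab"))
  (when (string-match "\\`\\(ab\\)*\\'" s)
    (match-string 1 s)))
;; ==> "ab"
```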

Michael.




* Re: Shy groups and * ...eh, what?
  2022-10-22  7:37 ` Michael Heerdegen
  2022-10-22  8:34   ` tomas
@ 2022-10-22 11:12   ` Bruno Barbier
  2022-10-23  2:45     ` Michael Heerdegen
  1 sibling, 1 reply; 17+ messages in thread
From: Bruno Barbier @ 2022-10-22 11:12 UTC (permalink / raw)
  To: Michael Heerdegen, help-gnu-emacs

Michael Heerdegen <michael_heerdegen@web.de> writes:

> Hello again,
>
> thanks for answers so far.  We may want to improve that aspect in the
> docstring of `rx' - but note that the more important question was about
> why "\\`\\(?:ab\\)*\\'" matches "a" - that looks like a bug to me.
>

Looks like a real bug to me too, when using shy groups: it should match
the whole "ab" or nothing, like it does when using implicit or explicit
groups.


And if you request exactly one "a", "a" doesn't match anymore ...

(string-match-p "\\`\\(?:a\\{1\\}b\\)*\\'" "a")
=> nil

I'm getting the same results with emacs 27, emacs 28 and emacs 29.

Shouldn't you use a more alarming title?

Bruno

* RE: [External] : Shy groups and * ...eh, what?
  2022-10-22  4:24 Shy groups and * ...eh, what? Michael Heerdegen
                   ` (3 preceding siblings ...)
  2022-10-22  7:37 ` Michael Heerdegen
@ 2022-10-22 19:54 ` Drew Adams
  4 siblings, 0 replies; 17+ messages in thread
From: Drew Adams @ 2022-10-22 19:54 UTC (permalink / raw)
  To: Michael Heerdegen, Emacs mailing list

See also https://emacs.stackexchange.com/q/74211.



* Re: Shy groups and * ...eh, what?
  2022-10-22 11:12   ` Bruno Barbier
@ 2022-10-23  2:45     ` Michael Heerdegen
  2022-10-24  4:27       ` Emanuel Berg
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-23  2:45 UTC (permalink / raw)
  To: help-gnu-emacs

Bruno Barbier <brubar.cs@gmail.com> writes:

> Shouldn't you use a more alarming title?

Yes, the subject was a mistake.  I created a bug report now - see
Bug#58726 (29.0.50; Bug in regexp matching with shy groups).


Thanks,

Michael.




* Re: Shy groups and * ...eh, what?
  2022-10-22  6:39 ` tomas
@ 2022-10-23  2:47   ` Michael Heerdegen
  2022-10-24  4:45     ` tomas
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-23  2:47 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

>   ‘(zero-or-more RX...)’
>   ‘(0+ RX...)’
>        Match the RXs zero or more times.  Greedy by default.
>        Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy)
>
> ...what is this `A' the text is referring to? Your experiments suggest
> that it is the sequence of the `RX...' in the syntax shorthand (i.e.
> (seq RX...)), but the docs just leave it open :)

I see it like you.

I opened Bug#58727 (29.0.50; rx doc: Semantics of RX...) for this
problem.


Thanks,

Michael.




* Re: Shy groups and * ...eh, what?
  2022-10-23  2:45     ` Michael Heerdegen
@ 2022-10-24  4:27       ` Emanuel Berg
  0 siblings, 0 replies; 17+ messages in thread
From: Emanuel Berg @ 2022-10-24  4:27 UTC (permalink / raw)
  To: help-gnu-emacs

Michael Heerdegen wrote:

>> Shouldn't you use a more alarming title?
>
> Yes, the subject was a mistake. I created a bug report now -
> see Bug#58726 (29.0.50; Bug in regexp matching with shy
> groups).

Well, the first reaction to everything one hasn't seen before
that appears incorrect is always that of confusion ...

Intuition is a trained skill ...

-- 
underground experts united
https://dataswamp.org/~incal




* Re: Shy groups and * ...eh, what?
  2022-10-23  2:47   ` Michael Heerdegen
@ 2022-10-24  4:45     ` tomas
  2022-10-24 21:01       ` Emanuel Berg
  2022-10-25  3:12       ` Michael Heerdegen
  0 siblings, 2 replies; 17+ messages in thread
From: tomas @ 2022-10-24  4:45 UTC (permalink / raw)
  To: help-gnu-emacs

On Sun, Oct 23, 2022 at 04:47:24AM +0200, Michael Heerdegen wrote:

[...]

> I opened Bug#58727 (29.0.50; rx doc: Semantics of RX...) for this
> problem.
> 
> 
> Thanks,

Thank *you* :-)

Cheers
-- 
t

* Re: Shy groups and * ...eh, what?
  2022-10-24  4:45     ` tomas
@ 2022-10-24 21:01       ` Emanuel Berg
  2022-10-25  3:12       ` Michael Heerdegen
  1 sibling, 0 replies; 17+ messages in thread
From: Emanuel Berg @ 2022-10-24 21:01 UTC (permalink / raw)
  To: help-gnu-emacs

tomas wrote:

>> I opened Bug#58727 (29.0.50; rx doc: Semantics of RX...)
>> for this problem.
>> 
>> Thanks,
>
> Thank *you* :-)

Very much so ...

-- 
underground experts united
https://dataswamp.org/~incal




* Re: Shy groups and * ...eh, what?
  2022-10-24  4:45     ` tomas
  2022-10-24 21:01       ` Emanuel Berg
@ 2022-10-25  3:12       ` Michael Heerdegen
  2022-10-25  3:44         ` Emanuel Berg
  2022-10-25  4:43         ` tomas
  1 sibling, 2 replies; 17+ messages in thread
From: Michael Heerdegen @ 2022-10-25  3:12 UTC (permalink / raw)
  To: help-gnu-emacs

<tomas@tuxteam.de> writes:

> > Thanks,
>
> Thank *you* :-)
>
> Cheers

Actually, we all have to thank Mattias Engdegård who has fixed all of
this in the meantime (in master)  ;-)

Michael.




* Re: Shy groups and * ...eh, what?
  2022-10-25  3:12       ` Michael Heerdegen
@ 2022-10-25  3:44         ` Emanuel Berg
  2022-10-25  4:43         ` tomas
  1 sibling, 0 replies; 17+ messages in thread
From: Emanuel Berg @ 2022-10-25  3:44 UTC (permalink / raw)
  To: help-gnu-emacs

Michael Heerdegen wrote:

>>> Thanks
>>
>> Thank *you* :-)
>
> Actually, we all have to thank Mattias Engdegård who has
> fixed all of this in the meantime (in master) ;-)

You better believe it ...

-- 
underground experts united
https://dataswamp.org/~incal




* Re: Shy groups and * ...eh, what?
  2022-10-25  3:12       ` Michael Heerdegen
  2022-10-25  3:44         ` Emanuel Berg
@ 2022-10-25  4:43         ` tomas
  1 sibling, 0 replies; 17+ messages in thread
From: tomas @ 2022-10-25  4:43 UTC (permalink / raw)
  To: help-gnu-emacs

On Tue, Oct 25, 2022 at 05:12:12AM +0200, Michael Heerdegen wrote:
> <tomas@tuxteam.de> writes:
> 
> > > Thanks,
> >
> > Thank *you* :-)
> >
> > Cheers
> 
> Actually, we all have to thank Mattias Engdegård who has fixed all of
> this in the meantime (in master)  ;-)

Wow. Thank you all.

Cheers
-- 
t
