unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#58727: 29.0.50; rx doc: Semantics of RX...
From: Michael Heerdegen @ 2022-10-23  2:32 UTC (permalink / raw)
  To: 58727


Hello,

please document the semantics of multiple RXs for the RX repetition
operators (and maybe grouping operators, too).

The resulting regexps are concatenated, as if with an implicit `seq'.
This is not trivial, though: in stringish regexps the repetition
operators are only unary, and different interpretations would make sense
for `rx' (implicit `seq', implicit `or').

The docstring of `rx' doesn't say anything about this.  The manual has
sentences like

| ‘(zero-or-more RX...)’
| ‘(0+ RX...)’
|      Match the RXs zero or more times.  Greedy by default.
|      Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy)

but that suffers from the same problem that the semantics of A are not
clear: A == (seq RX...) ?

Oh, and maybe let's also make more clear that `rx' always cares about
implicit grouping when necessary.  For example, in
(info "(elisp) Rx Constructs") it is not obvious that, e.g., in

‘(seq RX...)’
‘(sequence RX...)’
‘(: RX...)’
‘(and RX...)’
     Match the RXs in sequence.  Without arguments, the expression
     matches the empty string.
     Corresponding string regexp: ‘AB...’ (subexpressions in sequence).

`rx' silently adds shy grouping to the result, and the corresponding string
regexp in this case is more precisely \(?:AB...\).  I think it is enough
to mention this implicit grouping feature once, but it is important to
spell it out.
  

TIA,

Michael.

* bug#58727: 29.0.50; rx doc: Semantics of RX...
From: Mattias Engdegård @ 2022-10-23 16:14 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: 58727

> The resulting regexps are concatenated, as if with an implicit `seq'.
> This is not trivial, though: in stringish regexps the repetition
> operators are only unary, and different interpretations would make sense
> for `rx' (implicit `seq', implicit `or').

The rule is implicit concatenation unless specified otherwise; maybe we could say that in the leading paragraph. (`or` is the only place where concatenation isn't done.)
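A quick way to check this rule in a running Emacs is sketched below, assuming a recent Emacs with the built-in `rx' library (the thread concerns 29.0.50). It compares the regexps produced with and without an explicit `seq' and then matches a few strings; the test strings are chosen for illustration only.

```elisp
(require 'rx)
(require 'cl-lib)

;; Several RX arguments to a repetition operator behave like one
;; explicit `seq': both forms should yield the same regexp string.
(cl-assert (equal (rx (zero-or-more "a" "b"))
                  (rx (zero-or-more (seq "a" "b")))))

;; The same holds for a grouping operator.
(cl-assert (equal (rx (group "a" "b"))
                  (rx (group (seq "a" "b")))))

;; Matching confirms concatenation: "ab" is repeated as a unit,
;; so "aab" is rejected (it would match under an implicit `or').
(cl-assert (string-match-p (rx bos (zero-or-more "a" "b") eos) "abab"))
(cl-assert (not (string-match-p (rx bos (zero-or-more "a" "b") eos) "aab")))

;; `or' is the exception: its arguments are alternatives.
(cl-assert (string-match-p (rx bos (or "ab" "cd") eos) "cd"))
(cl-assert (not (string-match-p (rx bos (or "ab" "cd") eos) "abcd")))
```

Comparing with `equal' avoids depending on the exact string `rx' emits, which may differ between Emacs versions.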

Otherwise I think we should grant our readers some common sense. It's not a formal specification but meant for humans to understand, and I'm quite sure they do.

> Oh, and maybe let's also make more clear that `rx' always cares about
> implicit grouping when necessary.

No, there is no such thing in rx. The manual provides corresponding string-notation constructs for orientation only.
This is important -- rx forms are defined by their semantics, not by what strings they translate to.
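To illustrate why the string notation is only for orientation, the sketch below (assuming a recent Emacs with the built-in `rx' library) compares an `or' nested inside a sequence with naively pasting the corresponding string fragments together. The `concat' version is a deliberately wrong translation for contrast, not something `rx' does.

```elisp
(require 'rx)
(require 'cl-lib)

;; rx treats the nested `or' as a single unit inside the sequence,
;; so the whole string must be "x" followed by "ab" or "cd".
(let ((re (rx bos "x" (or "ab" "cd") eos)))
  (cl-assert (string-match-p re "xcd"))
  (cl-assert (not (string-match-p re "cd"))))

;; Naively concatenating the string forms changes the meaning:
;; the \| now splits the whole pattern, so plain "cd" matches too.
(let ((naive (concat "\\`x" "ab\\|cd" "\\'")))
  (cl-assert (string-match-p naive "cd")))
```

Whatever brackets `rx' inserts to get this right are an implementation detail; only the matching behaviour is specified.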

* bug#58727: 29.0.50; rx doc: Semantics of RX...
From: Michael Heerdegen @ 2022-10-24  2:34 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 58727

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> > The resulting regexps are concatenated, as if with an implicit `seq'.
> > This is not trivial, though: in stringish regexps the repetition
> > operators are only unary, and different interpretations would make sense
> > for `rx' (implicit `seq', implicit `or').
>
> The rule is implicit concatenation unless specified otherwise; maybe
> we could say that in the leading paragraph. (`or` is the only place
> where concatenation isn't done.)

Yes, that would be good.


> > Oh, and maybe let's also make more clear that `rx' always cares
> > about implicit grouping when necessary.
>
> No, there is no such thing in rx.

I think you misunderstood what I meant.  I meant the implicit shy
grouping added in the return value, as in

  (rx (or "ab" "cd")) ==> "\\(?:ab\\|cd\\)"
                           ^^^^^       ^^^
> The manual provides corresponding string-notation constructs for
> orientation only.  This is important -- rx forms are defined by their
> semantics, not by what strings they translate to.

Is this trivial however?  Is it clear that, even for people that see rx
more as a translator to stringish regexps, `rx' is that smart?

A sentence like "rx forms are defined by their semantics" would help to
make that clear I think.
I don't know; I'm just guessing that there is potential for misunderstanding here.

Documenting the implicit concatenation of RX... is the more important
point for me.


Thanks so far,

Michael. 

* bug#58727: 29.0.50; rx doc: Semantics of RX...
From: Mattias Engdegård @ 2022-10-24 12:49 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: 58727

24 okt. 2022 kl. 04.34 skrev Michael Heerdegen <michael_heerdegen@web.de>:

>> The rule is implicit concatenation unless specified otherwise; maybe
>> we could say that in the leading paragraph. (`or` is the only place
>> where concatenation isn't done.)
> 
> Yes, that would be good.

Now added.

> I meant the implicit shy
> grouping added in the return value

Yes, and this is simply not a problem in rx, nor on the abstract regexp level -- it's just a feature of the surface syntax of string regexps, and not something that the rx docs are or should be preoccupied with.

(For that matter, 'shy grouping' is terrible terminology: it's obscure wording for something that most people simply call bracketing.)

>  (rx (or "ab" "cd")) ==> "\\(?:ab\\|cd\\)"
>                           ^^^^^       ^^^

This happens to be a cosmetic flaw in rx: in this case the brackets shouldn't be there at all, but getting rid of them is currently more trouble than it's worth. It does not affect matching performance. See it as an excess of packaging material which does not increase the shipping costs.

>> The manual provides corresponding string-notation constructs for
>> orientation only.  This is important -- rx forms are defined by their
>> semantics, not by what strings they translate to.
> 
> Is this trivial however?  Is it clear that, even for people that see rx
> more as a translator to stringish regexps, `rx' is that smart?

It's not that rx is smart, it's that it's not completely broken. Mentioning that rx adds brackets now and then is tantamount to saying that it's not buggy. 

We don't say that the byte compiler emits jump instructions as needed, not just because it's superfluous information but also because such a statement would suggest that it might not.

> A sentence like "rx forms are defined by their semantics" would help to
> make that clear I think.

Well, I added a phrase to that effect as well.

Thank you for your comments and suggestions!

* bug#58727: 29.0.50; rx doc: Semantics of RX...
From: Michael Heerdegen @ 2022-10-25  2:49 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 58727-done

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> > A sentence like "rx forms are defined by their semantics" would help
> > to make that clear I think.
>
> Well, I added a phrase to that effect as well.

Thanks - I hope it was not too much.

> Thank you for your comments and suggestions!

And thank you for the implementation of these!


Regards,

Michael.

end of thread, other threads:[~2022-10-25  2:49 UTC | newest]

Thread overview: 5+ messages
2022-10-23  2:32 bug#58727: 29.0.50; rx doc: Semantics of RX Michael Heerdegen
2022-10-23 16:14 ` Mattias Engdegård
2022-10-24  2:34   ` Michael Heerdegen
2022-10-24 12:49     ` Mattias Engdegård
2022-10-25  2:49       ` Michael Heerdegen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
