* rx.el sexp regexp syntax (WAS: Off Topic)
@ 2018-05-24 10:47 Noam Postavsky
2018-05-24 10:58 ` Van L
2018-05-25 2:57 ` Richard Stallman
0 siblings, 2 replies; 54+ messages in thread
From: Noam Postavsky @ 2018-05-24 10:47 UTC (permalink / raw)
To: Emacs developers; +Cc: Van L, Eli Zaretskii, Richard Stallman
On 24 May 2018 at 05:03, Robert Pluim <rpluim@gmail.com> wrote:
> Richard Stallman <rms@gnu.org> writes:
>> > > We have such things but we haven't adopted any of them in Emacs itself.
>>
>> > Doesn't rx.el qualify?
>>
>> It's an example of what I said. We have it, but we don't actually use it
>> much if at all. This suggests to me that it has drawbacks which prevent
>> it from being clearly superior.
Ah, I misunderstood what you meant by "adopted".
> Iʼve never used rx.el because I didnʼt know it existed. Itʼs not
> described in the regular expression chapter of the emacs lisp
> reference manual, nor in the emacs user manual.
>
>> If someone comes up with a replacement syntax that reduces the drawbacks,
>> we might start using it all the time.
>
> Documenting rx.el would be a more productive use of time, I think,
> especially since Iʼve now noticed via reading rx.el that we already
> have sregex as well. Let's pick one rather than inventing a third
> syntax.
sregex.el is in lisp/obsolete/, so we have picked rx.el.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-24 10:47 rx.el sexp regexp syntax (WAS: Off Topic) Noam Postavsky
@ 2018-05-24 10:58 ` Van L
2018-05-25 2:57 ` Richard Stallman
1 sibling, 0 replies; 54+ messages in thread
From: Van L @ 2018-05-24 10:58 UTC (permalink / raw)
To: Noam Postavsky; +Cc: Emacs developers
> Noam Postavsky writes:
>
> Ah, I misunderstood what you meant by "adopted".
A round-trip test for Emacs with voice-recognition is to listen to two and three year olds for requests like:
Hey, Emacs, roar like a fierce tiger.
Hey, Emacs, sing like a black bird.
Hey, Emacs, what do zebras do?
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-24 10:47 rx.el sexp regexp syntax (WAS: Off Topic) Noam Postavsky
2018-05-24 10:58 ` Van L
@ 2018-05-25 2:57 ` Richard Stallman
2018-05-25 8:52 ` Pierre Neidhardt
1 sibling, 1 reply; 54+ messages in thread
From: Richard Stallman @ 2018-05-25 2:57 UTC (permalink / raw)
To: Noam Postavsky; +Cc: van, eliz, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Documenting rx.el better and more visibly is a good idea. We can see
if people find it convenient to use.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-25 2:57 ` Richard Stallman
@ 2018-05-25 8:52 ` Pierre Neidhardt
2018-05-25 15:51 ` Alan Mackenzie
2018-05-27 20:16 ` Stefan Monnier
0 siblings, 2 replies; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-25 8:52 UTC (permalink / raw)
To: rms; +Cc: van, eliz, Noam Postavsky, emacs-devel
rx.el is one of the best concepts I've discovered in a long time.
It's another instance of "Don't come up with a new (mini)language when
Lisp can do better": it's easier to learn, more flexible, easier to
write, much easier to read and as a consequence much more maintainable.
> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.
> -- Jamie Zawinski
It's also much more "programmable" thanks to its `eval' expression.
(It's possible to count!)
See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/
for some nice examples.
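As a rough illustration of the "programmable" point, here is a minimal sketch. It assumes the stock `rx-to-string' function from rx.el rather than the `eval' form mentioned above, and `my-version-regexp' is a hypothetical helper, not something taken from the linked article.

```elisp
(require 'rx)

(defun my-version-regexp (n)
  "Return a regexp matching the whole string \"v\" plus exactly N digits.
Hypothetical helper, for illustration only."
  ;; The backquote splices the run-time value N into the rx form;
  ;; `rx-to-string' then translates the form into an ordinary regexp string.
  (rx-to-string `(seq string-start "v" (= ,n digit) string-end) t))

(string-match (my-version-regexp 3) "v123")  ; matches
(string-match (my-version-regexp 3) "v12")   ; does not match
```

The repetition count is an ordinary Lisp value here, which is awkward to express when the regexp is assembled by hand with `concat' and `format'.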
I think it's high time we moved away from traditional regexps and
embraced the concept of rx.el. I'm thinking of implementing it for
Guile.
At the moment the rx.el implementation is built on top of Emacs regexps
which are implemented in C. I believe this does not use the power of
Lisp as much as it could.
The traditional regexps work in two steps: first build a blackbox
automaton from the string expression, then test if the input matches.
Building the automaton is costly. In C, we build it once and save the
result in a variable so that every regexp match does not rebuild the
automaton each time.
In high-level languages, automatons are automatically cached to save the
cost of building them.
The rx.el library/concept could alleviate this issue altogether: because
we express the automaton directly in Lisp, the parsing step is not
needed and thus the building cost could be tremendously reduced.
So the rx.el building steps
rx expression -> regexp string -> C regexp automaton
could boil down to simply
rx automaton
It would be interesting to compare the performance. This also means
that there would be no need for caching on behalf of the supporting
language.
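To make the current pipeline concrete, a small sketch, assuming only the `rx' macro as shipped with Emacs (the pattern itself is made up). It suggests that the "rx expression -> regexp string" step happens at macro-expansion time, so only the string is left by the time the code runs.

```elisp
(require 'rx)

;; With constant arguments, the `rx' macro expands to a plain string, so
;; byte-compiled code contains only the regexp string, not the rx form.
(stringp (macroexpand '(rx line-start "turn " (or "left" "right"))))

;; The matcher therefore sees an ordinary string regexp.
(string-match (rx line-start "turn " (or "left" "right")) "turn right")
```

What remains at run time is the compilation of that string into the matcher's internal form, which is done in C.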
What do you think?
--
Pierre Neidhardt
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-25 8:52 ` Pierre Neidhardt
@ 2018-05-25 15:51 ` Alan Mackenzie
2018-05-25 16:47 ` Pierre Neidhardt
` (2 more replies)
2018-05-27 20:16 ` Stefan Monnier
1 sibling, 3 replies; 54+ messages in thread
From: Alan Mackenzie @ 2018-05-25 15:51 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky
Hello, Pierre.
On Fri, May 25, 2018 at 10:52:03 +0200, Pierre Neidhardt wrote:
> rx.el is one of the best concepts I've discovered in a long time.
> It's another instance of "Don't come up with a new (mini)language when
> Lisp can do better": it's easier to learn, more flexible, easier to
> write, much easier to read and as a consequence much more maintainable.
Much easier than what? Than the putative mini-language that doesn't get
written?
> > Some people, when confronted with a problem, think "I know, I'll use
> > regular expressions." Now they have two problems.
> > -- Jamie Zawinski
> It's also much more "programmable" thanks to its `eval' expression.
> (It's possible to count!)
> See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/
> for some nice examples.
> I think it's high time we moved away from traditional regexps and
> embraced the concept of rx.el. I'm thinking of implementing it for
> Guile.
There's nothing stopping anybody from using rx.el. However, people have
mostly _not_ used it. The "I think it's high time ...." suggests in
some way forcing people to use it. Before mandating something like
this, I think we should find out why it's not already in common use.
> At the moment the rx.el implementation is built on top of Emacs regexps
> which are implemented in C. I believe this does not use the power of
> Lisp as much as it could.
But would any alternative use the power of regexps?
> The traditional regexps work in two steps: first build a blackbox
> automaton from the string expression, then test if the input matches.
> Building the automaton is costly. In C, we build it once and save the
> result in a variable so that every regexp match does not rebuild the
> automaton each time.
Emacs has a (moderately large) cache of regexps, so that building the
automatons is done very rarely. Possibly just once each for each
session of Emacs.
> In high-level languages, automatons are automatically cached to save the
> cost of building them.
Emacs Lisp does this too.
> The rx.el library/concept could alleviate this issue altogether: because
> we express the automaton directly in Lisp, the parsing step is not
> needed and thus the building cost could be tremendously reduced.
> So the rx.el building steps
> rx expression -> regexp string -> C regexp automaton
> could boil down to simply
> rx automaton
I don't see what you're trying to save, here. At some stage, the regexp
source, in whatever form, needs to be converted to an automaton.
Are you suggesting here building an interpreter in Lisp directly to
execute rx expressions?
> It would be interesting to compare the performance. This also means
> that there would be no need for caching on behalf of the supporting
> language.
I will predict that an rx interpreter built in Lisp will be two orders
of magnitude slower than the current regexp machine, where both the
construction of an automaton, and the byte-code interpreter which runs
it are written in C (and probably quite optimised C at that).
Regexp performance is critical to Emacs's performance in general.
> What do you think?
I think we will, in the main, carry on using conventional regular
expressions expressed as strings. I can't get excited about rx syntax,
which I'm sure would be just as tedious, and possibly more difficult to
read than a standard regexp. Analogously, as a musician, I read
standard musical notation (with sets of five lines and dots) far more
easily and fluently than I could any "simplified" system designed for
beginners, which would be bloated by comparison.
Regular expressions can be difficult. I don't believe this difficulty
lies, in the main, in the compact notation used to express them. Rather
it lies in the concepts and the semantics of the regexp elements, and
being able to express a "mental automaton" in regexp semantics.
> --
> Pierre Neidhardt
--
Alan Mackenzie (Nuremberg, Germany).
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-25 15:51 ` Alan Mackenzie
@ 2018-05-25 16:47 ` Pierre Neidhardt
2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen
2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie
2018-05-27 16:56 ` Tom Tromey
2018-05-27 20:23 ` Stefan Monnier
2 siblings, 2 replies; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-25 16:47 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky
Alan Mackenzie <acm@muc.de> writes:
>> rx.el is one of the best concepts I've discovered in a long time.
>> It's another instance of "Don't come up with a new (mini)language when
>> Lisp can do better": it's easier to learn, more flexible, easier to
>> write, much easier to read and as a consequence much more maintainable.
>
> Much easier than what? Than the putative mini-language that doesn't get
> written?
I meant that in my opinion rx is easier to write than regexps. That it
is not popular is the root of the question here.
>> I think it's high time we moved away from traditional regexps and
>> embraced the concept of rx.el. I'm thinking of implementing it for
>> Guile.
>
> There's nothing stopping anybody from using rx.el. However, people have
> mostly _not_ used it. The "I think it's high time ...." suggests in
> some way forcing people to use it. Before mandating something like
> this, I think we should find out why it's not already in common use.
Sorry if you felt I was forcing, that wasn't my intention. I was
referring to the long period regexps have been around.
I thought the reason it's not already in common use had already been
discussed: it's barely referenced anywhere, it needs more advertising.
Correct me if this is wrong.
>> At the moment the rx.el implementation is built on top of Emacs regexps
>> which are implemented in C. I believe this does not use the power of
>> Lisp as much as it could.
>
> But would any alternative use the power of regexps?
Yes, rx.el is a drop-in replacement of regexps. What do you mean?
> Emacs has a (moderately large) cache of regexps, so that building the
> automatons is done very rarely. Possibly just once each for each
> session of Emacs.
That's the whole point: if possible (see below), remove the requirements
for regexp cache management.
>> In high-level languages, automatons are automatically cached to save the
>> cost of building them.
>
> Emacs Lisp does this too.
I did not exclude it :)
>> The rx.el library/concept could alleviate this issue altogether: because
>> we express the automaton directly in Lisp, the parsing step is not
>> needed and thus the building cost could be tremendously reduced.
>
>> So the rx.el building steps
>
>> rx expression -> regexp string -> C regexp automaton
>
>> could boil down to simply
>
>> rx automaton
>
> I don't see what you're trying to save, here. At some stage, the regexp
> source, in whatever form, needs to be converted to an automaton.
Yes, that's what I meant with "rx automaton". My suggestion (not
necessarily for Emacs Lisp) is to remove the step that converts the rx
symbolic automaton to a string, and the conversion from a string to the
actual automaton.
> Are you suggesting here building an interpreter in Lisp directly to
> execute rx expressions?
Yes, but maybe in Guile or some other Lisp. Don't know if it's feasible
in Emacs Lisp.
>> It would be interesting to compare the performance. This also means
>> that there would be no need for caching on behalf of the supporting
>> language.
>
> I will predict that an rx interpreter built in Lisp will be two orders
> of magnitude slower than the current regexp machine, where both the
> construction of an automaton, and the byte-code interpreter which runs
> it are written in C (and probably quite optimised C at that).
Obviously, and this is the prime reason why the author of rx.el
implemented it on top of C regexp. My point was that with a fast Lisp
(or a specifically designed C support), a Lisp automaton would be just
as fast: the Lisp code would directly map the equivalent C automaton.
Again, I have no clue if that's doable in Emacs Lisp.
> I can't get excited about rx syntax, which I'm sure would be just as
> tedious, and possibly more difficult to read than a standard regexp.
Have you used rx? The whole point of the library is to increase
readability, and it does a great job at it in my opinion.
> Analogously, as a musician, I read standard musical notation (with
> sets of five lines and dots) far more easily and fluently than I could
> any "simplified" system designed for beginners, which would be bloated
> by comparison.
rx.el is meant to be "simplified for beginners". You could also reverse
the analogy in saying that regexps are the "simplified version for
beginners"... The analogy does not map very well.
A better analogy would be the mapping between assembly and the
hexadecimal codes of CPU instructions: I don't think many people find
hexadecimal codes more explicit than assembly verbs and symbols
(although most assembly languages abuse abbreviations, but the
intention is there).
> Regular expressions can be difficult. I don't believe this difficulty
> lies, in the main, in the compact notation used to express them. Rather
> it lies in the concepts and the semantics of the regexp elements, and
> being able to express a "mental automaton" in regexp semantics.
The semantics of rx and regexps do not differ. The difference is purely
syntactic.
Let's consider some points:
- rx can be written over multiple lines and indented. This is a great
readability booster for groups, which can be _grouped_ together with
linebreaks and indentation.
- rx does not require escaping any character with backslashes. This
is always a great source of confusion when switching from BRE to ERE,
between different interpreters and when storing regexp in Lisp strings
where backslashes must be escaped themselves for instance.
- Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
a trivial _English_ counterpart in rx: (respectively "word-start",
nothing, "line-start" _and_ "not").
- No more special-case symbols like "-" for ranges or "^" (negation when
first character in square brackets). Thus less cognitive burden.
- The "^" has a double-meaning in regexp: "line-start" and "not".
The list goes on.
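To illustrate the points above, here is the same pattern written both ways. This is only a sketch: the pattern (a simple "name = value;" line) and the `my-' variable names are invented for the example and appear nowhere in Emacs.

```elisp
(require 'rx)

;; String form: every backslash is doubled inside the Lisp string, and
;; "^" means "line start" outside brackets but "not" inside them.
(defconst my-assign-re-string
  "^\\([[:alpha:]_][[:alnum:]_]*\\)[ \t]*=[ \t]*\\([^;\n]+\\);")

;; rx form: no backslashes, groups laid out by indentation, and the two
;; meanings of "^" spelled `line-start' and `not'.
(defconst my-assign-re-rx
  (rx line-start
      (group (any alpha "_")
             (zero-or-more (any alnum "_")))
      (zero-or-more (any " \t")) "=" (zero-or-more (any " \t"))
      (group (one-or-more (not (any ";\n"))))
      ";"))

;; Both forms are used in exactly the same way.
(when (string-match my-assign-re-rx "count = 42;")
  (list (match-string 1 "count = 42;")
        (match-string 2 "count = 42;")))
```

Whether the second form reads better than the first is, of course, the question under discussion.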
--
Pierre Neidhardt
* Re: rx.el sexp regexp syntax
2018-05-25 16:47 ` Pierre Neidhardt
@ 2018-05-25 18:01 ` Eric Abrahamsen
2018-05-25 18:12 ` Pierre Neidhardt
2018-05-27 20:27 ` Stefan Monnier
2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie
1 sibling, 2 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 18:01 UTC (permalink / raw)
To: emacs-devel
Pierre Neidhardt <pe.neidhardt@googlemail.com> writes:
> Alan Mackenzie <acm@muc.de> writes:
>
>>> rx.el is one of the best concepts I've discovered in a long time.
>>> It's another instance of "Don't come up with a new (mini)language when
>>> Lisp can do better": it's easier to learn, more flexible, easier to
>>> write, much easier to read and as a consequence much more maintainable.
>>
>> Much easier than what? Than the putative mini-language that doesn't get
>> written?
>
> I meant that in my opinion rx is easier to write than regexps. That it
> is not popular is the root of the question here.
Slightly off-topic: I wouldn't ever use rx unless I was writing a really
brutal regexp, but what I *would* use all day long would be a macro that
un-escaped backslashes for me. Ideally:
(string-match (rx-unescape "turn (left|right)") "turn right") => 0
But even this would be an improvement:
(string-match (rx-unescape "turn \(left\|right\)") "turn right") => 0
I looked in the repos, but didn't see any packages that do this.
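A minimal sketch of what such a helper might look like for the first form, written as a plain function. `my-regexp-unescape' is a hypothetical name, nothing like it ships with Emacs as far as this thread indicates, and the sketch only swaps the roles of bare and backslashed "(", ")" and "|". It deliberately ignores bracket expressions, so "[(]" would be translated wrongly.

```elisp
(defun my-regexp-unescape (pattern)
  "Return PATTERN with bare and backslashed ( ) | swapped.
Hypothetical helper, not part of Emacs; bracket expressions are not handled."
  (let ((i 0)
        (len (length pattern))
        (out '()))
    (while (< i len)
      (let ((ch (aref pattern i)))
        (cond
         ;; Backslash before ( ) or |: emit the character bare (a literal).
         ((and (eq ch ?\\) (< (1+ i) len)
               (memq (aref pattern (1+ i)) '(?\( ?\) ?|)))
          (push (string (aref pattern (1+ i))) out)
          (setq i (+ i 2)))
         ;; Any other backslash sequence: copy both characters unchanged.
         ((and (eq ch ?\\) (< (1+ i) len))
          (push (substring pattern i (+ i 2)) out)
          (setq i (+ i 2)))
         ;; Bare ( ) or |: add the backslash that Emacs syntax requires.
         ((memq ch '(?\( ?\) ?|))
          (push (string ?\\ ch) out)
          (setq i (1+ i)))
         ;; Everything else is copied as-is.
         (t
          (push (string ch) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))

(string-match (my-regexp-unescape "turn (left|right)") "turn right")
```

A function like this runs on every call; wrapping it in a macro would move the translation to compile time for constant patterns.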
As an aside, this thread led me to find the rx pcase matcher, something
I'd daydreamed about before, so I'm already pretty happy :)
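For readers who have not met it, a short sketch of the rx pcase matcher, assuming the `rx' pcase pattern defined in rx.el; `my-turn-direction' is a hypothetical helper made up for the example.

```elisp
(require 'rx)   ; defines the `rx' pcase pattern

(defun my-turn-direction (command)
  "Return \"left\" or \"right\" if COMMAND is a turn command, else nil.
Hypothetical helper, for illustration only."
  (pcase command
    ;; (let VAR RX...) binds VAR to the text matched by the sub-pattern.
    ((rx "turn " (let dir (or "left" "right"))) dir)
    (_ nil)))

(my-turn-direction "turn right")
```

The match and the extraction of the submatch happen in one pattern, with no explicit calls to `string-match' or `match-string'.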
Eric
* Re: rx.el sexp regexp syntax
2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen
@ 2018-05-25 18:12 ` Pierre Neidhardt
2018-05-25 18:56 ` Eric Abrahamsen
2018-05-27 20:27 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-25 18:12 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Slightly off-topic: I wouldn't ever use rx unless I was writing a really
> brutal regexp,
Why not?
--
Pierre Neidhardt
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-25 16:47 ` Pierre Neidhardt
2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen
@ 2018-05-25 18:17 ` Alan Mackenzie
2018-05-25 20:35 ` Peter Neidhardt
2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen
1 sibling, 2 replies; 54+ messages in thread
From: Alan Mackenzie @ 2018-05-25 18:17 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky
Hello again, Pierre.
On Fri, May 25, 2018 at 18:47:59 +0200, Pierre Neidhardt wrote:
> Alan Mackenzie <acm@muc.de> writes:
> >> rx.el is one of the best concepts I've discovered in a long time.
> >> It's another instance of "Don't come up with a new (mini)language when
> >> Lisp can do better": it's easier to learn, more flexible, easier to
> >> write, much easier to read and as a consequence much more maintainable.
> > Much easier than what? Than the putative mini-language that doesn't get
> > written?
> I meant that in my opinion rx is easier to write than regexps. That it
> is not popular is the root of the question here.
I think it will be easier only for beginners.
> >> I think it's high time we moved away from traditional regexps and
> >> embraced the concept of rx.el. I'm thinking of implementing it for
> >> Guile.
> > There's nothing stopping anybody from using rx.el. However, people have
> > mostly _not_ used it. The "I think it's high time ...." suggests in
> > some way forcing people to use it. Before mandating something like
> > this, I think we should find out why it's not already in common use.
> Sorry if you felt I was forcing, that wasn't my intention. I was
> referring to the long period regexps have been around.
> I thought the reason it's not already in common use had already been
> discussed: it's barely referenced anywhere, it needs more advertising.
> Correct me if this is wrong.
It may be part of the explanation. But more salient, I think, is that
hackers prefer powerful means of expression. A single character in a
string regexp has the power of a sexp in the corresponding rx regexp.
Paul Graham (at http://www.paulgraham.com) has had quite a bit to say
about this in the (distant) past. Conciseness of expression is where
it's at.
> >> At the moment the rx.el implementation is built on top of Emacs regexps
> >> which are implemented in C. I believe this does not use the power of
> >> Lisp as much as it could.
> > But would any alternative use the power of regexps?
> Yes, rx.el is a drop-in replacement of regexps. What do you mean?
I'm not sure, any more. Sorry.
> > Emacs has a (moderately large) cache of regexps, so that building the
> > automatons is done very rarely. Possibly just once each for each
> > session of Emacs.
> That's the whole point: if possible (see below), remove the requirements
> for regexp cache management.
I don't think that would be wise. Manipulating the cache is far faster
than generating the automatons at each use.
[ .... ]
> >> The rx.el library/concept could alleviate this issue altogether: because
> >> we express the automaton directly in Lisp, the parsing step is not
> >> needed and thus the building cost could be tremendously reduced.
> >> So the rx.el building steps
> >> rx expression -> regexp string -> C regexp automaton
> >> could boil down to simply
> >> rx automaton
> > I don't see what you're trying to save, here. At some stage, the regexp
> > source, in whatever form, needs to be converted to an automaton.
> Yes, that's what I meant with "rx automaton". My suggestion (not
> necessarily for Emacs Lisp) is to remove the step that converts the rx
> symbolic automaton to a string, and the conversion from a string to the
> actual automaton.
OK. That would save only a little, at automaton building time, which
likely would happen just once in any Emacs session.
> > Are you suggesting here building an interpreter in Lisp directly to
> > execute rx expressions?
> Yes, but maybe in Guile or some other Lisp. Don't know if it's feasible
> in Emacs Lisp.
> >> It would be interesting to compare the performance. This also means
> >> that there would be no need for caching on behalf of the supporting
> >> language.
> > I will predict that an rx interpreter built in Lisp will be two orders
> > of magnitude slower than the current regexp machine, where both the
> > construction of an automaton, and the byte-code interpreter which runs
> > it are written in C (and probably quite optimised C at that).
> Obviously, and this is the prime reason why the author of rx.el
> implemented it on top of C regexp. My point was that with a fast Lisp
> (or a specifically designed C support), a Lisp automaton would be just
> as fast: the Lisp code would directly map the equivalent C automaton.
> Again, I have no clue if that's doable in Emacs Lisp.
It might be. But it might be a lot of work for little benefit.
> > I can't get excited about rx syntax, which I'm sure would be just as
> > tedious, and possibly more difficult to read than a standard regexp.
> Have you used rx?
No. Neither have I used Cobol (much).
> The whole point of the library is to increase readability, and it does
> a great job at it in my opinion.
You seem to want to increase the readability for beginners, for people
who have laboriously to slog through an expression trying to make sense
of each bit of it. I don't think experienced regexp users have
difficulty with the syntax. I don't, for one.
There was a time when people thought that
ADD 1 TO A GIVING B
was more readable than
b = a + 1;
, and generations of programmers suffered as a result.
> > Analogously, as a musician, I read standard musical notation (with
> > sets of five lines and dots) far more easily and fluently than I could
> > any "simplified" system designed for beginners, which would be bloated
> > by comparison.
> rx.el is meant to be "simplified for beginners". You could also reverse
> the analogy in saying that regexps are the "simplified version for
> beginners"... The analogy does not map very well.
> A better analogy would be the mapping between assembly and the
> hexadecimal codes of CPU instructions: I don't think many people find
> hexadecimal codes more explicit than assembly verbs and symbols
> (although most assembly languages abuse abbreviations, but the
> intention is there).
Hexadecimal CPU codes aren't and aren't intended to be human-readable.
String regular expressions are.
> > Regular expressions can be difficult. I don't believe this difficulty
> > lies, in the main, in the compact notation used to express them. Rather
> > it lies in the concepts and the semantics of the regexp elements, and
> > being able to express a "mental automaton" in regexp semantics.
> The semantics of rx and regexps do not differ. The difference is purely
> syntactic.
Yes.
> Let's consider some points:
> - rx can be written over multiple lines and indented. This is a great
> readability booster for groups, which can be _grouped_ together with
> linebreaks and indentation.
rx MUST be written over several lines and indented. A string regexp, by
contrast, usually fits onto a single line.
> - rx does not require escaping any character with backslashes. This
> is always a great source of confusion when switching from BRE to ERE,
> between different interpreters and when storing regexp in Lisp strings
> where backslashes must be escaped themselves for instance.
It is an inconvenience, yes, but I think you're exaggerating its
importance somewhat. In rx, literal characters have to be "escaped" by
string quotes. This might be an irritation.
> - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
> a trivial _English_ counterpart in rx: (respectively "word-start",
> nothing, "line-start" _and_ "not").
The "English" counterpart used in rx is bulky and difficult to learn.
Somehow, you've got to learn that it's "word-start" and not
"word-beginning", that it's "not" and not "non", and so on. This is more
difficult than just learning \< and ^. If your native language isn't
English, it might be much more difficult.
> - No more special-case symbols like "-" for ranges or "^" (negation when
> first character in square brackets). Thus less cognitive burden.
That remains in dispute.
> - The "^" has a double-meaning in regexp: "line-start" and "not".
Yes, it is context dependent. I don't think this causes confusion in
practice.
> The list goes on.
Well, so far, on this list, two or three people have said they "like"
rx.el. Nobody has said "I'm going to be using rx.el in my programs from
now on". I don't think they will.
We'll see.
> --
> Pierre Neidhardt
--
Alan Mackenzie (Nuremberg, Germany).
* Re: rx.el sexp regexp syntax
2018-05-25 18:12 ` Pierre Neidhardt
@ 2018-05-25 18:56 ` Eric Abrahamsen
2018-05-25 21:42 ` Clément Pit-Claudel
0 siblings, 1 reply; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 18:56 UTC (permalink / raw)
To: emacs-devel
Pierre Neidhardt <ambrevar@gmail.com> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Slightly off-topic: I wouldn't ever use rx unless I was writing a really
>> brutal regexp,
>
> Why not?
I think it's just what Alan is saying: if you're familiar with regexp,
it's easier to write them directly. Each of us has a different level of
familiarity, thus each of us has a different point at which we might
break off and use rx. A regexp that would make my eyes cross (with
"\\(\\(\\(") could be perfectly comprehensible to someone else.
Personally, I'd use rx with more than two nested layers of
group-capture, but that's only me.
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie
@ 2018-05-25 20:35 ` Peter Neidhardt
2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen
1 sibling, 0 replies; 54+ messages in thread
From: Peter Neidhardt @ 2018-05-25 20:35 UTC (permalink / raw)
To: Alan Mackenzie; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky
Alan Mackenzie <acm@muc.de> writes:
> It may be part of the explanation. But more salient, I think, is that
> hackers prefer powerful means of expression. A single character in a
> string regexp has the power of a sexp in the corresponding rx regexp.
> Paul Graham (at http://www.paulgraham.com) has had quite a bit to say
> about this in the (distant) past. Conciseness of expression is where
> it's at.
I think you are referring to this article:
http://paulgraham.com/ineq.html
> Another easy test is the number of characters in a program, but this
> is not very good either; some languages (Perl, for example) just use
> shorter identifiers than others.
>
> I think a better measure of the size of a program would be the number
> of elements, where an element is anything that would be a distinct
> node if you drew a tree representing the source code. The name of a
> variable or function is an element; an integer or a floating-point
> number is an element; a segment of literal text is an element; an
> element of a pattern, or a format directive, is an element; a new
> block is an element. There are borderline cases (is -5 two elements or
> one?) but I think most of them are the same for every language, so
> they don't affect comparisons much.
With this definition, rx and regexp have the same length (except for
`eval'). "Conciseness in characters" is not what Paul Graham was
referring to.
Others might think differently, for instance those who prefer Perl to
Lisp.
In the end, this is what it all boils down to: the "Unix" hacker culture
vs. the Lisp one. The Unix tradition has long spread the use of
acronyms and shortcuts. Lisp, on the other hand (especially Scheme),
puts a lot of emphasis on explicit full names.
My opinion is that acronyms and shortcuts were mostly useful in the
era of teletypes and limited terminals and shells. Now we have
completion and fuzzy-search, for which explicit full names not only make
sense but are necessary.
(It's much more intuitive to search for "string compare" in Emacs
Lisp than "str cmp" in C.)
In the end, rx vs. regexp reflects the same mindset difference.
>> Have you used rx?
>
> No. Neither have I used Cobol (much).
Cobol is not very relevant, let's focus on the discussion here. Try
using rx on some mildly complex regular expressions, it could be
insightful for this discussion.
> You seem to want to increase the readability for beginners, for people
> who have laboriously to slog through an expression trying to make sense
> of each bit of it. I don't think experienced regexp users have
> difficulty with the syntax. I don't, for one.
>
> There was a time when people thought that
>
> ADD 1 TO A GIVING B
>
> was more readable than
>
> b = a + 1;
This is not what rx is about though. Your example does not show any
change in structure. rx does.
> Hexadecimal CPU codes aren't and aren't intended to be human-readable.
> String regular expressions are.
Well, "readable" is not black and white. If we can have "more readable",
then even better.
> rx MUST be written over several lines and indented. A string regexp, by
> contrast, usually fits onto a single line.
No, it does not have to be written over several lines. I don't know
where you got that from.
That said, is "fitting onto a single line" necessarily good?
>> - rx does not require escaping any character with backslashes. This
>> is always a great source of confusion when switching from BRE to ERE,
>> between different interpreters and when storing regexp in Lisp strings
>> where backslashes must be escaped themselves for instance.
>
>> - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
>> a trivial _English_ counterpart in rx: (respectively "word-start",
>> nothing, "line-start" _and_ "not").
>
> The "English" counterpart used in rx is bulky and difficult to learn.
> Somehow, you've got to learn that it's "word-start" and not
> "word-beginning",
One could argue the same about "*" vs. "%". But words that have a meaning
in a natural language are easier to remember than arbitrary symbols.
> that it's "not" and not "non", and so on. This is more
> difficult than just learning \< and ^. If your native language isn't
> English, it might be much more difficult.
All programmers learn some basic English, say, "if then else". I don't
think that symbolic languages are easier to learn than natural languages
for human beings.
> Well, so far, on this list, two or three people have said they "like"
> rx.el. Nobody has said "I'm going to be using rx.el in my programs from
> now on".
Which is precisely why we are talking about it. To let people know,
pique their curiosity, let them try and report feedback.
"Not famous" does not equal bad quality. That's why we need to
communicate to give good products a better chance.
--
Pierre Neidhardt
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie
2018-05-25 20:35 ` Pierre Neidhardt
@ 2018-05-25 21:01 ` Michael Heerdegen
2018-05-25 23:32 ` Pierre Neidhardt
1 sibling, 1 reply; 54+ messages in thread
From: Michael Heerdegen @ 2018-05-25 21:01 UTC (permalink / raw)
To: Alan Mackenzie
Cc: rms, Noam Postavsky, emacs-devel, Pierre Neidhardt, van, eliz
Alan Mackenzie <acm@muc.de> writes:
> A string regexp, by contrast, usually fits onto a single line.
But regexps are tree-like structures. That's why rx, which uses sexps
(i.e. trees), is an easier-to-read representation of complicated
regexps than a one-dimensional string, unless you have the ability to
form such a representation in your head.
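To make the tree point concrete, here is a small sketch. The "turn" command syntax and the `my-turn-*' names are invented for illustration; both constants are meant to accept the same inputs, but only the rx form shows the nesting directly.

```elisp
(require 'rx)

;; String form: the nesting is implicit in the backslash-parens.
(defconst my-turn-re-string
  "\\`turn \\(left\\|right\\)\\(?: [0-9]+\\)?\\'")

;; rx form: the same tree, spelled out as nested lists.
(defconst my-turn-re-rx
  (rx string-start
      "turn "
      (group (or "left" "right"))          ; capture group 1
      (optional " " (one-or-more digit))   ; optional amount
      string-end))
```

For instance, (string-match my-turn-re-rx "turn left 90") should succeed with group 1 bound to "left", and "turn up" should be rejected by both versions.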
> The "English" counterpart used in rx is bulky and difficult to learn.
> Somehow, you've got to learn that it's "word-start" and not
> "word-beginning", that it's "not" and not "non", and so on.
That's IMHO the main reason why people avoid using rx. I wonder if that
aspect of rx could be improved (why not just use $ as synonym for bol
etc.)?
> This is more difficult than just learning \< and ^. If your native
> language isn't English, it might be much more difficult.
But also because you read the former more often.
OTOH btw, I find the documentation for rx more condensed than that of
the syntax of regexps.
In summary, I think both representations have their justification of
existence.
Michael.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 18:56 ` Eric Abrahamsen
@ 2018-05-25 21:42 ` Clément Pit-Claudel
2018-05-25 21:51 ` Eric Abrahamsen
0 siblings, 1 reply; 54+ messages in thread
From: Clément Pit-Claudel @ 2018-05-25 21:42 UTC (permalink / raw)
To: emacs-devel
On 2018-05-25 14:56, Eric Abrahamsen wrote:
> A regexp that would make my eyes cross (with
> "\\(\\(\\(") could be perfectly comprehensible to someone else.
> Personally, I'd use rx with more than two nested layers of
> group-capture, but that's only me.
Shameless advertisement: https://github.com/cpitclaudel/easy-escape
It's like prettify-symbols-mode, for regexps; it shows grouping parentheses in a different color, and hides the backslashes.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 21:42 ` Clément Pit-Claudel
@ 2018-05-25 21:51 ` Eric Abrahamsen
2018-05-25 22:27 ` Michael Heerdegen
0 siblings, 1 reply; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 21:51 UTC (permalink / raw)
To: emacs-devel
Clément Pit-Claudel <cpitclaudel@gmail.com> writes:
> On 2018-05-25 14:56, Eric Abrahamsen wrote:
>> A regexp that would make my eyes cross (with
>> "\\(\\(\\(") could be perfectly comprehensible to someone else.
>> Personally, I'd use rx with more than two nested layers of
>> group-capture, but that's only me.
>
> Shameless advertisement: https://github.com/cpitclaudel/easy-escape
> It's like prettify-symbols-mode, for regexps; it shows grouping
> parentheses in a different color, and hides the backslashes.
I know, a few hours ago I went hunting in the repos for relevant
packages, and found yours. It certainly helps! Though I'd still rather
have something that actually transforms the regexps at compile time...
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 21:51 ` Eric Abrahamsen
@ 2018-05-25 22:27 ` Michael Heerdegen
2018-05-25 22:44 ` Eric Abrahamsen
0 siblings, 1 reply; 54+ messages in thread
From: Michael Heerdegen @ 2018-05-25 22:27 UTC (permalink / raw)
To: Eric Abrahamsen; +Cc: emacs-devel
Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> I know, a few hours ago I went hunting in the repos for relevant
> packages, and found yours. It certainly helps! Though I'd still rather
> have something that actually transforms the regexps at compile time...
I don't understand what you mean. rx transforms regexps at compile
time.
Michael.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 22:27 ` Michael Heerdegen
@ 2018-05-25 22:44 ` Eric Abrahamsen
0 siblings, 0 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 22:44 UTC (permalink / raw)
To: emacs-devel
Michael Heerdegen <michael_heerdegen@web.de> writes:
> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> I know, a few hours ago I went hunting in the repos for relevant
>> packages, and found yours. It certainly helps! Though I'd still rather
>> have something that actually transforms the regexps at compile time...
>
> I don't understand what you mean. rx transforms regexps at compile
> time.
I mean, a macro that lets me write an unescaped regexp that gets
compiled to an escaped regexp. So:
(r "turn (left|right)") compiles to "turn \\(left\\|right\\)"
Basically like python's string literals.
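A minimal sketch of what such a macro could look like, assuming only "(", ")" and "|" need their escaping flipped. The names `my-r' and `my-r--convert' are hypothetical, and bracket expressions such as "[(|)]" are not handled, so this is an illustration rather than a finished tool.

```elisp
(eval-and-compile
  (defun my-r--convert (re)
    "Flip the backslash-escaping of ( ) and | in the string RE.
Unescaped ( ) | become the Emacs operators \\( \\) \\|, and
backslash-escaped ones become literal characters.
Bracket expressions are not treated specially."
    (let ((i 0)
          (n (length re))
          (out '()))
      (while (< i n)
        (let ((c (aref re i)))
          (cond
           ;; Backslash followed by another character.
           ((and (eq c ?\\) (< (1+ i) n))
            (let ((next (aref re (1+ i))))
              (push (if (memq next '(?\( ?\) ?|))
                        (string next)        ; \( becomes a literal (
                      (string ?\\ next))     ; keep other escapes as-is
                    out))
            (setq i (+ i 2)))
           ;; Bare operator character: add the backslash Emacs wants.
           ((memq c '(?\( ?\) ?|))
            (push (string ?\\ c) out)
            (setq i (1+ i)))
           (t
            (push (string c) out)
            (setq i (1+ i))))))
      (apply #'concat (nreverse out)))))

(defmacro my-r (re)
  "Expand the literal string RE into an Emacs regexp at compile time."
  (my-r--convert re))
```

With this, (my-r "turn (left|right)") expands to "turn \\(left\\|right\\)". The conversion happens during macro-expansion, so the compiled code only contains the final string.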
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen
@ 2018-05-25 23:32 ` Pierre Neidhardt
0 siblings, 0 replies; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-25 23:32 UTC (permalink / raw)
To: Michael Heerdegen
Cc: rms, Noam Postavsky, emacs-devel, van, Alan Mackenzie, eliz
Michael Heerdegen <michael_heerdegen@web.de> writes:
>> A string regexp, by contrast, usually fits onto a single line.
>
> But regexps are tree-like structures. That's why rx, which uses sexps
> (i.e. trees), is an easier-to-read representation of complicated
> regexps than a one-dimensional string, unless you have the ability to
> form such a representation in your head.
I did not think of this at first but I think it's an excellent,
fundamental point.
>> The "English" counterpart used in rx is bulky and difficult to learn.
>> Somehow, you've got to learn that it's "word-start" and not
>> "word-beginning", that it's "not" and not "non", and so on.
>
> That's IMHO the main reason why people avoid using rx. I wonder if that
> aspect of rx could be improved (why not just use $ as synonym for bol
> etc.)?
I guess you meant 'eol' ;)
rx supports synonyms and I think in general it's not a good idea.
That said, I really like that it uses meaningful words. So instead of
‘line-end’, ‘eol’
I'd leave it to only
‘line-end’
--
Pierre Neidhardt
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 15:51 ` Alan Mackenzie
2018-05-25 16:47 ` Pierre Neidhardt
@ 2018-05-27 16:56 ` Tom Tromey
2018-05-27 20:16 ` Alan Mackenzie
2018-05-27 20:23 ` Stefan Monnier
2 siblings, 1 reply; 54+ messages in thread
From: Tom Tromey @ 2018-05-27 16:56 UTC (permalink / raw)
To: Alan Mackenzie
Cc: rms, Pierre Neidhardt, Noam Postavsky, emacs-devel, van, eliz
>>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:
>> Building the automaton is costly. In C, we build it once and save the
>> result in a variable so that every regexp match does not rebuild the
>> automaton each time.
Alan> Emacs has a (moderately large) cache of regexps, so that building the
Alan> automatons is done very rarely. Possibly just once each for each
Alan> session of Emacs.
I wonder about both of these statements.
On the one hand, AFAICT the regex cache is 20 items. From search.c:
#define REGEXP_CACHE_SIZE 20
That seems pretty small to me, given how prevalent regexps are in elisp.
On the other hand, in the past when I have tried to profile Emacs, I
haven't seen regexp compilation show up too much. IIRC I did see regexp
matching and the GC. Maybe this just points out the efficacy of the
cache -- maybe 20 items is plenty.
Perhaps the regexp matcher could use some micro-optimizations, like the
token-threading the bytecode interpreter does.
Alan> Are you suggesting here building an interpreter in Lisp directly to
Alan> execute rx expressions?
It's interesting, IMO, to consider compiling rx (or regexps generally)
to lisp bytecode. Perhaps with the JIT, it would boost performance in
some cases. (It may be slower, but it's worthwhile to do the
experiment.)
For other work in this area see Stefan's lex-parse-re package. I think
it includes a regexp matcher in elisp.
Tom
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-27 16:56 ` Tom Tromey
@ 2018-05-27 20:16 ` Alan Mackenzie
0 siblings, 0 replies; 54+ messages in thread
From: Alan Mackenzie @ 2018-05-27 20:16 UTC (permalink / raw)
To: Tom Tromey; +Cc: rms, Pierre Neidhardt, Noam Postavsky, emacs-devel, van, eliz
Hello, Tom.
On Sun, May 27, 2018 at 10:56:36 -0600, Tom Tromey wrote:
> >>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes:
> >> Building the automaton is costly. In C, we build it once and save the
> >> result in a variable so that every regexp match does not rebuild the
> >> automaton each time.
> Alan> Emacs has a (moderately large) cache of regexps, so that building the
> Alan> automatons is done very rarely. Possibly just once each for each
> Alan> session of Emacs.
> I wonder about both of these statements.
> On the one hand, AFAICT the regex cache is 20 items. From search.c:
> #define REGEXP_CACHE_SIZE 20
> That seems pretty small to me, given how prevalent regexps are in elisp.
Hmm. I must have misremembered. I thought the cache size was 60, for
some reason. Now that RAM is measured in gigabytes, we could probably
increase that 20 (if there's any need).
> On the other hand, in the past when I have tried to profile Emacs, I
> haven't seen regexp compilation show up too much. IIRC I did see regexp
> matching and the GC. Maybe this just points out the efficacy of the
> cache -- maybe 20 items is plenty.
Maybe. I just don't know.
> Perhaps the regexp matcher could use some micro-optimizations, like the
> token-threading the bytecode interpreter does.
> Alan> Are you suggesting here building an interpreter in Lisp directly to
> Alan> execute rx expressions?
> It's interesting, IMO, to consider compiling rx (or regexps generally)
> to lisp bytecode. Perhaps with the JIT, it would boost performance in
> some cases. (It may be slower, but it's worthwhile to do the
> experiment.)
> For other work in this area see Stefan's lex-parse-re package. I think
> it includes a regexp matcher in elisp.
I'll need to have a look at that.
> Tom
--
Alan Mackenzie (Nuremberg, Germany).
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 8:52 ` Pierre Neidhardt
2018-05-25 15:51 ` Alan Mackenzie
@ 2018-05-27 20:16 ` Stefan Monnier
2018-05-28 16:36 ` Pierre Neidhardt
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-05-27 20:16 UTC (permalink / raw)
To: emacs-devel
> rx.el is one of the best concepts I've discovered in a long time.
> It's another instance of "Don't come up with a new (mini)language when
> Lisp can do better": it's easier to learn, more flexible, easier to
> write, much easier to read and as a consequence much more maintainable.
FWIW, I find it's cumbersome in RX to define regexps piecewise.
E.g. with strings I can do things like:
    (let* ((word-re "\\(?:\\sw\\|\\s_\\)+")
           (spc-re "[ \t\n]*")
           (re1 (concat spc-re "\\(" word-re "\\)" spc-re))
           (re2 (concat spc-re "\\(" word-re "\\)(" word-re))))
but to do the same with RX you need something like:
    (let* ((word-re (rx ...))
           (spc-re (rx ...))
           (re1 (rx-to-string `(... ,spc-re ... ,word-re ...)))
           (re2 (rx-to-string `(... ,spc-re ... ,word-re ...))))
I think `rx` would benefit from allowing to refer to variables.
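For concreteness, one way the elided RX version might be filled in is sketched below. The function name `my-piecewise-regexps' and the particular rx forms chosen for the pieces are assumptions made for illustration, not the forms originally intended. It shows the extra ceremony: the pieces are plain strings, so they have to go back in through the `regexp' form, backquote and `rx-to-string'.

```elisp
(require 'rx)

(defun my-piecewise-regexps ()
  "Return a list (RE1 RE2) built from shared rx pieces."
  (let* ((word-re (rx (one-or-more (or (syntax word) (syntax symbol)))))
         (spc-re  (rx (zero-or-more (any " \t\n"))))
         ;; WORD-RE and SPC-RE are ordinary regexp strings here, so
         ;; they are spliced in with the `regexp' form at run time.
         (re1 (rx-to-string
               `(seq (regexp ,spc-re)
                     (group (regexp ,word-re))
                     (regexp ,spc-re))))
         (re2 (rx-to-string
               `(seq (regexp ,spc-re)
                     (group (regexp ,word-re))
                     "("
                     (regexp ,word-re)))))
    (list re1 re2)))
```

Matching the first result against "  foo  " should capture "foo" in group 1, and the second should match "foo(bar" with "foo" in group 1.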
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 15:51 ` Alan Mackenzie
2018-05-25 16:47 ` Pierre Neidhardt
2018-05-27 16:56 ` Tom Tromey
@ 2018-05-27 20:23 ` Stefan Monnier
2 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-05-27 20:23 UTC (permalink / raw)
To: emacs-devel
>> It would be interesting to compare the performance. This also means
>> that there would be no need for caching on behalf of the supporting
>> language.
>
> I will predict that an rx interpreter built in Lisp will be two orders
> of magnitude slower than the current regexp machine, where both the
> construction of an automaton, and the byte-code interpreter which runs
> it are written in C (and probably quite optimised C at that).
The lex.el package in GNU ELPA has a matcher written in Elisp.
Its performance is actually pretty good compared to Emacs's builtin
regexp engine. But that's because lex.el builds a DFA, so the slow
evaluation of Elisp is compensated by a more efficient algorithm.
And of course, building the DFA takes a lot more time than the
regexp-compilation of regexp.c (both because of the language used and
the algorithm).
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen
2018-05-25 18:12 ` Pierre Neidhardt
@ 2018-05-27 20:27 ` Stefan Monnier
2018-05-28 16:37 ` Pierre Neidhardt
2018-06-02 19:33 ` Eric Abrahamsen
1 sibling, 2 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-05-27 20:27 UTC (permalink / raw)
To: emacs-devel
> brutal regexp, but what I *would* use all day long would be a macro that
> un-escaped backslashes for me. Ideally:
That'd be a good first step.
A second important step would be to easily embed comments and Elisp code
(mostly references to other Elisp variables).
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-27 20:16 ` Stefan Monnier
@ 2018-05-28 16:36 ` Pierre Neidhardt
2018-05-28 17:04 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-28 16:36 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
> FWIW, I find it's cumbersome in RX to define regexps piecewise.
> E.g. with strings I can do things like:
>
> (let* ((word-re "\\(?:\\sw\\|\\s_\\)+")
> (spc-re "[ \t\n]*")
> (re1 (concat spc-re "\\(" word-re "\\)" spc-re))
> (re2 (concat spc-re "\\(" word-re "\\)(" word-re))))
>
> but to do the same with RX you need something like:
>
> (let* ((word-re (rx ...))
> (spc-re (rx ...))
> (re1 (rx-to-string `(... ,spc-re ... ,word-re ...)))
> (re2 (rx-to-string `(... ,spc-re ... ,word-re ...))))
>
> I think `rx` would benefit from allowing to refer to variables.
Not sure what you are trying to do, but doesn't it work with
(rx (... (eval VARIABLE) ...))
?
--
Pierre Neidhardt
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-27 20:27 ` Stefan Monnier
@ 2018-05-28 16:37 ` Pierre Neidhardt
2018-05-28 17:15 ` Stefan Monnier
2018-06-02 19:33 ` Eric Abrahamsen
1 sibling, 1 reply; 54+ messages in thread
From: Pierre Neidhardt @ 2018-05-28 16:37 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> brutal regexp, but what I *would* use all day long would be a macro that
>> un-escaped backslashes for me. Ideally:
>
> That'd be a good first step.
> A second important step would be to easily embed comments and Elisp code
> (mostly references to other Elisp variables).
rx.el can do all that (with "eval") if I'm not mistaken.
--
Pierre Neidhardt
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-28 16:36 ` Pierre Neidhardt
@ 2018-05-28 17:04 ` Stefan Monnier
0 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-05-28 17:04 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: emacs-devel
> Not sure what you are trying to do, but doesn't it work with
>
> (rx (... (eval VARIABLE) ...))
This evaluates VARIABLE during macro-expansion, so it only works if
VARIABLE exists at macro-expansion time. So it works if you can bind
your variables with the new `let-when-compile`, but it won't work with
lexically-scoped vars.
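A small sketch of the distinction, with hypothetical names (`my-keyword', `my-keyword-re', `my-symbol-re'). The first regexp uses a value that already exists when the macro is expanded; the second depends on a function argument, which is only known at run time, so it falls back to `rx-to-string'.

```elisp
(require 'rx)

;; Defined at compile time as well as at load time, so the value is
;; available when the `rx' macro below is expanded.
(eval-and-compile
  (defconst my-keyword "defun"))

;; (eval ...) is resolved during macro-expansion; the string result
;; is matched literally.
(defconst my-keyword-re
  (rx symbol-start (eval my-keyword) symbol-end))

;; NAME is a lexical variable that only has a value at run time, so
;; the form is built with backquote and converted when called.
(defun my-symbol-re (name)
  "Return a regexp matching NAME as a whole symbol."
  (rx-to-string `(seq symbol-start ,name symbol-end)))
```

Here (my-symbol-re "foo") should match the "foo" in "a foo b" but not in "foobar".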
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-28 16:37 ` Pierre Neidhardt
@ 2018-05-28 17:15 ` Stefan Monnier
2018-05-29 3:10 ` Richard Stallman
2018-05-29 8:27 ` Philipp Stephani
0 siblings, 2 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-05-28 17:15 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: emacs-devel
>>> brutal regexp, but what I *would* use all day long would be a macro that
>>> un-escaped backslashes for me. Ideally:
>> That'd be a good first step.
>> A second important step would be to easily embed comments and Elisp code
>> (mostly references to other Elisp variables).
> rx.el can do all that (with "eval") if I'm not mistaken.
The main problem of RX is not lack of features, but verbosity which for
me makes it disappointingly difficult to read (not always worse than
string regexps, admittedly, but still).
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-28 17:15 ` Stefan Monnier
@ 2018-05-29 3:10 ` Richard Stallman
2018-05-29 7:28 ` Robert Pluim
2018-05-29 8:27 ` Philipp Stephani
1 sibling, 1 reply; 54+ messages in thread
From: Richard Stallman @ 2018-05-29 3:10 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel, ambrevar
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> The main problem of RX is not lack of features, but verbosity which for
> me makes it disappointingly difficult to read (not always worse than
> string regexps, admittedly, but still).
Can someone design a more brief format
that is nonetheless more elegant and readable than rx format?
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-29 3:10 ` Richard Stallman
@ 2018-05-29 7:28 ` Robert Pluim
0 siblings, 0 replies; 54+ messages in thread
From: Robert Pluim @ 2018-05-29 7:28 UTC (permalink / raw)
To: Richard Stallman; +Cc: ambrevar, Stefan Monnier, emacs-devel
Richard Stallman <rms@gnu.org> writes:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > The main problem of RX is not lack of features, but verbosity which for
> > me makes it disappointingly difficult to read (not always worse than
> > string regexps, admittedly, but still).
>
> Can someone design a more brief format
> that is nonetheless more elegant and readable than rx format?
I thought that we werenʼt going to design yet-another-format. I find
rx quite readable. Itʼs a little verbose, but the advantage of not
having to deal with massive numbers of backslashes outweighs that for
me.
Robert
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-28 17:15 ` Stefan Monnier
2018-05-29 3:10 ` Richard Stallman
@ 2018-05-29 8:27 ` Philipp Stephani
2018-05-30 3:24 ` Richard Stallman
1 sibling, 1 reply; 54+ messages in thread
From: Philipp Stephani @ 2018-05-29 8:27 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel, Pierre Neidhardt
Stefan Monnier <monnier@iro.umontreal.ca> wrote on Mon, 28 May 2018 at
19:16:
> >>> brutal regexp, but what I *would* use all day long would be a macro that
> >>> un-escaped backslashes for me. Ideally:
> >> That'd be a good first step.
> >> A second important step would be to easily embed comments and Elisp code
> >> (mostly references to other Elisp variables).
> > rx.el can do all that (with "eval") if I'm not mistaken.
>
> The main problem of RX is not lack of features, but verbosity which for
> me makes it disappointingly difficult to read (not always worse than
> string regexps, admittedly, but still).
>
>
FWIW, I think its verbosity is RX's main *advantage*. It makes regular
expressions so much easier to read that I stopped writing regex strings the
moment I discovered RX.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-29 8:27 ` Philipp Stephani
@ 2018-05-30 3:24 ` Richard Stallman
2018-05-30 7:25 ` Robert Pluim
0 siblings, 1 reply; 54+ messages in thread
From: Richard Stallman @ 2018-05-30 3:24 UTC (permalink / raw)
To: Philipp Stephani; +Cc: ambrevar, monnier, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> FWIW, I think its verbosity is RX's main *advantage*. It makes regular
> expressions so much easier to read that I stopped writing regex strings the
> moment I discovered RX.
The clearer representation of structure is not the same thing as
verbosity. rx does both, but they are not the same thing. We could
envision making the structure more or less equally clear without
making the patterns so long.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-30 3:24 ` Richard Stallman
@ 2018-05-30 7:25 ` Robert Pluim
2018-05-31 3:53 ` Richard Stallman
` (2 more replies)
0 siblings, 3 replies; 54+ messages in thread
From: Robert Pluim @ 2018-05-30 7:25 UTC (permalink / raw)
To: Richard Stallman; +Cc: Philipp Stephani, emacs-devel, monnier, ambrevar
Richard Stallman <rms@gnu.org> writes:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > FWIW, I think its verbosity is RX's main *advantage*. It makes regular
> > expressions so much easier to read that I stopped writing regex strings the
> > moment I discovered RX.
>
> The clearer representation of structure is not the same thing as
> verbosity. rx does both, but they are not the same thing. We could
> envision making the structure more or less equally clear without
> making the patterns so long.
Itʼs not clear to me how you'd do that. Looking at rx-constituents,
quite a few of the verbose ways of specifying what to match already
have a succinct version, eg
sequence => and
zero-or-more => *
and frankly being able to write 'bos' rather than remembering '\\`' or
'symbol-start' rather than '\\_<' is a net win in my eyes.
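A few of these pairs, evaluated side by side as a quick illustration (the results in the comments are the regexp strings as they would be written in Lisp source):

```elisp
(require 'rx)

(rx bos)             ; same as (rx string-start); gives "\\`"
(rx symbol-start)    ; gives "\\_<"
(rx (* "a"))         ; same as (rx (zero-or-more "a")); gives "a*"
(rx (and "a" "b"))   ; same as (rx (sequence "a" "b")); gives "ab"
```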
Robert
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-30 7:25 ` Robert Pluim
@ 2018-05-31 3:53 ` Richard Stallman
2018-05-31 8:57 ` Robert Pluim
2018-05-31 4:13 ` Clément Pit-Claudel
2018-05-31 14:19 ` Stefan Monnier
2 siblings, 1 reply; 54+ messages in thread
From: Richard Stallman @ 2018-05-31 3:53 UTC (permalink / raw)
To: Robert Pluim; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > The clearer representation of structure is not the same thing as
> > verbosity. rx does both, but they are not the same thing. We could
> > envision making the structure more or less equally clear without
> > making the patterns so long.
> It's not clear to me how you'd do that.
I don't see a specific way either, but someone might come up with a way.
I'm suggesting this as a topic of investigation.
> and frankly being able to write 'bos' rather than remembering '\\`' or
> 'symbol-start' rather than '\\_<' is a net win in my eyes.
I agree, as regards those. On the other hand, those strings might not
be the best. Maybe 'text<' and 'sym<' would be better. We could
have a series of keywords, XYZ< and XYZ>, which would be as systematic
as now or more so, and shorter too.
--
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-30 7:25 ` Robert Pluim
2018-05-31 3:53 ` Richard Stallman
@ 2018-05-31 4:13 ` Clément Pit-Claudel
2018-05-31 14:19 ` Stefan Monnier
2 siblings, 0 replies; 54+ messages in thread
From: Clément Pit-Claudel @ 2018-05-31 4:13 UTC (permalink / raw)
To: emacs-devel
On 2018-05-30 03:25, Robert Pluim wrote:
> and frankly being able to write 'bos' rather than remembering '\\`'
Fun fact: \` and \' match the usual symbols used to delimit quoted terms in docstrings, `like-this'.
I find that the correspondence makes it very easy to remember the meaning of \` and \'.
Clément.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 3:53 ` Richard Stallman
@ 2018-05-31 8:57 ` Robert Pluim
0 siblings, 0 replies; 54+ messages in thread
From: Robert Pluim @ 2018-05-31 8:57 UTC (permalink / raw)
To: Richard Stallman; +Cc: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > > The clearer representation of structure is not the same thing as
> > > verbosity. rx does both, but they are not the same thing. We could
> > > envision making the structure more or less equally clear without
> > > making the patterns so long.
>
> > It's not clear to me how you'd do that.
>
> I don't see a specific way either, but someone might come up with a way.
> I'm suggesting this as a topic of investigation.
>
> > and frankly being able to write 'bos' rather than remembering '\\`' or
> > 'symbol-start' rather than '\\_<' is a net win in my eyes.
>
> I agree, as regards those. On the other hand, those strings might not
> be the best. Maybe 'text<' and 'sym<' would be better. We could
> have a series of keywords, XYZ< and XYZ>, which would be as systematic
> as now or more so, and shorter too.
What we have now is [be]o[lstw], which covers lines, strings, and
words. The only thing missing is symbols, which is easily fixed like
so [1]:
diff --git i/lisp/emacs-lisp/rx.el w/lisp/emacs-lisp/rx.el
index 8059bf2a6e..833321cd7b 100644
--- i/lisp/emacs-lisp/rx.el
+++ w/lisp/emacs-lisp/rx.el
@@ -170,7 +170,9 @@ rx-constituents
(word-boundary . "\\b")
(not-word-boundary . "\\B") ; sregex
(symbol-start . "\\_<")
+ (boS . "\\_<")
(symbol-end . "\\_>")
+ (eoS . "\\_>")
(syntax . (rx-syntax 1 1))
(not-syntax . (rx-not-syntax 1 1)) ; sregex
(category . (rx-category 1 1 rx-check-category))
Footnotes:
[1] Iʼm only half joking
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-30 7:25 ` Robert Pluim
2018-05-31 3:53 ` Richard Stallman
2018-05-31 4:13 ` Clément Pit-Claudel
@ 2018-05-31 14:19 ` Stefan Monnier
2018-05-31 15:43 ` Drew Adams
2 siblings, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-05-31 14:19 UTC (permalink / raw)
To: emacs-devel
> Itʼs not clear to me how you'd do that. Looking at rx-constituents,
> quite a few of the verbose ways of specifying what to match already
> have a succinct version, eg
>
> sequence => and
> zero-or-more => *
The verbosity for me is not so much in the identifier as in the "( ID
SPC ) SPC" and the need for quotation marks to surround actual
characters. So for example the string's single-char * turns into
a 5-char * in RX.
I really like the regularity, extensibility, and clear structure of RX,
but in practice it makes the regexps too long: short regexps are
simple enough that RX's advantages don't get a chance to shine, and more
complex regexps are made to spread over too many lines for comfort.
That doesn't mean I don't like RX, by the way. Just that I expected I'd
really love it, and in the end I never use it because I never find it to
be significantly better (I do think it's significantly better when you
need to manipulate it programmatically, of course, which is why lex.el
takes an RX syntax as input).
Stefan
PS: By the way, we should deprecate the `and` shorthand for `sequence`,
because `and` in regexps could also mean "conjunction" (that's what
it means in lex.el).
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax
2018-05-31 14:19 ` Stefan Monnier
@ 2018-05-31 15:43 ` Drew Adams
2018-05-31 16:12 ` João Távora
0 siblings, 1 reply; 54+ messages in thread
From: Drew Adams @ 2018-05-31 15:43 UTC (permalink / raw)
To: Stefan Monnier, emacs-devel
> The verbosity for me is not so much in the identifier as in the "( ID
> SPC ) SPC" and the need for quotation marks to surround actual
> characters. So for example the string's single-char * turns into
> a 5-char * in RX.
>
> I really like the regularity, extensibility, and clear structure of RX,
> but in practice it makes the regexps too long: short regexps are
> simple enough that RX's advantages don't get a chance to shine, and more
> complex regexps are made to spread over too many lines for comfort.
>
> That doesn't mean I don't like RX, by the way. Just that I expected I'd
> really love it, and in the end I never use it because I never find it to
> be significantly better (I do think it's significantly better when you
> need to manipulate it programmatically, of course, which is why lex.el
> takes an RX syntax as input).
This summary applies for me, as well.
Functions that transform a regexp string to an RX sexp and vice
versa would be very helpful.
Given such functions, I might use RX and the show-me-the-regexp
function to create a regexp string, which I'd leave in the code,
and I might use the show-me-the-RX function when I need to change
such a string (or think about it).
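The RX-to-string half of that workflow can be sketched with the built-in `rx-to-string'. The helper name `my-show-regexp-for-rx' is hypothetical and only wraps that function; the sample form is made up.

```elisp
(require 'rx)

(defun my-show-regexp-for-rx (form)
  "Return the regexp string that the rx FORM stands for.
`my-show-regexp-for-rx' is a hypothetical helper name; the real work
is done by the built-in `rx-to-string'."
  ;; A non-nil second argument asks `rx-to-string' not to wrap the
  ;; result in an extra shy group.
  (rx-to-string form t))

;; Build the string once from the readable form, then paste the
;; result into the code.
(my-show-regexp-for-rx '(seq bol (or "cat" "dog") (+ space)))
```

The string-to-RX half has no counterpart in rx.el itself, so it is not shown here.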
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 15:43 ` Drew Adams
@ 2018-05-31 16:12 ` João Távora
2018-05-31 16:18 ` Robert Pluim
0 siblings, 1 reply; 54+ messages in thread
From: João Távora @ 2018-05-31 16:12 UTC (permalink / raw)
To: Drew Adams; +Cc: Stefan Monnier, emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
> Given such functions, I might use RX and the show-me-the-regexp
> function to create a regexp string, which I'd leave in the code,
> and I might use the show-me-the-RX function when I need to change
> such a string (or think about it).
+1, FWIW. But are there such functions? </delurk>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 16:12 ` João Távora
@ 2018-05-31 16:18 ` Robert Pluim
2018-05-31 16:48 ` Basil L. Contovounesios
0 siblings, 1 reply; 54+ messages in thread
From: Robert Pluim @ 2018-05-31 16:18 UTC (permalink / raw)
To: João Távora; +Cc: Stefan Monnier, Drew Adams, emacs-devel
João Távora <joaotavora@gmail.com> writes:
> Drew Adams <drew.adams@oracle.com> writes:
>
>> Given such functions, I might use RX and the show-me-the-regexp
>> function to create a regexp string, which I'd leave in the code,
>> and I might use the show-me-the-RX function when I need to change
>> such a string (or think about it).
>
> +1, FWIW. But are there such functions? </delurk>
rx->regexp obviously exists, but Iʼm not aware of the reverse. Thank
you for volunteering! ;-)
Robert
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 16:18 ` Robert Pluim
@ 2018-05-31 16:48 ` Basil L. Contovounesios
2018-05-31 17:02 ` Basil L. Contovounesios
0 siblings, 1 reply; 54+ messages in thread
From: Basil L. Contovounesios @ 2018-05-31 16:48 UTC (permalink / raw)
To: emacs-devel
Robert Pluim <rpluim@gmail.com> writes:
> João Távora <joaotavora@gmail.com> writes:
>
>> Drew Adams <drew.adams@oracle.com> writes:
>>
>>> Given such functions, I might use RX and the show-me-the-regexp
>>> function to create a regexp string, which I'd leave in the code,
>>> and I might use the show-me-the-RX function when I need to change
>>> such a string (or think about it).
>>
>> +1, FWIW. But are there such functions? </delurk>
>
> rx->regexp obviously exists, but Iʼm not aware of the reverse.
At first glance, lex-parse-re from lex.el seems to fill the role, though
its output might need a little tweaking to be completely rx-compatible.
--
Basil
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 16:48 ` Basil L. Contovounesios
@ 2018-05-31 17:02 ` Basil L. Contovounesios
2018-05-31 18:40 ` João Távora
0 siblings, 1 reply; 54+ messages in thread
From: Basil L. Contovounesios @ 2018-05-31 17:02 UTC (permalink / raw)
To: emacs-devel
"Basil L. Contovounesios" <contovob@tcd.ie> writes:
> Robert Pluim <rpluim@gmail.com> writes:
>
>> João Távora <joaotavora@gmail.com> writes:
>>
>>> Drew Adams <drew.adams@oracle.com> writes:
>>>
>>>> Given such functions, I might use RX and the show-me-the-regexp
>>>> function to create a regexp string, which I'd leave in the code,
>>>> and I might use the show-me-the-RX function when I need to change
>>>> such a string (or think about it).
>>>
>>> +1, FWIW. But are there such functions? </delurk>
>>
>> rx->regexp obviously exists, but Iʼm not aware of the reverse.
>
> At first glance, lex-parse-re from lex.el seems to fill the role, though
                                     ^^^^^^
I meant lex-parse-re.el, which is part of the lex package on ELPA.
--
Basil
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-31 17:02 ` Basil L. Contovounesios
@ 2018-05-31 18:40 ` João Távora
0 siblings, 0 replies; 54+ messages in thread
From: João Távora @ 2018-05-31 18:40 UTC (permalink / raw)
To: Basil L. Contovounesios; +Cc: emacs-devel
"Basil L. Contovounesios" <contovob@tcd.ie> writes:
> "Basil L. Contovounesios" <contovob@tcd.ie> writes:
>
>> Robert Pluim <rpluim@gmail.com> writes:
>>
>>> João Távora <joaotavora@gmail.com> writes:
>>>
>>>> Drew Adams <drew.adams@oracle.com> writes:
>>>>
>>>>> Given such functions, I might use RX and the show-me-the-regexp
>>>>> function to create a regexp string, which I'd leave in the code,
>>>>> and I might use the show-me-the-RX function when I need to change
>>>>> such a string (or think about it).
>>>>
>>>> +1, FWIW. But are there such functions? </delurk>
>>>
>>> rx->regexp obviously exists, but Iʼm not aware of the reverse.
>>
>> At first glance, lex-parse-re from lex.el seems to fill the role, though
>                                     ^^^^^^
> I meant lex-parse-re.el, which is part of the lex package on ELPA.
And, notably, https://github.com/joddie/pcre2el looks pretty good too.
João
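A guarded sketch of the string-to-RX direction using that package. It assumes pcre2el is installed and exposes a function named `rxt-elisp-to-rx'; that name is quoted from memory and should be checked against the package's own documentation. The wrapper `my-regexp-to-rx' is hypothetical and returns nil when the package or the function is missing.

```elisp
(defun my-regexp-to-rx (regexp)
  "Return an rx form for REGEXP via pcre2el, or nil if it is unavailable.
Assumes pcre2el provides `rxt-elisp-to-rx'; verify this against the
package before relying on it."
  (when (and (require 'pcre2el nil t)
             (fboundp 'rxt-elisp-to-rx))
    (rxt-elisp-to-rx regexp)))

;; With pcre2el installed this should yield an rx form; without it, nil.
(my-regexp-to-rx "\\(cat\\|dog\\)+")
```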
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-05-27 20:27 ` Stefan Monnier
2018-05-28 16:37 ` Pierre Neidhardt
@ 2018-06-02 19:33 ` Eric Abrahamsen
2018-06-03 3:49 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Eric Abrahamsen @ 2018-06-02 19:33 UTC (permalink / raw)
To: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> brutal regexp, but what I *would* use all day long would be a macro that
>> un-escaped backslashes for me. Ideally:
>
> That'd be a good first step.
> A second important step would be to easily embed comments and Elisp code
> (mostly references to other Elisp variables).
I played around with this, but it might not be possible to achieve
enough of the conveniences to make it worthwhile. I wanted to partially
reverse the sense of the backslash -- "(" would be translated to "\\("
-- but also to write backslash specials with a single backslash: ie the
first group reference could actually be written as "\1", rather than
"\\1". Of course, the string reader interprets that as a control
character, and I doubt there's any way around that, at least not in
lisp.
It's still nice to be able to write "\\(cat\\|dog\\)" as "(cat|dog)",
but I'm not sure it's worth it.
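The reader behaviour described above can be checked directly. A small sketch, using nothing beyond the standard string functions:

```elisp
;; "\1" is read as an octal escape: a one-character string holding
;; the character with code 1, not a backslash followed by "1".
(length "\1")                            ; 1
(aref "\1" 0)                            ; 1

;; The regexp back-reference needs the backslash itself in the string,
;; hence the doubling.
(length "\\1")                           ; 2
(string-match-p "\\(ab\\)\\1" "abab")    ; 0

;; A backslash before an ordinary character such as "w" is simply
;; dropped by the reader, so "\w" is the same string as "w".
(equal "\w" "w")                         ; t
```

So by the time a macro or function sees the string, the single backslash is already gone, and no amount of post-processing can bring it back.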
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-02 19:33 ` Eric Abrahamsen
@ 2018-06-03 3:49 ` Stefan Monnier
2018-06-03 4:59 ` Eric Abrahamsen
2018-06-03 14:51 ` Helmut Eller
0 siblings, 2 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-06-03 3:49 UTC (permalink / raw)
To: emacs-devel
> I played around with this, but it might not be possible to achieve
> enough of the conveniences to make it worthwhile.
I think we'd need to extend the reader to provide a specialized syntax,
which in turn means changing elisp-mode, etc...
I think it's a fairly large amount of work, for fairly little benefit in
the end compared to what we can get today.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 3:49 ` Stefan Monnier
@ 2018-06-03 4:59 ` Eric Abrahamsen
2018-06-03 14:51 ` Helmut Eller
1 sibling, 0 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-06-03 4:59 UTC (permalink / raw)
To: emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> I played around with this, but it might not be possible to achieve
>> enough of the conveniences to make it worthwhile.
>
> I think we'd need to extend the reader to provide a specialized syntax,
> which in turn means changing elisp-mode, etc...
> I think it's a fairly large amount of work, for fairly little benefit in
> the end compared to what we can get today.
Yup. Oh well, it was a beautiful dream.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 3:49 ` Stefan Monnier
2018-06-03 4:59 ` Eric Abrahamsen
@ 2018-06-03 14:51 ` Helmut Eller
2018-06-03 15:15 ` Eric Abrahamsen
1 sibling, 1 reply; 54+ messages in thread
From: Helmut Eller @ 2018-06-03 14:51 UTC (permalink / raw)
To: emacs-devel
On Sat, Jun 02 2018, Stefan Monnier wrote:
>> I played around with this, but it might not be possible to achieve
>> enough of the conveniences to make it worthwhile.
>
> I think we'd need to extend the reader to provide a specialized syntax,
> which in turn means changing elisp-mode, etc...
> I think it's a fairly large amount of work, for fairly little benefit in
> the end compared to what we can get today.
Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which
could translate Perl/Python regexp syntax to Emacs regexp syntax.
Helmut
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 14:51 ` Helmut Eller
@ 2018-06-03 15:15 ` Eric Abrahamsen
2018-06-03 15:53 ` Helmut Eller
0 siblings, 1 reply; 54+ messages in thread
From: Eric Abrahamsen @ 2018-06-03 15:15 UTC (permalink / raw)
To: emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
> On Sat, Jun 02 2018, Stefan Monnier wrote:
>
>>> I played around with this, but it might not be possible to achieve
>>> enough of the conveniences to make it worthwhile.
>>
>> I think we'd need to extend the reader to provide a specialized syntax,
>> which in turn means changing elisp-mode, etc...
>> I think it's a fairly large amount of work, for fairly little benefit in
>> the end compared to what we can get today.
>
> Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which
> could translate Perl/Python regexp syntax to Emacs regexp syntax.
I made something like that, but the only advantage is your example
above: open and close parentheses, and the vertical bar. Everything else
still has to be double-backslashed. It feels inconsistent, and isn't
that much of a benefit...
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 15:15 ` Eric Abrahamsen
@ 2018-06-03 15:53 ` Helmut Eller
2018-06-03 16:40 ` Eric Abrahamsen
2018-06-03 19:57 ` Drew Adams
0 siblings, 2 replies; 54+ messages in thread
From: Helmut Eller @ 2018-06-03 15:53 UTC (permalink / raw)
To: emacs-devel
On Sun, Jun 03 2018, Eric Abrahamsen wrote:
>> Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which
>> could translate Perl/Python regexp syntax to Emacs regexp syntax.
>
> I made something like that, but the only advantage is your example
> above: open and close parentheses, and the vertical bar. Everything else
> still has to be double-backslashed. It feels inconsistent, and isn't
> that much of a benefit...
Any benefit on a small example is bound to be small. And yes, in
practice most regexps fit on a single line, i.e. they are small.
I'm not sure what you mean by "everything else", as in this example
there's nothing else. The example would translate to: "\\(a\\|b\\)"
which, while it is one character shorter, is also quite ugly.
Maybe you mean that things like \w cannot occur in ordinary strings
without double-backslash. Hmm.. that's indeed a problem.
Helmut
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 15:53 ` Helmut Eller
@ 2018-06-03 16:40 ` Eric Abrahamsen
2018-06-03 19:57 ` Drew Adams
1 sibling, 0 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-06-03 16:40 UTC (permalink / raw)
To: emacs-devel
Helmut Eller <eller.helmut@gmail.com> writes:
> On Sun, Jun 03 2018, Eric Abrahamsen wrote:
>
>>> Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which
>>> could translate Perl/Python regexp syntax to Emacs regexp syntax.
>>
>> I made something like that, but the only advantage is your example
>> above: open and close parentheses, and the vertical bar. Everything else
>> still has to be double-backslashed. It feels inconsistent, and isn't
>> that much of a benefit...
>
> Any benefit on a small example is bound to be small. And yes, in
> practice most regexps fit on a single line, i.e. they are small.
>
> I'm not sure what you mean by "everything else", as in this example
> there's nothing else. The example would translate to: "\\(a\\|b\\)"
> which, while it is one character shorter, is also quite ugly.
>
> Maybe you mean that things like \w cannot occur in ordinary strings
> without double-backslash. Hmm.. that's indeed a problem.
Yes, that's what I meant: all the other backslash constructions. If we
still have to write "(atl|choo)\\1", it doesn't feel consistent, and I
think doesn't save much mental overhead.
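#+BEGIN_SRC elisp
(defmacro pcre (str)
  "Translate STR, a literal string in PCRE-like syntax, at expansion time.
Only `(', `)' and `|' are handled: bare ones become Emacs regexp
operators, backslash-escaped ones become literal characters."
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (while (< (point) (point-max))
      (cond ((looking-at "\\\\\\([(|)]\\)")
             ;; Escaped paren or bar: drop the backslash so that the
             ;; character is literal in Emacs syntax.
             (replace-match "\\1"))
            ((looking-at "\\([(|)]\\)")
             ;; Bare paren or bar: add the backslash Emacs requires.
             (replace-match "\\\\\\1"))
            ;; Add parsing of comments and elisp forms here.
            (t (forward-char))))
    (buffer-string)))
#+END_SRC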
I would do:
(defalias 'r 'pcre)
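For trying the idea out interactively, here is a run-time counterpart of the macro above, written as a plain function under the hypothetical name `my-pcre-lite-to-elisp'. It is a sketch with the same limits as the macro: only parentheses and the vertical bar are swapped, and bracket expressions are not treated specially.

```elisp
(defun my-pcre-lite-to-elisp (str)
  "Swap the escaping of `(', `)' and `|' in STR and return the result.
Everything else, including bracket expressions, is left untouched."
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (while (< (point) (point-max))
      (cond ((looking-at "\\\\\\([(|)]\\)")
             ;; Escaped in the input: make it a literal character.
             (replace-match "\\1"))
            ((looking-at "\\([(|)]\\)")
             ;; Bare in the input: make it an Emacs regexp operator.
             (replace-match "\\\\\\1"))
            (t (forward-char))))
    (buffer-string)))

(my-pcre-lite-to-elisp "(cat|dog)")        ; "\\(cat\\|dog\\)"
(string-match-p (my-pcre-lite-to-elisp "(cat|dog)") "hotdog")   ; 3

;; The back-reference still has to be written with a doubled backslash.
(my-pcre-lite-to-elisp "(atl|choo)\\1")    ; "\\(atl\\|choo\\)\\1"
```

The last call shows the inconsistency discussed above: grouping gets shorter, while every other backslash construct stays as it was.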
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax
2018-06-03 15:53 ` Helmut Eller
2018-06-03 16:40 ` Eric Abrahamsen
@ 2018-06-03 19:57 ` Drew Adams
2018-06-03 21:15 ` Eric Abrahamsen
2018-06-04 13:56 ` Stefan Monnier
1 sibling, 2 replies; 54+ messages in thread
From: Drew Adams @ 2018-06-03 19:57 UTC (permalink / raw)
To: Helmut Eller, emacs-devel
It's not just about confusion/obscurity due to the "extra"
backslashes (for `(', `)', and `|'). It's also about the
fact that regexps themselves can be complicated. For
example, `directory-listing-before-filename-regexp':
"\\([0-9][BkKMGTPEZY]? \\(\\([0-9][0-9][0-9][0-9]-\\)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9]\\(:[0-6][0-9]\\([.,][0-9]+\\)?\\( ?[-+][0-2][0-9][0-5][0-9]\\)?\\)?\\|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9]\\)\\|.*[0-9][BkKMGTPEZY]? \\(\\(\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[ 0-3][0-9]\\|[ 0-3][0-9]\\.? \\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.?\\) +\\([ 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\)\\|\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]\\|\\([ 0-1]?[0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? [ 0-3][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? +\\|[ 0-3][0-9] [ 0-1]?[0-9] +\\)\\([ 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)?\\)\\)\\) +"
Even after removing "extra" backslashes, it's still a bear:
"([0-9][BkKMGTPEZY]? (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]? ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.? ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?) +([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])? [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? +|[ 0-3][0-9] [ 0-1]?[0-9] +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +"
As I said before:
> Functions that transform a regexp string to an RX
> sexp and vice versa would be very helpful.
(And yes, the latter exists.)
Ideally, you could put your cursor on
a regexp in some code and hit a key to:
* see a corresponding `rx' sexp and
* optionally replace the regexp with the `rx' sexp.
Just being able to see the `rx' sexp that corresponds
to a regexp in some code (even temporarily in a popup)
could help, I think.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 19:57 ` Drew Adams
@ 2018-06-03 21:15 ` Eric Abrahamsen
2018-06-03 23:23 ` Drew Adams
2018-06-04 13:56 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Eric Abrahamsen @ 2018-06-03 21:15 UTC (permalink / raw)
To: emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
> It's not just about confusion/obscurity due to the "extra"
> backslashes (for `(', `)', and `|'). It's also about the
> fact that regexps themselves can be complicated. For
> example, `directory-listing-before-filename-regexp':
>
> "\\([0-9][BkKMGTPEZY]?
> \\(\\([0-9][0-9][0-9][0-9]-\\)?[01][0-9]-[0-3][0-9][ T][
> 0-2][0-9][:.][0-5][0-9]\\(:[0-6][0-9]\\([.,][0-9]+\\)?\\(
> ?[-+][0-2][0-9][0-5][0-9]\\)?\\)?\\|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9]\\)\\|.*[0-9][BkKMGTPEZY]?
> \\(\\(\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[
> 0-3][0-9]\\|[ 0-3][0-9]\\.?
> \\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.?\\) +\\([
> 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\)\\|\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.?
> +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]\\|\\([
> 0-1]?[0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? [
> 0-3][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? +\\|[ 0-3][0-9] [ 0-1]?[0-9]
> +\\)\\([
> 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)?\\)\\)\\)
> +"
>
> Even after removing "extra" backslashes, it's still a bear:
>
> "([0-9][BkKMGTPEZY]? (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][
> T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?(
> ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]?
> ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[
> 0-3][0-9]\\.? ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?) +([
> 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?
> +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])?
> [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? +|[ 0-3][0-9] [ 0-1]?[0-9] +)([
> 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +"
Sure -- the way I was thinking of it, anyway, you'd use a "pcre" macro
for simpler regexps, and `rx' for more complicated ones. The line between
simple and complex being drawn in a different place for each coder,
obviously. Though I think anyone would benefit from seeing that filename
regexp in `rx'!
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax
2018-06-03 21:15 ` Eric Abrahamsen
@ 2018-06-03 23:23 ` Drew Adams
0 siblings, 0 replies; 54+ messages in thread
From: Drew Adams @ 2018-06-03 23:23 UTC (permalink / raw)
To: Eric Abrahamsen, emacs-devel
> I think anyone would benefit from seeing that filename
> regexp in `rx'!
Which is why a function that returns a corresponding
`rx' sexp, given a regexp string, would be useful.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-03 19:57 ` Drew Adams
2018-06-03 21:15 ` Eric Abrahamsen
@ 2018-06-04 13:56 ` Stefan Monnier
2018-06-04 15:24 ` Drew Adams
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-06-04 13:56 UTC (permalink / raw)
To: emacs-devel
> Even after removing "extra" backslashes, it's still a bear:
>
> "([0-9][BkKMGTPEZY]?
> (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]?
> ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.?
> ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?)
> +([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?
> +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])?
> [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? +|[ 0-3][0-9] [ 0-1]?[0-9]
> +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +"
For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has
fairly little importance: if written "raw" as above, it will be
indecipherable in any case.
To make it readable, you need to add human-level explanations
e.g. by adding comments and naming sub-elements. Which is indeed what
is done in the source code:
(defvar directory-listing-before-filename-regexp
  (let* ((l "\\([A-Za-z]\\|[^\0-\177]\\)")
         (l-or-quote "\\([A-Za-z']\\|[^\0-\177]\\)")
         ;; In some locales, month abbreviations are as short as 2 letters,
         ;; and they can be followed by ".".
         ;; In Breton, a month name can include a quote character.
         (month (concat l-or-quote l-or-quote "+\\.?"))
         (s " ")
         (yyyy "[0-9][0-9][0-9][0-9]")
         (dd "[ 0-3][0-9]")
         (HH:MM "[ 0-2][0-9][:.][0-5][0-9]")
         (seconds "[0-6][0-9]\\([.,][0-9]+\\)?")
         (zone "[-+][0-2][0-9][0-5][0-9]")
         (iso-mm-dd "[01][0-9]-[0-3][0-9]")
         (iso-time (concat HH:MM "\\(:" seconds "\\( ?" zone "\\)?\\)?"))
         (iso (concat "\\(\\(" yyyy "-\\)?" iso-mm-dd "[ T]" iso-time
                      "\\|" yyyy "-" iso-mm-dd "\\)"))
         (western (concat "\\(" month s "+" dd "\\|" dd "\\.?" s month "\\)"
                          s "+"
                          "\\(" HH:MM "\\|" yyyy "\\)"))
         (western-comma (concat month s "+" dd "," s "+" yyyy))
         ;; Japanese MS-Windows ls-lisp has one-digit months, and
         ;; omits the Kanji characters after month and day-of-month.
         ;; On Mac OS X 10.3, the date format in East Asian locales is
         ;; day-of-month digits followed by month digits.
         (mm "[ 0-1]?[0-9]")
         (east-asian
          (concat "\\(" mm l "?" s dd l "?" s "+"
                  "\\|" dd s mm s "+" "\\)"
                  "\\(" HH:MM "\\|" yyyy l "?" "\\)")))
    ;; The "[0-9]" below requires the previous column to end in a digit.
    ;; This avoids recognizing `1 may 1997' as a date in the line:
    ;;   -r--r--r-- 1 may 1997 1168 Oct 19 16:49 README
    ;; The "[BkKMGTPEZY]?" below supports "ls -alh" output.
    ;; For non-iso date formats, we add the ".*" in order to find
    ;; the last possible match. This avoids recognizing
    ;; `jservice 10 1024' as a date in the line:
    ;;   drwxr-xr-x 3 jservice 10 1024 Jul 2 1997 esg-host
    ;; vc dired listings provide the state or blanks between file
    ;; permissions and date. The state is always surrounded by
    ;; parentheses:
    ;;   -rw-r--r-- (modified) 2005-10-22 21:25 files.el
    ;; This is not supported yet.
    (purecopy (concat "\\([0-9][BkKMGTPEZY]? " iso
                      "\\|.*[0-9][BkKMGTPEZY]? "
                      "\\(" western "\\|" western-comma "\\|" east-asian "\\)"
                      "\\) +")))
  "Regular expression to match up to the file name in a directory listing.
The default value is designed to recognize dates and times
regardless of the language.")
-- Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax
2018-06-04 13:56 ` Stefan Monnier
@ 2018-06-04 15:24 ` Drew Adams
2018-06-04 15:44 ` Pierre Neidhardt
0 siblings, 1 reply; 54+ messages in thread
From: Drew Adams @ 2018-06-04 15:24 UTC (permalink / raw)
To: Stefan Monnier, emacs-devel
> For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has
> fairly little importance: if written "raw" as above, it will be
> indecipherable in any case.
>
> To make it readable, you need to add human-level explanations
> e.g. by adding comments and naming sub-elements. Which is indeed what
> is done in the source code:...
Agreed.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
2018-06-04 15:24 ` Drew Adams
@ 2018-06-04 15:44 ` Pierre Neidhardt
0 siblings, 0 replies; 54+ messages in thread
From: Pierre Neidhardt @ 2018-06-04 15:44 UTC (permalink / raw)
To: Drew Adams; +Cc: Stefan Monnier, emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
>> For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has
>> fairly little importance: if written "raw" as above, it will be
>> indecipherable in any case.
>>
>> To make it readable, you need to add human-level explanations
>> e.g. by adding comments and naming sub-elements. Which is indeed what
>> is done in the source code:...
>
> Agreed.
The rx language seems to have space to add support for variables. Then
rx would really shine for such expressions: it would make it very
natural to explicitly name expressions and re-use them.
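Even without dedicated variable support, sub-expressions can already be named with ordinary Lisp variables and spliced in with backquote before calling `rx-to-string'. A minimal sketch, in which the function name `my-iso-date-regexp' and the variable names are illustrative only (the variable names are borrowed from the defvar quoted earlier in the thread):

```elisp
(require 'rx)

(defun my-iso-date-regexp ()
  "Return a regexp for dates like 2005-10-22, built from named parts.
The function and variable names are illustrative only."
  (let* ((yyyy '(= 4 digit))
         (iso-mm-dd '(seq (any "01") digit "-" (any "0-3") digit))
         ;; Backquote splices the named sub-forms into a larger one.
         (iso-date `(seq ,yyyy "-" ,iso-mm-dd)))
    (rx-to-string iso-date t)))

;; Non-nil: the date inside the listing line is found.
(string-match-p (my-iso-date-regexp) "files.el 2005-10-22 21:25")
```

This works, but the quoting and unquoting is the kind of noise that built-in support for named sub-expressions could remove.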
--
Pierre Neidhardt
^ permalink raw reply [flat|nested] 54+ messages in thread