* rx.el sexp regexp syntax (WAS: Off Topic) @ 2018-05-24 10:47 Noam Postavsky 2018-05-24 10:58 ` Van L 2018-05-25 2:57 ` Richard Stallman 0 siblings, 2 replies; 54+ messages in thread From: Noam Postavsky @ 2018-05-24 10:47 UTC (permalink / raw) To: Emacs developers; +Cc: Van L, Eli Zaretskii, Richard Stallman On 24 May 2018 at 05:03, Robert Pluim <rpluim@gmail.com> wrote: > Richard Stallman <rms@gnu.org> writes: >> > > We have such things but we haven't adopted any of them in Emacs itself. >> >> > Doesn't rx.el qualify? >> >> It's an example of what I said. We have it, but we don't actually use it >> much if at all. This suggests to me that it has drawbacks which prevent >> it from being clearly superior. Ah, I misunderstood what you meant by "adopted". > Iʼve never used rx.el because I didnʼt know it existed. Itʼs not > described in the regular expression chapter of the emacs lisp > reference manual, nor in the emacs user manual. > >> If someone comes up with a replacement syntax that reduces the drawbacks, >> we might start using it all the time. > > Documenting rx.el would be a more productive use of time, I think, > especially since Iʼve now noticed via reading rx.el that we already > have sregex as well. Let's pick one rather than inventing a third > syntax. sregex.el is in lisp/obsolete/, so we have picked rx.el. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax (WAS: Off Topic)
  2018-05-24 10:47 rx.el sexp regexp syntax (WAS: Off Topic) Noam Postavsky
@ 2018-05-24 10:58 ` Van L
  2018-05-25  2:57 ` Richard Stallman
  1 sibling, 0 replies; 54+ messages in thread
From: Van L @ 2018-05-24 10:58 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Emacs developers

> Noam Postavsky writes:
>
> Ah, I misunderstood what you meant by "adopted".

A round-trip test for Emacs with voice-recognition is to listen to two
and three year olds for requests like:

  Hey, Emacs, roar like a fierce tiger.
  Hey, Emacs, sing like a black bird.
  Hey, Emacs, what do zebras do?
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-24 10:47 rx.el sexp regexp syntax (WAS: Off Topic) Noam Postavsky 2018-05-24 10:58 ` Van L @ 2018-05-25 2:57 ` Richard Stallman 2018-05-25 8:52 ` Pierre Neidhardt 1 sibling, 1 reply; 54+ messages in thread From: Richard Stallman @ 2018-05-25 2:57 UTC (permalink / raw) To: Noam Postavsky; +Cc: van, eliz, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Documenting rx.el better and more visibly is a good idea. We can see if people find it convenient to use. -- Dr Richard Stallman President, Free Software Foundation (https://gnu.org, https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-25 2:57 ` Richard Stallman @ 2018-05-25 8:52 ` Pierre Neidhardt 2018-05-25 15:51 ` Alan Mackenzie 2018-05-27 20:16 ` Stefan Monnier 0 siblings, 2 replies; 54+ messages in thread From: Pierre Neidhardt @ 2018-05-25 8:52 UTC (permalink / raw) To: rms; +Cc: van, eliz, Noam Postavsky, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1847 bytes --] rx.el is one of the best concepts I've discovered in a long time. It's another instance of "Don't come up with a new (mini)language when Lisp can do better": it's easier to learn, more flexible, easier to write, much easier to read and as a consequence much more maintainable. > Some people, when confronted with a problem, think "I know, I'll use > regular expressions." Now they have two problems. > -- Jamie Zawinski It's also much more "programmable" thanks to its `eval' expression. (It's possible to count!) See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/ for some nice examples. I think it's high time we moved away from traditional regexps and embraced the concept of rx.el. I'm thinking of implementing it for Guile. At the moment the rx.el implementation is built on top of Emacs regexps which are implemented in C. I believe this does not use the power of Lisp as much as it could. The traditional regexps work in two steps: first build a blackbox automaton from the string expression, then test if the input matches. Building the automaton is costly. In C, we build it once and save the result in a variable so that every regexp match does not rebuild the automaton each time. In high-level languages, automatons are automatically cached to save the cost of building them. The rx.el library/concept could alleviate this issue altogether: because we express the automaton directly in Lisp, the parsing step is not needed and thus the building cost could be tremendously reduced. 
So the rx.el building steps rx expression -> regexp string -> C regexp automaton could boil down to simply rx automaton It would be interesting to compare the performance. This also means that there would be no need for caching on behalf of the supporting language. What do you think? -- Pierre Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-25 8:52 ` Pierre Neidhardt @ 2018-05-25 15:51 ` Alan Mackenzie 2018-05-25 16:47 ` Pierre Neidhardt ` (2 more replies) 2018-05-27 20:16 ` Stefan Monnier 1 sibling, 3 replies; 54+ messages in thread From: Alan Mackenzie @ 2018-05-25 15:51 UTC (permalink / raw) To: Pierre Neidhardt; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky Hello, Pierre. On Fri, May 25, 2018 at 10:52:03 +0200, Pierre Neidhardt wrote: > rx.el is one of the best concepts I've discovered in a long time. > It's another instance of "Don't come up with a new (mini)language when > Lisp can do better": it's easier to learn, more flexible, easier to > write, much easier to read and as a consequence much more maintainable. Much easier than what? Than the putative mini-language that doesn't get written? > > Some people, when confronted with a problem, think "I know, I'll use > > regular expressions." Now they have two problems. > > -- Jamie Zawinski > It's also much more "programmable" thanks to its `eval' expression. > (It's possible to count!) > See http://francismurillo.github.io/2017-03-30-Exploring-Emacs-rx-Macro/ > for some nice examples. > I think it's high time we moved away from traditional regexps and > embraced the concept of rx.el. I'm thinking of implementing it for > Guile. There's nothing stopping anybody from using rx.el. However, people have mostly _not_ used it. The "I think it's high time ...." suggests in some way forcing people to use it. Before mandating something like this, I think we should find out why it's not already in common use. > At the moment the rx.el implementation is built on top of Emacs regexps > which are implemented in C. I believe this does not use the power of > Lisp as much as it could. But would any alternative use the power of regexps? > The traditional regexps work in two steps: first build a blackbox > automaton from the string expression, then test if the input matches. 
> Building the automaton is costly.  In C, we build it once and save the
> result in a variable so that every regexp match does not rebuild the
> automaton each time.

Emacs has a (moderately large) cache of regexps, so that building the
automatons is done very rarely.  Possibly just once each per session of
Emacs.

> In high-level languages, automatons are automatically cached to save the
> cost of building them.

Emacs Lisp does this too.

> The rx.el library/concept could alleviate this issue altogether: because
> we express the automaton directly in Lisp, the parsing step is not
> needed and thus the building cost could be tremendously reduced.

> So the rx.el building steps

>   rx expression -> regexp string -> C regexp automaton

> could boil down to simply

>   rx automaton

I don't see what you're trying to save, here.  At some stage, the regexp
source, in whatever form, needs to be converted to an automaton.

Are you suggesting here building an interpreter in Lisp directly to
execute rx expressions?

> It would be interesting to compare the performance.  This also means
> that there would be no need for caching on behalf of the supporting
> language.

I will predict that an rx interpreter built in Lisp will be two orders
of magnitude slower than the current regexp machine, where both the
construction of an automaton, and the byte-code interpreter which runs
it are written in C (and probably quite optimised C at that).  Regexp
performance is critical to Emacs's performance in general.

> What do you think?

I think we will, in the main, carry on using conventional regular
expressions expressed as strings.  I can't get excited about rx syntax,
which I'm sure would be just as tedious, and possibly more difficult to
read than a standard regexp.  Analogously, as a musician, I read
standard musical notation (with sets of five lines and dots) far more
easily and fluently than I could any "simplified" system designed for
beginners, which would be bloated by comparison.

Regular expressions can be difficult. I don't believe this difficulty lies, in the main, in the compact notation used to express them. Rather it lies in the concepts and the semantics of the regexp elements, and being able to express a "mental automaton" in regexp semantics. > -- > Pierre Neidhardt -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-25 15:51 ` Alan Mackenzie @ 2018-05-25 16:47 ` Pierre Neidhardt 2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen 2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie 2018-05-27 16:56 ` Tom Tromey 2018-05-27 20:23 ` Stefan Monnier 2 siblings, 2 replies; 54+ messages in thread From: Pierre Neidhardt @ 2018-05-25 16:47 UTC (permalink / raw) To: Alan Mackenzie; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky [-- Attachment #1: Type: text/plain, Size: 5929 bytes --] Alan Mackenzie <acm@muc.de> writes: >> rx.el is one of the best concepts I've discovered in a long time. >> It's another instance of "Don't come up with a new (mini)language when >> Lisp can do better": it's easier to learn, more flexible, easier to >> write, much easier to read and as a consequence much more maintainable. > > Much easier than what? Than the putative mini-language that doesn't get > written? I meant that in my opinion rx is easier to write than regexps. That it is not popular is the root of the question here. >> I think it's high time we moved away from traditional regexps and >> embraced the concept of rx.el. I'm thinking of implementing it for >> Guile. > > There's nothing stopping anybody from using rx.el. However, people have > mostly _not_ used it. The "I think it's high time ...." suggests in > some way forcing people to use it. Before mandating something like > this, I think we should find out why it's not already in common use. Sorry if you felt I was forcing, that wasn't my intention. I was referring to the long period regexps have been around. I thought the reason it's not already in common use had already been discussed: it's barely referenced anywhere, it needs more advertising. Correct me if this is wrong. >> At the moment the rx.el implementation is built on top of Emacs regexps >> which are implemented in C. I believe this does not use the power of >> Lisp as much as it could. 
> > But would any alternative use the power of regexps? Yes, rx.el is a drop-in replacement of regexps. What do you mean? > Emacs has a (moderately large) cache of regexps, so that building the > automatons is done very rarely. Possibly just once each for each > session of Emacs. That's the whole point: if possible (see below), remove the requirements for regexp cache management. >> In high-level languages, automatons are automatically cached to save the >> cost of building them. > > Emacs Lisp does this too. I did not exclude it :) >> The rx.el library/concept could alleviate this issue altogether: because >> we express the automaton directly in Lisp, the parsing step is not >> needed and thus the building cost could be tremendously reduced. > >> So the rx.el building steps > >> rx expression -> regexp string -> C regexp automaton > >> could boil down to simply > >> rx automaton > > I don't see what you're trying to save, here. At some stage, the regexp > source, in whatever form, needs to be converted to an automaton. Yes, that's what I meant with "rx automaton". My suggestion (not necessarily for Emacs Lisp) is to remove the step that converts the rx symbolic automaton to a string, and the conversion from a string to the actual automaton. > Are you suggesting here building an interpreter in Lisp directly to > execute rx expressions? Yes, but maybe in Guile or some other Lisp. Don't know if it's feasible in Emacs Lisp. >> It would be interesting to compare the performance. This also means >> that there would be no need for caching on behalf of the supporting >> language. > > I will predict that an rx interpreter built in Lisp will be two orders > of magnitude slower than the current regexp machine, where both the > construction of an automaton, and the byte-code interpreter which runs > it are written in C (and probably quite optimised C at that). Obviously, and this is the prime reason why the author of rx.el implemented it on top of C regexp. 
My point was that with a fast Lisp (or a specifically designed C
support), a Lisp automaton would be just as fast: the Lisp code would
directly map the equivalent C automaton.
Again, I have no clue if that's doable in Emacs Lisp.

> I can't get excited about rx syntax, which I'm sure would be just as
> tedious, and possibly more difficult to read than a standard regexp.

Have you used rx?  The whole point of the library is to increase
readability, and it does a great job at it in my opinion.

> Analogously, as a musician, I read standard musical notation (with
> sets of five lines and dots) far more easily and fluently than I could
> any "simplified" system designed for beginners, which would be bloated
> by comparison.

rx.el is meant to be "simplified for beginners".  You could also reverse
the analogy in saying that regexps are the "simplified version for
beginners"...  The analogy does not map very well.

A better analogy would be the mapping between assembly and the
hexadecimal codes of CPU instructions: I don't think many people find
hexadecimal codes more explicit than assembly verbs and symbols
(although most assembly languages abuse abbreviations, but the intention
is there).

> Regular expressions can be difficult.  I don't believe this difficulty
> lies, in the main, in the compact notation used to express them.  Rather
> it lies in the concepts and the semantics of the regexp elements, and
> being able to express a "mental automaton" in regexp semantics.

The semantics of rx and regexps do not differ.  The difference is purely
syntactic.  Let's consider some points:

- rx can be written over multiple lines and indented.  This is a great
  readability booster for groups, which can be _grouped_ together with
  linebreaks and indentation.

- rx does not require escaping any character with backslashes.  This is
  always a great source of confusion when switching from BRE to ERE,
  between different interpreters and when storing regexp in Lisp strings
  where backslashes must be escaped themselves for instance.

- Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have
  a trivial _English_ counterpart in rx: (respectively "word-start",
  nothing, "line-start" _and_ "not").

- No more special-case symbols like "-" for ranges or "^" (negation when
  first character in square brackets).  Thus less cognitive burden.

- The "^" has a double-meaning in regexp: "line-start" and "not".

The list goes on.

--
Pierre Neidhardt
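The points in the list above can be illustrated with a small sketch.
The pattern and the name `example-key-value-rx' are hypothetical, chosen
only for illustration; the rx forms themselves (line-start, word-start,
group, any, not, and so on) are the ones rx.el provides.  The form is
spread over several lines, carries ordinary Lisp comments, and uses
named forms where a string regexp would use "^" in two different roles.

```elisp
(require 'rx)

;; Hypothetical pattern: a "key = value" line, with the value ending at
;; a "#" comment or at the end of the line.
(defconst example-key-value-rx
  (rx line-start                                  ; "^" as an anchor
      (group word-start                           ; "\\<" in a string regexp
             (one-or-more (any "a-z")))           ; a range, inside `any'
      (zero-or-more space)
      "="                                         ; literal text
      (zero-or-more space)
      (group (one-or-more (not (any "#\n")))))    ; "^" as negation becomes `not'
  "Regexp string built by `rx' at macro-expansion time.")

;; The result is an ordinary regexp string, usable wherever one is expected.
(when (string-match example-key-value-rx "name = value")
  (list (match-string 1 "name = value")
        (match-string 2 "name = value")))
;; => ("name" "value")
```

Since `rx' expands to a plain string, nothing changes for the functions
that consume the regexp; only the way the pattern is written differs.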
* Re: rx.el sexp regexp syntax
  2018-05-25 16:47 ` Pierre Neidhardt
@ 2018-05-25 18:01 ` Eric Abrahamsen
  2018-05-25 18:12 ` Pierre Neidhardt
  2018-05-27 20:27 ` Stefan Monnier
  2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie
  1 sibling, 2 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 18:01 UTC (permalink / raw)
  To: emacs-devel

Pierre Neidhardt <pe.neidhardt@googlemail.com> writes:

> Alan Mackenzie <acm@muc.de> writes:
>
>>> rx.el is one of the best concepts I've discovered in a long time.
>>> It's another instance of "Don't come up with a new (mini)language when
>>> Lisp can do better": it's easier to learn, more flexible, easier to
>>> write, much easier to read and as a consequence much more maintainable.
>>
>> Much easier than what?  Than the putative mini-language that doesn't get
>> written?
>
> I meant that in my opinion rx is easier to write than regexps.  That it
> is not popular is the root of the question here.

Slightly off-topic: I wouldn't ever use rx unless I was writing a really
brutal regexp, but what I *would* use all day long would be a macro that
un-escaped backslashes for me.  Ideally:

(string-match (rx-unescape "turn (left|right)") "turn right") => 0

But even this would be an improvement:

(string-match (rx-unescape "turn \(left\|right\)") "turn right") => 0

I looked in the repos, but didn't see any packages that do this.

As an aside, this thread led me to find the rx pcase matcher, something
I'd daydreamed about before, so I'm already pretty happy :)

Eric
* Re: rx.el sexp regexp syntax 2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen @ 2018-05-25 18:12 ` Pierre Neidhardt 2018-05-25 18:56 ` Eric Abrahamsen 2018-05-27 20:27 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Pierre Neidhardt @ 2018-05-25 18:12 UTC (permalink / raw) To: Eric Abrahamsen; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 186 bytes --] Eric Abrahamsen <eric@ericabrahamsen.net> writes: > Slightly off-topic: I wouldn't ever use rx unless I was writing a really > brutal regexp, Why not? -- Pierre Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 18:12 ` Pierre Neidhardt @ 2018-05-25 18:56 ` Eric Abrahamsen 2018-05-25 21:42 ` Clément Pit-Claudel 0 siblings, 1 reply; 54+ messages in thread From: Eric Abrahamsen @ 2018-05-25 18:56 UTC (permalink / raw) To: emacs-devel Pierre Neidhardt <ambrevar@gmail.com> writes: > Eric Abrahamsen <eric@ericabrahamsen.net> writes: > >> Slightly off-topic: I wouldn't ever use rx unless I was writing a really >> brutal regexp, > > Why not? I think it's just what Alan is saying: if you're familiar with regexp, it's easier to write them directly. Each of us has a different level of familiarity, thus each of us has a different point at which we might break off and use rx. A regexp that would make my eyes cross (with "\\(\\(\\(") could be perfectly comprehensible to someone else. Personally, I'd use rx with more than two nested layers of group-capture, but that's only me. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 18:56 ` Eric Abrahamsen @ 2018-05-25 21:42 ` Clément Pit-Claudel 2018-05-25 21:51 ` Eric Abrahamsen 0 siblings, 1 reply; 54+ messages in thread From: Clément Pit-Claudel @ 2018-05-25 21:42 UTC (permalink / raw) To: emacs-devel On 2018-05-25 14:56, Eric Abrahamsen wrote: > A regexp that would make my eyes cross (with > "\\(\\(\\(") could be perfectly comprehensible to someone else. > Personally, I'd use rx with more than two nested layers of > group-capture, but that's only me. Shameless advertisement: https://github.com/cpitclaudel/easy-escape It's like prettify-symbols-mode, for regexps; it shows grouping parentheses in a different color, and hides the backslashes. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 21:42 ` Clément Pit-Claudel @ 2018-05-25 21:51 ` Eric Abrahamsen 2018-05-25 22:27 ` Michael Heerdegen 0 siblings, 1 reply; 54+ messages in thread From: Eric Abrahamsen @ 2018-05-25 21:51 UTC (permalink / raw) To: emacs-devel Clément Pit-Claudel <cpitclaudel@gmail.com> writes: > On 2018-05-25 14:56, Eric Abrahamsen wrote: >> A regexp that would make my eyes cross (with >> "\\(\\(\\(") could be perfectly comprehensible to someone else. >> Personally, I'd use rx with more than two nested layers of >> group-capture, but that's only me. > > Shameless advertisement: https://github.com/cpitclaudel/easy-escape > It's like prettify-symbols-mode, for regexps; it shows grouping > parentheses in a different color, and hides the backslashes. I know, a few hours ago I went hunting in the repos for relevant packages, and found yours. It certainly helps! Though I'd still rather have something that actually transforms the regexps at compile time... ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 21:51 ` Eric Abrahamsen @ 2018-05-25 22:27 ` Michael Heerdegen 2018-05-25 22:44 ` Eric Abrahamsen 0 siblings, 1 reply; 54+ messages in thread From: Michael Heerdegen @ 2018-05-25 22:27 UTC (permalink / raw) To: Eric Abrahamsen; +Cc: emacs-devel Eric Abrahamsen <eric@ericabrahamsen.net> writes: > I know, a few hours ago I went hunting in the repos for relevant > packages, and found yours. It certainly helps! Though I'd still rather > have something that actually transforms the regexps at compile time... I don't understand what you mean. rx transforms regexps at compile time. Michael. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax
  2018-05-25 22:27 ` Michael Heerdegen
@ 2018-05-25 22:44 ` Eric Abrahamsen
  0 siblings, 0 replies; 54+ messages in thread
From: Eric Abrahamsen @ 2018-05-25 22:44 UTC (permalink / raw)
  To: emacs-devel

Michael Heerdegen <michael_heerdegen@web.de> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> I know, a few hours ago I went hunting in the repos for relevant
>> packages, and found yours.  It certainly helps!  Though I'd still rather
>> have something that actually transforms the regexps at compile time...
>
> I don't understand what you mean.  rx transforms regexps at compile
> time.

I mean, a macro that lets me write an unescaped regexp that gets
compiled to an escaped regexp.  So:

(r "turn (left|right)")

compiles to

"turn \\(left\\|right\\)"

Basically like python's string literals.
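The macro described above could be sketched roughly as follows.  This
is only an illustration under stated assumptions: the names
`my-swap-regexp-escapes' and `my-r' are hypothetical and not part of
Emacs or any package mentioned in the thread, only "(", ")" and "|" are
handled, and bracket expressions such as "[(|)]" are not treated
specially, so a pattern containing them would be converted incorrectly.

```elisp
(eval-and-compile
  ;; Hypothetical helper, not an existing Emacs function.  Wrapped in
  ;; `eval-and-compile' so the macro below can call it while compiling.
  (defun my-swap-regexp-escapes (string)
    "Return STRING with the escaping of ( ) and | swapped.
Sketch only: bracket expressions are not treated specially."
    (let ((i 0)
          (len (length string))
          (parts '()))
      (while (< i len)
        (let ((c (aref string i)))
          (cond
           ;; A backslash followed by another character.
           ((and (eq c ?\\) (< (1+ i) len))
            (let ((next (aref string (1+ i))))
              (push (if (memq next '(?\( ?\) ?|))
                        (string next)      ; escaped in the input: literal char
                      (string c next))     ; any other escape is kept unchanged
                    parts))
            (setq i (+ i 2)))
           ;; A bare operator gets the backslash that Emacs expects.
           ((memq c '(?\( ?\) ?|))
            (push (string ?\\ c) parts)
            (setq i (1+ i)))
           (t
            (push (string c) parts)
            (setq i (1+ i))))))
      (apply #'concat (nreverse parts)))))

;; Hypothetical macro: the conversion happens once, at macro-expansion
;; time, so STRING must be a literal string.
(defmacro my-r (string)
  "Expand to the Emacs regexp equivalent of the literal STRING."
  (my-swap-regexp-escapes string))

(string-match (my-r "turn (left|right)") "turn right") ; => 0
```

Because the macro expands to a plain string, byte-compiled code would
contain only the converted regexp, which is the compile-time behaviour
asked for in the message.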
* Re: rx.el sexp regexp syntax
  2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen
  2018-05-25 18:12 ` Pierre Neidhardt
@ 2018-05-27 20:27 ` Stefan Monnier
  2018-05-28 16:37 ` Pierre Neidhardt
  2018-06-02 19:33 ` Eric Abrahamsen
  1 sibling, 2 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-05-27 20:27 UTC (permalink / raw)
  To: emacs-devel

> brutal regexp, but what I *would* use all day long would be a macro that
> un-escaped backslashes for me.  Ideally:

That'd be a good first step.
A second important step would be to easily embed comments and Elisp code
(mostly references to other Elisp variables).


        Stefan
* Re: rx.el sexp regexp syntax 2018-05-27 20:27 ` Stefan Monnier @ 2018-05-28 16:37 ` Pierre Neidhardt 2018-05-28 17:15 ` Stefan Monnier 2018-06-02 19:33 ` Eric Abrahamsen 1 sibling, 1 reply; 54+ messages in thread From: Pierre Neidhardt @ 2018-05-28 16:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 421 bytes --] Stefan Monnier <monnier@iro.umontreal.ca> writes: >> brutal regexp, but what I *would* use all day long would be a macro that >> un-escaped backslashes for me. Ideally: > > That'd be a good first step. > A second important step would be to easily embed comments and Elisp code > (mostly references to other Elisp variables). rx.el can do all that (with "eval") if I'm not mistaken. -- Pierre Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
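One way to pull Lisp values into an rx form is sketched below.  It uses
`rx-to-string' with backquote rather than the `eval' form mentioned
above, because that works with values only known at run time.  The
variable `my-keywords' and the function `my-keyword-regexp' are
hypothetical names used for illustration.

```elisp
(require 'rx)

;; Hypothetical variable holding strings to splice into the pattern.
(defvar my-keywords '("if" "else" "while"))

(defun my-keyword-regexp ()
  "Return a regexp matching any of `my-keywords' as a whole symbol."
  (rx-to-string
   `(seq symbol-start
         (or ,@my-keywords)   ; Lisp values spliced in with backquote
         symbol-end)
   t))                        ; NO-GROUP: omit the outer shy group

(string-match (my-keyword-regexp) "x while y") ; => 2
```

Comments sit inside the form as ordinary Lisp comments, which covers the
other half of the request quoted above.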
* Re: rx.el sexp regexp syntax 2018-05-28 16:37 ` Pierre Neidhardt @ 2018-05-28 17:15 ` Stefan Monnier 2018-05-29 3:10 ` Richard Stallman 2018-05-29 8:27 ` Philipp Stephani 0 siblings, 2 replies; 54+ messages in thread From: Stefan Monnier @ 2018-05-28 17:15 UTC (permalink / raw) To: Pierre Neidhardt; +Cc: emacs-devel >>> brutal regexp, but what I *would* use all day long would be a macro that >>> un-escaped backslashes for me. Ideally: >> That'd be a good first step. >> A second important step would be to easily embed comments and Elisp code >> (mostly references to other Elisp variables). > rx.el can do all that (with "eval") if I'm not mistaken. The main problem of RX is not lack of features, but verbosity which for me makes it disappointingly difficult to read (not always worse than string regexps, admittedly, but still). Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-28 17:15 ` Stefan Monnier @ 2018-05-29 3:10 ` Richard Stallman 2018-05-29 7:28 ` Robert Pluim 2018-05-29 8:27 ` Philipp Stephani 1 sibling, 1 reply; 54+ messages in thread From: Richard Stallman @ 2018-05-29 3:10 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel, ambrevar [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > The main problem of RX is not lack of features, but verbosity which for > me makes it disappointingly difficult to read (not always worse than > string regexps, admittedly, but still). Can someone design a more brief format that is nonetheless more elegant and readable than rx format? -- Dr Richard Stallman President, Free Software Foundation (https://gnu.org, https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-29 3:10 ` Richard Stallman @ 2018-05-29 7:28 ` Robert Pluim 0 siblings, 0 replies; 54+ messages in thread From: Robert Pluim @ 2018-05-29 7:28 UTC (permalink / raw) To: Richard Stallman; +Cc: ambrevar, Stefan Monnier, emacs-devel Richard Stallman <rms@gnu.org> writes: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > > The main problem of RX is not lack of features, but verbosity which for > > me makes it disappointingly difficult to read (not always worse than > > string regexps, admittedly, but still). > > Can someone design a more brief format > that is nonetheless more elegant and readable than rx format? I thought that we werenʼt going to design yet-another-format. I find rx quite readable. Itʼs a little verbose, but the advantage of not having to deal with massive numbers of backslashes outweighs that for me. Robert ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-28 17:15 ` Stefan Monnier 2018-05-29 3:10 ` Richard Stallman @ 2018-05-29 8:27 ` Philipp Stephani 2018-05-30 3:24 ` Richard Stallman 1 sibling, 1 reply; 54+ messages in thread From: Philipp Stephani @ 2018-05-29 8:27 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel, Pierre Neidhardt [-- Attachment #1: Type: text/plain, Size: 801 bytes --] Stefan Monnier <monnier@iro.umontreal.ca> schrieb am Mo., 28. Mai 2018 um 19:16 Uhr: > >>> brutal regexp, but what I *would* use all day long would be a macro > that > >>> un-escaped backslashes for me. Ideally: > >> That'd be a good first step. > >> A second important step would be to easily embed comments and Elisp code > >> (mostly references to other Elisp variables). > > rx.el can do all that (with "eval") if I'm not mistaken. > > The main problem of RX is not lack of features, but verbosity which for > me makes it disappointingly difficult to read (not always worse than > string regexps, admittedly, but still). > > FWIW, I think its verbosity is RX's main *advantage*. It makes regular expressions so much easier to read that I stopped writing regex strings the moment I discovered RX. [-- Attachment #2: Type: text/html, Size: 1173 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-29 8:27 ` Philipp Stephani @ 2018-05-30 3:24 ` Richard Stallman 2018-05-30 7:25 ` Robert Pluim 0 siblings, 1 reply; 54+ messages in thread From: Richard Stallman @ 2018-05-30 3:24 UTC (permalink / raw) To: Philipp Stephani; +Cc: ambrevar, monnier, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > FWIW, I think its verbosity is RX's main *advantage*. It makes regular > expressions so much easier to read that I stopped writing regex strings the > moment I discovered RX. The clearer representation of structure is not the same thing as verbosity. rx does both, but they are not the same thing. We could envision making the structure more or less equally clear without making the patterns so long. -- Dr Richard Stallman President, Free Software Foundation (https://gnu.org, https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-30 3:24 ` Richard Stallman @ 2018-05-30 7:25 ` Robert Pluim 2018-05-31 3:53 ` Richard Stallman ` (2 more replies) 0 siblings, 3 replies; 54+ messages in thread From: Robert Pluim @ 2018-05-30 7:25 UTC (permalink / raw) To: Richard Stallman; +Cc: Philipp Stephani, emacs-devel, monnier, ambrevar Richard Stallman <rms@gnu.org> writes: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > > FWIW, I think its verbosity is RX's main *advantage*. It makes regular > > expressions so much easier to read that I stopped writing regex strings the > > moment I discovered RX. > > The clearer representation of structure is not the same thing as > verbosity. rx does both, but they are not the same thing. We could > envision making the structure more or less equally clear without > making the patterns so long. Itʼs not clear to me how you'd do that. Looking at rx-constituents, quite a few of the verbose ways of specifying what to match already have a succinct version, eg sequence => and zero-or-more => * and frankly being able to write 'bos' rather than remembering '\\`' or 'symbol-start' rather than '\\_<' is a net win in my eyes. Robert ^ permalink raw reply [flat|nested] 54+ messages in thread
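The long and short spellings mentioned above are aliases for the same
constructs, so they produce the same regexp string.  A minimal check,
using only forms named in the message plus their counterparts
`string-start', `string-end' and `eos':

```elisp
(require 'rx)

;; Long names and their succinct aliases expand to the same string.
(equal (rx string-start (sequence "ab" (zero-or-more "c")) string-end)
       (rx bos (and "ab" (* "c")) eos))    ; => t

;; The named form stands in for the backslash construct.
(equal (rx symbol-start "foo") "\\_<foo")  ; => t
```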
* Re: rx.el sexp regexp syntax 2018-05-30 7:25 ` Robert Pluim @ 2018-05-31 3:53 ` Richard Stallman 2018-05-31 8:57 ` Robert Pluim 2018-05-31 4:13 ` Clément Pit-Claudel 2018-05-31 14:19 ` Stefan Monnier 2 siblings, 1 reply; 54+ messages in thread From: Richard Stallman @ 2018-05-31 3:53 UTC (permalink / raw) To: Robert Pluim; +Cc: emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > The clearer representation of structure is not the same thing as > > verbosity. rx does both, but they are not the same thing. We could > > envision making the structure more or less equally clear without > > making the patterns so long. > It's not clear to me how you'd do that. I don't see a specific way either, but someone might come up with a way. I'm suggesting this as a topic of investigation. > and frankly being able to write 'bos' rather than remembering '\\`' or > 'symbol-start' rather than '\\_<' is a net win in my eyes. I agree, as regards those. On the other hand, those strings might not be the best. Maybe 'text<' and 'sym<' would be better. We could have a series of keywords, XYZ< and XYZ>, which would be as systematic as now or more so, and shorter too. -- Dr Richard Stallman President, Free Software Foundation (https://gnu.org, https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-31 3:53 ` Richard Stallman @ 2018-05-31 8:57 ` Robert Pluim 0 siblings, 0 replies; 54+ messages in thread From: Robert Pluim @ 2018-05-31 8:57 UTC (permalink / raw) To: Richard Stallman; +Cc: emacs-devel Richard Stallman <rms@gnu.org> writes: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > > > The clearer representation of structure is not the same thing as > > > verbosity. rx does both, but they are not the same thing. We could > > > envision making the structure more or less equally clear without > > > making the patterns so long. > > > It's not clear to me how you'd do that. > > I don't see a specific way either, but someone might come up with a way. > I'm suggesting this as a topic of investigation. > > > and frankly being able to write 'bos' rather than remembering '\\`' or > > 'symbol-start' rather than '\\_<' is a net win in my eyes. > > I agree, as regards those. On the other hand, those strings might not > be the best. Maybe 'text<' and 'sym<' would be better. We could > have a series of keywords, XYZ< and XYZ>, which would be as systematic > as now or more so, and shorter too. What we have now is [be]o[lstw], which covers lines, strings, and words. The only thing missing is symbols, which is easily fixed like so [1]: diff --git i/lisp/emacs-lisp/rx.el w/lisp/emacs-lisp/rx.el index 8059bf2a6e..833321cd7b 100644 --- i/lisp/emacs-lisp/rx.el +++ w/lisp/emacs-lisp/rx.el @@ -170,7 +170,9 @@ rx-constituents (word-boundary . "\\b") (not-word-boundary . "\\B") ; sregex (symbol-start . "\\_<") + (boS . "\\_<") (symbol-end . "\\_>") + (eoS . "\\_>") (syntax . (rx-syntax 1 1)) (not-syntax . (rx-not-syntax 1 1)) ; sregex (category . 
(rx-category 1 1 rx-check-category)) Footnotes: [1] Iʼm only half joking ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-30 7:25 ` Robert Pluim 2018-05-31 3:53 ` Richard Stallman @ 2018-05-31 4:13 ` Clément Pit-Claudel 2018-05-31 14:19 ` Stefan Monnier 2 siblings, 0 replies; 54+ messages in thread From: Clément Pit-Claudel @ 2018-05-31 4:13 UTC (permalink / raw) To: emacs-devel On 2018-05-30 03:25, Robert Pluim wrote: > and frankly being able to write 'bos' rather than remembering '\\`' Fun fact: \` and \' match the usual symbols used to delimit quoted terms in docstrings, `like-this'. I find that the correspondence makes it very easy to remember the meaning of \` and \'. Clément. ^ permalink raw reply [flat|nested] 54+ messages in thread
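Editorial illustration of what \` and \' match, as a small sketch with made-up input strings. It contrasts them with ^ and $, which also match at line boundaries inside a string.

```elisp
;; "^" and "$" match at line boundaries, so the first line qualifies.
(string-match-p "^foo$" "foo\nbar")

;; \` and \' only match at the very start and very end of the string,
;; so a second line makes the match fail.
(string-match-p "\\`foo\\'" "foo\nbar")

;; With nothing else in the string, the anchored form matches.
(string-match-p "\\`foo\\'" "foo")
```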
* Re: rx.el sexp regexp syntax 2018-05-30 7:25 ` Robert Pluim 2018-05-31 3:53 ` Richard Stallman 2018-05-31 4:13 ` Clément Pit-Claudel @ 2018-05-31 14:19 ` Stefan Monnier 2018-05-31 15:43 ` Drew Adams 2 siblings, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-05-31 14:19 UTC (permalink / raw) To: emacs-devel > Itʼs not clear to me how you'd do that. Looking at rx-constituents, > quite a few of the verbose ways of specifying what to match already > have a succinct version, eg > > sequence => and > zero-or-more => * The verbosity for me is not so much in the identifier as in the "( ID SPC ) SPC" and the need for quotation marks to surround actual characters. So for example the string's single-char * turns into a 5-char * in RX. I really like the regularity, extensibility, and clear structure of RX, but in practice it makes the regexps too long: short regexps are simple enough that RX's advantages don't get a chance to shine, and more complex regexps are made to spread too many lines for comfort. That doesn't mean I don't like RX, by the way. Just that I expected I'd really love it, and in the end I never use it because I never find it to be significantly better (I do think it's significantly better when you need to manipulate it programmatically, of course, which is why lex.el takes an RX syntax as input). Stefan PS: By the way, we should deprecate the `and` shorthand for `sequence`, because `and` in regexps could also mean "conjunction" (that's what it means in lex.el). ^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax 2018-05-31 14:19 ` Stefan Monnier @ 2018-05-31 15:43 ` Drew Adams 2018-05-31 16:12 ` João Távora 0 siblings, 1 reply; 54+ messages in thread From: Drew Adams @ 2018-05-31 15:43 UTC (permalink / raw) To: Stefan Monnier, emacs-devel > The verbosity for me is not so much in the identifier as in the "( ID > SPC ) SPC" and the need for quotation marks to surround actual > characters. So for example the string's single-char * turns into > a 5-char * in RX. > > I really like the regularity, extensibility, and clear structure of RX, > but in practice it makes the regexps too long: short regexps are > simple enough that RX's advantages don't get a chance to shine, and more > complex regexps are made to spread too many lines for comfort. > > That doesn't mean I don't like RX, by the way. Just that I expected I'd > really love it, and in the end I never use it because I never find it to > be significantly better (I do think it's significantly better when you > need to manipulate it programmatically, of course, which is why lex.el > takes an RX syntax as input). This summary applies for me, as well. Functions that transform a regexp string to an RX sexp and vice versa would be very helpful. Given such functions, I might use RX and the show-me-the-regexp function to create a regexp string, which I'd leave in the code, and I might use the show-me-the-RX function when I need to change such a string (or think about it). ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-31 15:43 ` Drew Adams @ 2018-05-31 16:12 ` João Távora 2018-05-31 16:18 ` Robert Pluim 0 siblings, 1 reply; 54+ messages in thread From: João Távora @ 2018-05-31 16:12 UTC (permalink / raw) To: Drew Adams; +Cc: Stefan Monnier, emacs-devel Drew Adams <drew.adams@oracle.com> writes: > Given such functions, I might use RX and the show-me-the-regexp > function to create a regexp string, which I'd leave in the code, > and I might use the show-me-the-RX function when I need to change > such a string (or think about it). +1, FWIW. But are there such functions? </delurk> ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-31 16:12 ` João Távora @ 2018-05-31 16:18 ` Robert Pluim 2018-05-31 16:48 ` Basil L. Contovounesios 0 siblings, 1 reply; 54+ messages in thread From: Robert Pluim @ 2018-05-31 16:18 UTC (permalink / raw) To: João Távora; +Cc: Stefan Monnier, Drew Adams, emacs-devel João Távora <joaotavora@gmail.com> writes: > Drew Adams <drew.adams@oracle.com> writes: > >> Given such functions, I might use RX and the show-me-the-regexp >> function to create a regexp string, which I'd leave in the code, >> and I might use the show-me-the-RX function when I need to change >> such a string (or think about it). > > +1, FWIW. But are there such functions? </delurk> rx->regexp obviously exists, but Iʼm not aware of the reverse. Thank you for volunteering! ;-) Robert ^ permalink raw reply [flat|nested] 54+ messages in thread
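Editorial note on the forward direction mentioned above: in rx.el it is available as the function `rx-to-string`, which takes an rx form as data and returns the regexp string. A minimal sketch, with a made-up pattern and the hypothetical name `my-demo-pets-re`:

```elisp
(require 'rx)

;; `rx-to-string' is the function counterpart of the `rx' macro: the
;; form is ordinary quoted data, so it can be built at run time.
(defconst my-demo-pets-re
  (rx-to-string '(seq bol (group (or "cat" "dog")) eol)))

;; The result is a plain string usable with the normal regexp functions.
(string-match-p my-demo-pets-re "dog")
```

Evaluating `my-demo-pets-re` in a scratch buffer shows the generated string, which is roughly the "show-me-the-regexp" workflow described in the quoted message.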
* Re: rx.el sexp regexp syntax 2018-05-31 16:18 ` Robert Pluim @ 2018-05-31 16:48 ` Basil L. Contovounesios 2018-05-31 17:02 ` Basil L. Contovounesios 0 siblings, 1 reply; 54+ messages in thread From: Basil L. Contovounesios @ 2018-05-31 16:48 UTC (permalink / raw) To: emacs-devel Robert Pluim <rpluim@gmail.com> writes: > João Távora <joaotavora@gmail.com> writes: > >> Drew Adams <drew.adams@oracle.com> writes: >> >>> Given such functions, I might use RX and the show-me-the-regexp >>> function to create a regexp string, which I'd leave in the code, >>> and I might use the show-me-the-RX function when I need to change >>> such a string (or think about it). >> >> +1, FWIW. But are there such functions? </delurk> > > rx->regexp obviously exists, but Iʼm not aware of the reverse. At first glance, lex-parse-re from lex.el seems to fill the role, though its output might need a little tweaking to be completely rx-compatible. -- Basil ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-31 16:48 ` Basil L. Contovounesios @ 2018-05-31 17:02 ` Basil L. Contovounesios 2018-05-31 18:40 ` João Távora 0 siblings, 1 reply; 54+ messages in thread From: Basil L. Contovounesios @ 2018-05-31 17:02 UTC (permalink / raw) To: emacs-devel "Basil L. Contovounesios" <contovob@tcd.ie> writes: > Robert Pluim <rpluim@gmail.com> writes: > >> João Távora <joaotavora@gmail.com> writes: >> >>> Drew Adams <drew.adams@oracle.com> writes: >>> >>>> Given such functions, I might use RX and the show-me-the-regexp >>>> function to create a regexp string, which I'd leave in the code, >>>> and I might use the show-me-the-RX function when I need to change >>>> such a string (or think about it). >>> >>> +1, FWIW. But are there such functions? </delurk> >> >> rx->regexp obviously exists, but Iʼm not aware of the reverse. > > At first glance, lex-parse-re from lex.el seems to fill the role, though ^^^^^^ I meant lex-parse-re.el, which is part of the lex package on ELPA. -- Basil ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-31 17:02 ` Basil L. Contovounesios @ 2018-05-31 18:40 ` João Távora 0 siblings, 0 replies; 54+ messages in thread From: João Távora @ 2018-05-31 18:40 UTC (permalink / raw) To: Basil L. Contovounesios; +Cc: emacs-devel "Basil L. Contovounesios" <contovob@tcd.ie> writes: > "Basil L. Contovounesios" <contovob@tcd.ie> writes: > >> Robert Pluim <rpluim@gmail.com> writes: >> >>> João Távora <joaotavora@gmail.com> writes: >>> >>>> Drew Adams <drew.adams@oracle.com> writes: >>>> >>>>> Given such functions, I might use RX and the show-me-the-regexp >>>>> function to create a regexp string, which I'd leave in the code, >>>>> and I might use the show-me-the-RX function when I need to change >>>>> such a string (or think about it). >>>> >>>> +1, FWIW. But are there such functions? </delurk> >>> >>> rx->regexp obviously exists, but Iʼm not aware of the reverse. >> >> At first glance, lex-parse-re from lex.el seems to fill the role, though > ^^^^^^ > I meant lex-parse-re.el, which is part of the lex package on ELPA. And, notably, https://github.com/joddie/pcre2el looks pretty good too. João ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-27 20:27 ` Stefan Monnier 2018-05-28 16:37 ` Pierre Neidhardt @ 2018-06-02 19:33 ` Eric Abrahamsen 2018-06-03 3:49 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Eric Abrahamsen @ 2018-06-02 19:33 UTC (permalink / raw) To: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> brutal regexp, but what I *would* use all day long would be a macro that >> un-escaped backslashes for me. Ideally: > > That'd be a good first step. > A second important step would be to easily embed comments and Elisp code > (mostly references to other Elisp variables). I played around with this, but it might not be possible to achieve enough of the conveniences to make it worthwhile. I wanted to partially reverse the sense of the backslash -- "(" would be translated to "\\(" -- but also to write backslash specials with a single backslash: ie the first group reference could actually be written as "\1", rather than "\\1". Of course, the string reader interprets that as a control character, and I doubt there's any way around that, at least not in lisp. It's still nice to be able to write "\\(cat\\|dog\\)" as "(cat|dog)", but I'm not sure it's worth it. ^ permalink raw reply [flat|nested] 54+ messages in thread
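Editorial illustration of the string-reader problem described above. This is a small sketch showing what the Lisp reader does with single-backslash sequences before any macro gets to see the string.

```elisp
;; "\1" is read as an octal escape: one control character, code 1.
(string-to-list "\1")

;; A back-reference therefore needs the doubled backslash:
;; two characters, a backslash (92) followed by "1" (49).
(string-to-list "\\1")

;; An unrecognized escape such as "\(" silently loses its backslash,
;; so a macro cannot tell "\(" from "(" after reading.
(string= "\(" "(")
```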
* Re: rx.el sexp regexp syntax 2018-06-02 19:33 ` Eric Abrahamsen @ 2018-06-03 3:49 ` Stefan Monnier 2018-06-03 4:59 ` Eric Abrahamsen 2018-06-03 14:51 ` Helmut Eller 0 siblings, 2 replies; 54+ messages in thread From: Stefan Monnier @ 2018-06-03 3:49 UTC (permalink / raw) To: emacs-devel > I played around with this, but it might not be possible to achieve > enough of the conveniences to make it worthwhile. I think we'd need to extend the reader to provide a specialized syntax, which in turns means changing elisp-mode, etc... I think it's a fairly large amount of work, for fairly little benefit in the end compared to what we can get today. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 3:49 ` Stefan Monnier @ 2018-06-03 4:59 ` Eric Abrahamsen 2018-06-03 14:51 ` Helmut Eller 1 sibling, 0 replies; 54+ messages in thread From: Eric Abrahamsen @ 2018-06-03 4:59 UTC (permalink / raw) To: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> I played around with this, but it might not be possible to achieve >> enough of the conveniences to make it worthwhile. > > I think we'd need to extend the reader to provide a specialized syntax, > which in turns means changing elisp-mode, etc... > I think it's a fairly large amount of work, for fairly little benefit in > the end compared to what we can get today. Yup. Oh well, it was a beautiful dream. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 3:49 ` Stefan Monnier 2018-06-03 4:59 ` Eric Abrahamsen @ 2018-06-03 14:51 ` Helmut Eller 2018-06-03 15:15 ` Eric Abrahamsen 1 sibling, 1 reply; 54+ messages in thread From: Helmut Eller @ 2018-06-03 14:51 UTC (permalink / raw) To: emacs-devel On Sat, Jun 02 2018, Stefan Monnier wrote: >> I played around with this, but it might not be possible to achieve >> enough of the conveniences to make it worthwhile. > > I think we'd need to extend the reader to provide a specialized syntax, > which in turns means changing elisp-mode, etc... > I think it's a fairly large amount of work, for fairly little benefit in > the end compared to what we can get today. Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which could translate Perl/Python regexp syntax to Emacs regexp syntax. Helmut ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 14:51 ` Helmut Eller @ 2018-06-03 15:15 ` Eric Abrahamsen 2018-06-03 15:53 ` Helmut Eller 0 siblings, 1 reply; 54+ messages in thread From: Eric Abrahamsen @ 2018-06-03 15:15 UTC (permalink / raw) To: emacs-devel Helmut Eller <eller.helmut@gmail.com> writes: > On Sat, Jun 02 2018, Stefan Monnier wrote: > >>> I played around with this, but it might not be possible to achieve >>> enough of the conveniences to make it worthwhile. >> >> I think we'd need to extend the reader to provide a specialized syntax, >> which in turns means changing elisp-mode, etc... >> I think it's a fairly large amount of work, for fairly little benefit in >> the end compared to what we can get today. > > Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which > could translate Perl/Python regexp syntax to Emacs regexp syntax. I made something like that, but the only advantage is your example above: open and close parentheses, and the vertical bar. Everything else still has to be double-backslashed. It feels inconsistent, and isn't that much of a benefit... ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 15:15 ` Eric Abrahamsen @ 2018-06-03 15:53 ` Helmut Eller 2018-06-03 16:40 ` Eric Abrahamsen 2018-06-03 19:57 ` Drew Adams 0 siblings, 2 replies; 54+ messages in thread From: Helmut Eller @ 2018-06-03 15:53 UTC (permalink / raw) To: emacs-devel On Sun, Jun 03 2018, Eric Abrahamsen wrote: >> Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which >> could translate Perl/Python regexp syntax to Emacs regexp syntax. > > I made something like that, but the only advantage is your example > above: open and close parentheses, and the vertical bar. Everything else > still has to be double-backslashed. It feels inconsistent, and isn't > that much of a benefit... Any benefit on a small example is bound to be small. And yes, in practice most regexps fit on a single line, i.e. they are small. I'm not sure what you mean with "everything else" as in this example there's nothing else. The example would translate to: "\\(a\\|b\\)" which, while it is one character shorter, is also quite ugly. Maybe you mean that things like \w cannot occur in ordinary strings without double-backslash. Hmm.. that's indeed a problem. Helmut ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 15:53 ` Helmut Eller @ 2018-06-03 16:40 ` Eric Abrahamsen 2018-06-03 19:57 ` Drew Adams 1 sibling, 0 replies; 54+ messages in thread From: Eric Abrahamsen @ 2018-06-03 16:40 UTC (permalink / raw) To: emacs-devel Helmut Eller <eller.helmut@gmail.com> writes: > On Sun, Jun 03 2018, Eric Abrahamsen wrote: > >>> Maybe we could start with an ordinary macro, e.g. (pcre "(a|b)"), which >>> could translate Perl/Python regexp syntax to Emacs regexp syntax. >> >> I made something like that, but the only advantage is your example >> above: open and close parentheses, and the vertical bar. Everything else >> still has to be double-backslashed. It feels inconsistent, and isn't >> that much of a benefit... > > Any benefit on a small example is bound to be small. And yes, in > practice most regexps fit on a single line, i.e. they are small. > > I'm not sure what you mean with "everything else" as in this example > there's nothing else. The example would translate to: "\\(a\\|b\\)" > which, while it is one character shorter, is also quite ugly. > > Maybe you mean that things like \w cannot occur in ordinary strings > without double-backslash. Hmm.. that's indeed a problem. Yes, that's what I meant: all the other backslash constructions. If we still have to write "(atl|choo)\\1", it doesn't feel consistent, and I think doesn't save much mental overhead.
#+BEGIN_SRC elisp
(defmacro pcre (str)
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (while (< (point) (point-max))
      (cond
       ((looking-at "\\\\\\([(|)]\\)")
        ;; Remove double backslashes.
        (replace-match "\\1"))
       ((looking-at "\\([(|)]\\)")
        (replace-match "\\\\\\1"))
       ;; Add parsing of comments and elisp forms here.
       (t (forward-char))))
    (buffer-string)))
#+END_SRC
I would do: (defalias 'r 'pcre) ^ permalink raw reply [flat|nested] 54+ messages in thread
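Editorial sketch of the same translation as an ordinary function, so it can also be applied to strings computed at run time. This is not the code from the message: the name `my-pcre-ish-to-emacs` is hypothetical, and the sketch swaps in `replace-regexp-in-string` for the buffer loop. Like the original idea, it only flips the escaping of `(`, `|` and `)`, and it would mangle those characters inside bracket expressions.

```elisp
(defun my-pcre-ish-to-emacs (str)
  "Return STR with the escaping of (, | and ) flipped.
Unescaped (, | and ) gain a backslash; backslash-escaped ones lose it.
Hypothetical helper, for illustration only."
  (replace-regexp-in-string
   "\\\\?[(|)]"
   (lambda (m)
     (if (= (length m) 2)
         (substring m 1)          ; "\(" becomes a literal "("
       (concat "\\" m)))          ; "("  becomes the group opener "\("
   str t t))

;; Grouping and alternation get shorter to write...
(my-pcre-ish-to-emacs "(cat|dog)")

;; ...but the back-reference still needs its doubled backslash.
(my-pcre-ish-to-emacs "(atl|choo)\\1")
```

The second call shows the inconsistency raised in the message: the parentheses are written bare while `\\1` is not.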
* RE: rx.el sexp regexp syntax 2018-06-03 15:53 ` Helmut Eller 2018-06-03 16:40 ` Eric Abrahamsen @ 2018-06-03 19:57 ` Drew Adams 2018-06-03 21:15 ` Eric Abrahamsen 2018-06-04 13:56 ` Stefan Monnier 1 sibling, 2 replies; 54+ messages in thread From: Drew Adams @ 2018-06-03 19:57 UTC (permalink / raw) To: Helmut Eller, emacs-devel It's not just about confusion/obscurity due to the "extra" backslashes (for `(', `)', and `|'). It's also about the fact that regexps themselves can be complicated. For example, `directory-listing-before-filename-regexp': "\\([0-9][BkKMGTPEZY]? \\(\\([0-9][0-9][0-9][0-9]-\\)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9]\\(:[0-6][0-9]\\([.,][0-9]+\\)?\\( ?[-+][0-2][0-9][0-5][0-9]\\)?\\)?\\|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9]\\)\\|.*[0-9][BkKMGTPEZY]? \\(\\(\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[ 0-3][0-9]\\|[ 0-3][0-9]\\.? \\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.?\\) +\\([ 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\)\\|\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]\\|\\([ 0-1]?[0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? [ 0-3][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? +\\|[ 0-3][0-9] [ 0-1]?[0-9] +\\)\\([ 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)?\\)\\)\\) +" Even after removing "extra" backslashes, it's still a bear: "([0-9][BkKMGTPEZY]? (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]? ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.? ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?) +([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])? [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? 
+|[ 0-3][0-9] [ 0-1]?[0-9] +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +" As I said before: > Functions that transform a regexp string to an RX > sexp and vice versa would be very helpful. (And yes, the latter exists.) Ideally, we'd have the ability to put your cursor on a regexp in some code and hit a key to: * see a corresponding `rx' sexp and * optionally replace the regexp with the `rx' sexp. Just being able to see the `rx' sexp that corresponds to a regexp in some code (even temporarily in a popup) could help, I think. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 19:57 ` Drew Adams @ 2018-06-03 21:15 ` Eric Abrahamsen 2018-06-03 23:23 ` Drew Adams 2018-06-04 13:56 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Eric Abrahamsen @ 2018-06-03 21:15 UTC (permalink / raw) To: emacs-devel Drew Adams <drew.adams@oracle.com> writes: > It's not just about confusion/obscurity due to the "extra" > backslashes (for `(', `)', and `|'). It's also about the > fact that regexps themselves can be complicated. For > example, `directory-listing-before-filename-regexp': > > "\\([0-9][BkKMGTPEZY]? > \\(\\([0-9][0-9][0-9][0-9]-\\)?[01][0-9]-[0-3][0-9][ T][ > 0-2][0-9][:.][0-5][0-9]\\(:[0-6][0-9]\\([.,][0-9]+\\)?\\( > ?[-+][0-2][0-9][0-5][0-9]\\)?\\)?\\|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9]\\)\\|.*[0-9][BkKMGTPEZY]? > \\(\\(\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? +[ > 0-3][0-9]\\|[ 0-3][0-9]\\.? > \\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.?\\) +\\([ > 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\)\\|\\([A-Za-z']\\|[^\0-\x7f]\\)\\([A-Za-z']\\|[^\0-\x7f]\\)+\\.? > +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]\\|\\([ > 0-1]?[0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? [ > 0-3][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)? +\\|[ 0-3][0-9] [ 0-1]?[0-9] > +\\)\\([ > 0-2][0-9][:.][0-5][0-9]\\|[0-9][0-9][0-9][0-9]\\([A-Za-z]\\|[^\0-\x7f]\\)?\\)\\)\\) > +" > > Even after removing "extra" backslashes, it's still a bear: > > "([0-9][BkKMGTPEZY]? (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ > T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( > ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]? > ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[ > 0-3][0-9]\\.? ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?) +([ > 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? > +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])? > [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? 
+|[ 0-3][0-9] [ 0-1]?[0-9] +)([ > 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +" Sure -- the way I was thinking of it, anyway, you'd use a "pcre" macro for simpler regexp, and `rx' for more complicated ones. The line between simple and complex being drawn in a different place for each coder, obviously. Though I think anyone would benefit from seeing that filename regexp in `rx'! ^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax 2018-06-03 21:15 ` Eric Abrahamsen @ 2018-06-03 23:23 ` Drew Adams 0 siblings, 0 replies; 54+ messages in thread From: Drew Adams @ 2018-06-03 23:23 UTC (permalink / raw) To: Eric Abrahamsen, emacs-devel > I think anyone would benefit from seeing that filename > regexp in `rx'! Which is why a function that returns a corresponding `rx' sexp, given a regexp string, would be useful. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-03 19:57 ` Drew Adams 2018-06-03 21:15 ` Eric Abrahamsen @ 2018-06-04 13:56 ` Stefan Monnier 2018-06-04 15:24 ` Drew Adams 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-06-04 13:56 UTC (permalink / raw) To: emacs-devel > Even after removing "extra" backslashes, it's still a bear: > > "([0-9][BkKMGTPEZY]? > (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]? > ((([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.? > ([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.?) > +([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-\x7f])([A-Za-z']|[^\0-\x7f])+\\.? > +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-\x7f])? > [ 0-3][0-9]([A-Za-z]|[^\0-\x7f])? +|[ 0-3][0-9] [ 0-1]?[0-9] > +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-\x7f])?))) +" For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has fairly little importance: if written "raw" as above, it will be indecipherable in any case. To make it readable, you need to add human-level explanations e.g. by adding comments and naming sub-elements. Which is indeed what is done in the source code: (defvar directory-listing-before-filename-regexp (let* ((l "\\([A-Za-z]\\|[^\0-\177]\\)") (l-or-quote "\\([A-Za-z']\\|[^\0-\177]\\)") ;; In some locales, month abbreviations are as short as 2 letters, ;; and they can be followed by ".". ;; In Breton, a month name can include a quote character. (month (concat l-or-quote l-or-quote "+\\.?")) (s " ") (yyyy "[0-9][0-9][0-9][0-9]") (dd "[ 0-3][0-9]") (HH:MM "[ 0-2][0-9][:.][0-5][0-9]") (seconds "[0-6][0-9]\\([.,][0-9]+\\)?") (zone "[-+][0-2][0-9][0-5][0-9]") (iso-mm-dd "[01][0-9]-[0-3][0-9]") (iso-time (concat HH:MM "\\(:" seconds "\\( ?" zone "\\)?\\)?")) (iso (concat "\\(\\(" yyyy "-\\)?" 
iso-mm-dd "[ T]" iso-time "\\|" yyyy "-" iso-mm-dd "\\)")) (western (concat "\\(" month s "+" dd "\\|" dd "\\.?" s month "\\)" s "+" "\\(" HH:MM "\\|" yyyy "\\)")) (western-comma (concat month s "+" dd "," s "+" yyyy)) ;; Japanese MS-Windows ls-lisp has one-digit months, and ;; omits the Kanji characters after month and day-of-month. ;; On Mac OS X 10.3, the date format in East Asian locales is ;; day-of-month digits followed by month digits. (mm "[ 0-1]?[0-9]") (east-asian (concat "\\(" mm l "?" s dd l "?" s "+" "\\|" dd s mm s "+" "\\)" "\\(" HH:MM "\\|" yyyy l "?" "\\)"))) ;; The "[0-9]" below requires the previous column to end in a digit. ;; This avoids recognizing `1 may 1997' as a date in the line: ;; -r--r--r-- 1 may 1997 1168 Oct 19 16:49 README ;; The "[BkKMGTPEZY]?" below supports "ls -alh" output. ;; For non-iso date formats, we add the ".*" in order to find ;; the last possible match. This avoids recognizing ;; `jservice 10 1024' as a date in the line: ;; drwxr-xr-x 3 jservice 10 1024 Jul 2 1997 esg-host ;; vc dired listings provide the state or blanks between file ;; permissions and date. The state is always surrounded by ;; parentheses: ;; -rw-r--r-- (modified) 2005-10-22 21:25 files.el ;; This is not supported yet. (purecopy (concat "\\([0-9][BkKMGTPEZY]? " iso "\\|.*[0-9][BkKMGTPEZY]? " "\\(" western "\\|" western-comma "\\|" east-asian "\\)" "\\) +"))) "Regular expression to match up to the file name in a directory listing. The default value is designed to recognize dates and times regardless of the language.") -- Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: rx.el sexp regexp syntax 2018-06-04 13:56 ` Stefan Monnier @ 2018-06-04 15:24 ` Drew Adams 2018-06-04 15:44 ` Pierre Neidhardt 0 siblings, 1 reply; 54+ messages in thread From: Drew Adams @ 2018-06-04 15:24 UTC (permalink / raw) To: Stefan Monnier, emacs-devel > For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has > fairly little importance: if written "raw" as above, it will be > indecipherable in any case. > > To make it readable, you need to add human-level explanations > e.g. by adding comments and naming sub-elements. Which is indeed what > is done in the source code:... Agreed. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-06-04 15:24 ` Drew Adams @ 2018-06-04 15:44 ` Pierre Neidhardt 0 siblings, 0 replies; 54+ messages in thread From: Pierre Neidhardt @ 2018-06-04 15:44 UTC (permalink / raw) To: Drew Adams; +Cc: Stefan Monnier, emacs-devel [-- Attachment #1: Type: text/plain, Size: 622 bytes --] Drew Adams <drew.adams@oracle.com> writes: >> For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has >> fairly little importance: if written "raw" as above, it will be >> indecipherable in any case. >> >> To make it readable, you need to add human-level explanations >> e.g. by adding comments and naming sub-elements. Which is indeed what >> is done in the source code:... > > Agreed. The rx language seems to have space to add support for variables. Then rx would really shine for such expressions: it would make it very natural to explicitly name expressions and re-use them. -- Pierre Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
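Editorial sketch of naming and reusing sub-expressions with the rx facilities that already exist: since rx forms are plain data, they can be bound to variables and spliced with backquote before a single call to `rx-to-string`. The `my-demo-*` names are hypothetical and only loosely mirror the ISO branch of the let* form quoted earlier in the thread; this is not a translation of the real variable.

```elisp
(require 'rx)

;; Named building blocks, kept as rx forms rather than strings.
(defconst my-demo-yyyy '(= 4 digit))
(defconst my-demo-iso-mm-dd
  '(seq (any "01") digit "-" (any "0-3") digit))
(defconst my-demo-hh-mm
  '(seq (any " 0-2") digit (any ":.") (any "0-5") digit))

;; Splice the pieces together, then convert to a regexp string once.
(defconst my-demo-iso-re
  (rx-to-string
   `(seq (? ,my-demo-yyyy "-")
         ,my-demo-iso-mm-dd
         (any " T")
         ,my-demo-hh-mm)
   t))

(string-match-p my-demo-iso-re "2005-10-22 21:25")
```

Emacs releases after this thread (27 and later) added `rx-define` and `rx-let`, which provide named rx definitions directly.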
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-25 16:47 ` Pierre Neidhardt 2018-05-25 18:01 ` rx.el sexp regexp syntax Eric Abrahamsen @ 2018-05-25 18:17 ` Alan Mackenzie 2018-05-25 20:35 ` Peter Neidhardt 2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen 1 sibling, 2 replies; 54+ messages in thread From: Alan Mackenzie @ 2018-05-25 18:17 UTC (permalink / raw) To: Pierre Neidhardt; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky Hello again, Pierre. On Fri, May 25, 2018 at 18:47:59 +0200, Pierre Neidhardt wrote: > Alan Mackenzie <acm@muc.de> writes: > >> rx.el is one of the best concepts I've discovered in a long time. > >> It's another instance of "Don't come up with a new (mini)language when > >> Lisp can do better": it's easier to learn, more flexible, easier to > >> write, much easier to read and as a consequence much more maintainable. > > Much easier than what? Than the putative mini-language that doesn't get > > written? > I meant that in my opinion rx is easier to write than regexps. That it > is not popular is the root of the question here. I think it will be easier only for beginners. > >> I think it's high time we moved away from traditional regexps and > >> embraced the concept of rx.el. I'm thinking of implementing it for > >> Guile. > > There's nothing stopping anybody from using rx.el. However, people have > > mostly _not_ used it. The "I think it's high time ...." suggests in > > some way forcing people to use it. Before mandating something like > > this, I think we should find out why it's not already in common use. > Sorry if you felt I was forcing, that wasn't my intention. I was > referring to the long period regexps have been around. > I thought the reason it's not already in common use had already been > discussed: it's barely referenced anywhere, it needs more advertising. > Correct me if this is wrong. It may be part of the explanation. But more salient, I think, is that hackers prefer powerful means of expression. 
A single character in a string regexp has the power of a sexp in the corresponding rx regexp. Paul Graham (at http://www.paulgraham.com) has had quite a bit to say about this in the (distant) past. Conciseness of expression is where it's at. > >> At the moment the rx.el implementation is built on top of Emacs regexps > >> which are implemented in C. I believe this does not use the power of > >> Lisp as much as it could. > > But would any alternative use the power of regexps? > Yes, rx.el is a drop-in replacement of regexps. What do you mean? I'm not sure, any more. Sorry. > > Emacs has a (moderately large) cache of regexps, so that building the > > automatons is done very rarely. Possibly just once each for each > > session of Emacs. > That's the whole point: if possible (see below), remove the requirements > for regexp cache management. I don't think that would be wise. Manipulating the cache is far faster than generating the automatons at each use. [ .... ] > >> The rx.el library/concept could alleviate this issue altogether: because > >> we express the automaton directly in Lisp, the parsing step is not > >> needed and thus the building cost could be tremendously reduced. > >> So the rx.el building steps > >> rx expression -> regexp string -> C regexp automaton > >> could boil down to simply > >> rx automaton > > I don't see what you're trying to save, here. At some stage, the regexp > > source, in whatever form, needs to be converted to an automaton. > Yes, that's what I meant with "rx automaton". My suggestion (not > necessarily for Emacs Lisp) is to remove the step that converts the rx > symbolic automaton to a string, and the conversion from a string to the > actual automaton. OK. That would save only a little, at automaton building time, which likely would happen just once in any Emacs session. > > Are you suggesting here building an interpreter in Lisp directly to > > execute rx expressions? > Yes, but maybe in Guile or some other Lisp. 
Don't know if it's feasible > in Emacs Lisp. > >> It would be interesting to compare the performance. This also means > >> that there would be no need for caching on behalf of the supporting > >> language. > > I will predict that an rx interpreter built in Lisp will be two orders > > of magnitude slower than the current regexp machine, where both the > > construction of an automaton, and the byte-code interpreter which runs > > it are written in C (and probably quite optimised C at that). > Obviously, and this is the prime reason why the author of rx.el > implemented it on top of C regexp. My point was that with a fast Lisp > (or a specifically designed C support), a Lisp automaton would be just > as fast: the Lisp code would directly map the equivalent C automaton. > Again, I have no clue if that's doable in Emacs Lisp. It might be. But it might be a lot of work for little benefit. > > I can't get excited about rx syntax, which I'm sure would be just as > > tedious, and possibly more difficult to read than a standard regexp. > Have you used rx? No. Neither have I used Cobol (much). > The whole point of the library is to increase readability, and it does > a great job at it in my opinion. You seem to want to increase the readability for beginners, for people who have laboriously to slog through an expression trying to make sense of each bit of it. I don't think experienced regexp users have difficulty with the syntax. I don't, for one. There was a time when people thought that ADD 1 TO A GIVING B was more readable than b = a + 1; , and generations of programmers suffered as a result. > > Analogously, as a musician, I read standard musical notation (with > > sets of five lines and dots) far more easily and fluently than I could > > any "simplified" system designed for beginners, which would be bloated > > by comparison. > rx.el is not meant to be "simplified for beginners".
You could also reverse > the analogy in saying that regexps are the "simplified version for > beginners"... The analogy does not map very well. > A better analogy would be the mapping between assembly and the > hexadecimal codes of CPU instructions: I don't think many people find > hexadecimal codes more explicit than assembly verbs and symbols > (although most assembly languages abuse abbreviations, but the > intention is there). Hexadecimal CPU codes aren't and aren't intended to be human-readable. String regular expressions are. > > Regular expressions can be difficult. I don't believe this difficulty > > lies, in the main, in the compact notation used to express them. Rather > > it lies in the concepts and the semantics of the regexp elements, and > > being able to express a "mental automaton" in regexp semantics. > The semantic between rx and regexp does not differ. It's purely > syntactical. Yes. > Let's consider some points: > - rx can be written over multiple lines and indented. This is a great > readability booster for groups, which can be _grouped_ together with > linebreaks and indentation. rx MUST be written over several lines and indented. A string regexp, by contrast, usually fits onto a single line. > - rx does not require escaping any character with backslashes. This > is always a great source of confusion when switching from BRE to ERE, > between different interpreters and when storing regexp in Lisp strings > where backslashes must be escaped themselves for instance. It is an inconvenience, yes, but I think you're exaggerating its importance somewhat. In rx, literal characters have to be "escaped" by string quotes. This might be an irritation. > - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have > a trivial _English_ counterpart in rx: (respectively "word-start", > nothing, "line-start" _and_ "not"). The "English" counterpart used in rx is bulky and difficult to learn.
Somehow, you've got to learn that it's "word-start" and not "word-beginning", that it's "not" and not "non", and so on. This is more difficult than just learning \< and ^. If your native language isn't English, it might be much more difficult. > - No more special-case symbols like "-" for ranges or "^" (negation when > first character in square brackets). Thus less cognitive burden. That remains in dispute. > - The "^" has a double-meaning in regexp: "line-start" and "not". Yes, it is context dependent. I don't think this causes confusion in practice. > The list goes on. Well, so far, on this list, two or three people have said they "like" rx.el. Nobody has said "I'm going to be using rx.el in my programs from now on". I don't think they will. We'll see. > -- > Pierre Neidhardt -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 54+ messages in thread
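For readers who have not seen the two notations side by side, here is a minimal sketch of one pattern written both ways. The pattern and the `toy-` names are invented for illustration and do not come from any message in this thread; the rx forms used (`line-start`, `group`, `one-or-more`, `zero-or-more`, `any`, `or`, `word-start`, `word-end`) are standard rx.el constituents.

```elisp
(require 'rx)

;; Hypothetical pattern: a number at the start of a line, optional
;; blanks, then the whole word "foo" or "bar".

(defconst toy-string-re
  "^\\([0-9]+\\)[ \t]*\\<\\(foo\\|bar\\)\\>"
  "String form: compact, but every backslash is doubled for the Lisp reader.")

(defconst toy-rx-re
  (rx line-start
      (group (one-or-more (any "0-9")))
      (zero-or-more (any " \t"))
      word-start
      (group (or "foo" "bar"))   ; literal text is written as a Lisp string
      word-end)
  "rx form: longer, no backslashes, one sexp per regexp element.")

(defun toy-second-group (regexp string)
  "Return the text of group 2 if REGEXP matches STRING, else nil."
  (when (string-match regexp string)
    (match-string 2 string)))
```

Under these assumptions both regexps accept "42 foo" and both reject "42 food", since the word-end assertion fails before the trailing "d". The generated string need not be character-for-character identical to the hand-written one, so the comparison is by matching behaviour rather than by string equality.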
* Re: rx.el sexp regexp syntax (WAS: Off Topic) 2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie @ 2018-05-25 20:35 ` Peter Neidhardt 2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen 1 sibling, 0 replies; 54+ messages in thread From: Peter Neidhardt @ 2018-05-25 20:35 UTC (permalink / raw) To: Alan Mackenzie; +Cc: van, eliz, emacs-devel, rms, Noam Postavsky [-- Attachment #1: Type: text/plain, Size: 4868 bytes --] Alan Mackenzie <acm@muc.de> writes: > It may be part of the explanation. But more salient, I think, is that > hackers prefer powerful means of expression. A single character in a > string regexp has the power of a sexp in the corresponding rx regexp. > Paul Graham (at http://www.paulgraham.com) has had quite a bit to say > about this in the (distant) past. Conciseness of expression is where > it's at. I think you are referring to this article: http://paulgraham.com/ineq.html > Another easy test is the number of characters in a program, but this > is not very good either; some languages (Perl, for example) just use > shorter identifiers than others. > > I think a better measure of the size of a program would be the number > of elements, where an element is anything that would be a distinct > node if you drew a tree representing the source code. The name of a > variable or function is an element; an integer or a floating-point > number is an element; a segment of literal text is an element; an > element of a pattern, or a format directive, is an element; a new > block is an element. There are borderline cases (is -5 two elements or > one?) but I think most of them are the same for every language, so > they don't affect comparisons much. With this definition, rx and regexp have the same length (except for `eval'). "Conciseness in characters" is not what Paul Graham was referring to. Others might think differently, for instance those who prefer Perl to Lisp. 
In the end this is what it all boils down to: the "Unix" hacker culture vs. the Lisp one. The Unix tradition has long spread the use of acronyms and shortcuts. Lisp on the other hand (especially Scheme) put a lot of emphasis on explicit full names. My opinion is that acronyms and shortcuts were mostly useful in the era of teletypes and limited terminals and shells. Now we have completion and fuzzy-search, for which explicit full names not only make sense but are necessary. (It's much more intuitive to search for "string compare" in Emacs Lisp than "str cmp" in C.) In the end, rx vs. regexp reflects the same mindset difference. >> Have you used rx? > > No. Neither have I used Cobol (much). Cobol is not very relevant, let's focus on the discussion here. Try using rx on some mildly complex regular expressions, it could be insightful for this discussion. > You seem to want to increase the readability for beginners, for people > who have laboriously to slog through an expression trying to make sense > of each bit of it. I don't think experienced regexp users have > difficulty with the syntax. I don't, for one. > > There was a time when people thought that > > ADD 1 TO A GIVING B > > was more readable than > > b = a + 1; This is not what rx is about though. Your example does not show any change in structure. rx does. > Hexadecimal CPU codes aren't and aren't intended to be human-readable. > String regular expressions are. Well, "readable" is not black and white. If we can have "more readable", then even better. > rx MUST be written over several lines and indented. A string regexp, by > contrast, usually fits onto a single line. No, it does not have to be written over several lines. I don't know where you got that from. That said, is "fitting onto a single line" necessarily good? >> - rx does not require escaping any character with backslashes.
This >> is always a great source of confusion when switching from BRE to ERE, >> between different interpreters and when storing regexp in Lisp strings >> where backslashes must be escaped themselves for instance. > >> - Symbols with non-trivial meanings in regexp (e.g. \<, :, ^, etc.) have >> a trivial _English_ counterpart in rx: (respectively "word-start", >> nothing, "line-start" _and_ "not"). > > The "English" counterpart used in rx is bulky and difficult to learn. > Somehow, you've got to learn that it's "word-start" and not > "word-beginning", Could argue the same about "*" vs. "%". But words that have a meaning in a natural language are easier to remember than arbitrary symbols. > that it's "not" and not "non", and so on. This is more > difficult than just learning \< and ^. If your native language isn't > English, it might be much more difficult. All programmers learn some basic English, say, "if then else". I don't think that symbolic languages are easier to learn than natural languages for human beings. > Well, so far, on this list, two or three people have said they "like" > rx.el. Nobody has said "I'm going to be using rx.el in my programs from > now on". Which is precisely why we are talking about it. To let people know, pique their curiosity, let them try and report feedback. "Not famous" does not equal bad quality. That's why we need to communicate to give good products a better chance. -- Peter Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 18:17 ` rx.el sexp regexp syntax (WAS: Off Topic) Alan Mackenzie 2018-05-25 20:35 ` Peter Neidhardt @ 2018-05-25 21:01 ` Michael Heerdegen 2018-05-25 23:32 ` Peter Neidhardt 1 sibling, 1 reply; 54+ messages in thread From: Michael Heerdegen @ 2018-05-25 21:01 UTC (permalink / raw) To: Alan Mackenzie Cc: rms, Noam Postavsky, emacs-devel, Pierre Neidhardt, van, eliz Alan Mackenzie <acm@muc.de> writes: > A string regexp, by contrast, usually fits onto a single line. But regexps are tree-like structures. That's why rx, which uses sexps (i.e. trees), is the easier to read representation for complicated regexps than a one-dimensional string. Unless you have the ability to form a representation in your head. > The "English" counterpart used in rx is bulky and difficult to learn. > Somehow, you've got to learn that it's "word-start" and not > "word-beginning", that it's "not" and not "non", and so on. That's IMHO the main reason why people avoid using rx. I wonder if that aspect of rx could be improved (why not just use $ as synonym for bol etc.)? > This is more difficult than just learning \< and ^. If your native > language isn't English, it might be much more difficult. But also because you read the former more often. OTOH btw, I find the documentation for rx more condensed than that of the syntax of regexps. In summary, I think both representations have their justification of existence. Michael. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 21:01 ` rx.el sexp regexp syntax Michael Heerdegen @ 2018-05-25 23:32 ` Peter Neidhardt 0 siblings, 0 replies; 54+ messages in thread From: Peter Neidhardt @ 2018-05-25 23:32 UTC (permalink / raw) To: Michael Heerdegen Cc: rms, Noam Postavsky, emacs-devel, van, Alan Mackenzie, eliz [-- Attachment #1: Type: text/plain, Size: 1107 bytes --] Michael Heerdegen <michael_heerdegen@web.de> writes: >> A string regexp, by contrast, usually fits onto a single line. > > But regexps are tree-like structures. That's why rx, which uses sexps > (i.e. trees), is the easier to read representation for complicated > regexps than a one-dimensional string. Unless you have the ability to > form a representation in your head. I did not think of this at first but I think it's an excellent, fundamental point. >> The "English" counterpart used in rx is bulky and difficult to learn. >> Somehow, you've got to learn that it's "word-start" and not >> "word-beginning", that it's "not" and not "non", and so on. > > That's IMHO the main reason why people avoid using rx. I wonder if that > aspect of rx could be improved (why not just use $ as synonym for bol > etc.)? I guess you meant 'eol' ;) rx supports synonyms and I think in general it's not a good idea. That said, I really like that it uses meaningful words. So instead of ‘line-end’, ‘eol’ I'd leave it to only ‘line-end’ -- Peter Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
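The synonyms mentioned above can be checked directly. A small sketch, assuming only that `bol`, `eol`, `bow` and `eow` are the short spellings rx.el accepts for `line-start`, `line-end`, `word-start` and `word-end`; the `toy-` name is made up for illustration.

```elisp
(require 'rx)

;; Each cons cell pairs the regexp produced by a long rx name with the
;; regexp produced by its short synonym.
(defconst toy-rx-synonym-pairs
  (list (cons (rx line-start) (rx bol))
        (cons (rx line-end)   (rx eol))
        (cons (rx word-start) (rx bow))
        (cons (rx word-end)   (rx eow)))
  "Pairs of (LONG-NAME-REGEXP . SHORT-NAME-REGEXP).")
```

In each pair the two strings are equal, so `(rx line-end)` and `(rx eol)` both produce "$". Whether having two spellings helps or hurts is the question being argued in the messages above.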
* Re: rx.el sexp regexp syntax 2018-05-25 15:51 ` Alan Mackenzie 2018-05-25 16:47 ` Pierre Neidhardt @ 2018-05-27 16:56 ` Tom Tromey 2018-05-27 20:16 ` Alan Mackenzie 2018-05-27 20:23 ` Stefan Monnier 2 siblings, 1 reply; 54+ messages in thread From: Tom Tromey @ 2018-05-27 16:56 UTC (permalink / raw) To: Alan Mackenzie Cc: rms, Pierre Neidhardt, Noam Postavsky, emacs-devel, van, eliz >>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes: >> Building the automaton is costly. In C, we build it once and save the >> result in a variable so that every regexp match does not rebuild the >> automaton each time. Alan> Emacs has a (moderately large) cache of regexps, so that building the Alan> automatons is done very rarely. Possibly just once each for each Alan> session of Emacs. I wonder about both of these statements. On the one hand, AFAICT the regex cache is 20 items. From search.c: #define REGEXP_CACHE_SIZE 20 That seems pretty small to me, given how prevalent regexps are in elisp. On the other hand, in the past when I have tried to profile Emacs, I haven't seen regexp compilation show up too much. IIRC I did see regexp matching and the GC. Maybe this just points out the efficacy of the cache -- maybe 20 items is plenty. Perhaps the regexp matcher could use some micro-optimizations, like the token-threading the bytecode interpreter does. Alan> Are you suggesting here building an interpreter in Lisp directly to Alan> execute rx expressions? It's interesting, IMO, to consider compiling rx (or regexps generally) to lisp bytecode. Perhaps with the JIT, it would boost performance in some cases. (It may be slower, but it's worthwhile to do the experiment.) For other work in this area see Stefan's lex-parse-re package. I think it includes a regexp matcher in elisp. Tom ^ permalink raw reply [flat|nested] 54+ messages in thread
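Whether 20 entries is "plenty" depends on the access pattern. The following toy model is a sketch only: it is not the code in search.c, the `toy-` names are hypothetical, and it assumes a least-recently-used replacement policy. It counts how often a pretend compiler runs for a given sequence of lookups.

```elisp
(defconst toy-cache-size 20
  "Capacity of the toy cache; 20 mirrors the REGEXP_CACHE_SIZE quoted above.")

(defvar toy-cache nil
  "List of (PATTERN . COMPILED) pairs, most recently used first.")

(defvar toy-compile-count 0
  "How many times `toy-compile' has run; a stand-in for automaton building.")

(defun toy-compile (pattern)
  "Pretend to compile PATTERN and count the call."
  (setq toy-compile-count (1+ toy-compile-count))
  (list 'compiled pattern))

(defun toy-lookup (pattern)
  "Return the compiled form of PATTERN, compiling only on a cache miss."
  (let ((hit (assoc pattern toy-cache)))
    (if hit
        ;; Hit: move the entry to the front of the list.
        (setq toy-cache (cons hit (delq hit toy-cache)))
      ;; Miss: compile, add at the front, drop the least recently used.
      (setq hit (cons pattern (toy-compile pattern)))
      (setq toy-cache (cons hit toy-cache))
      (when (> (length toy-cache) toy-cache-size)
        (setq toy-cache (butlast toy-cache))))
    (cdr hit)))

(defun toy-run (distinct passes)
  "Look up DISTINCT different patterns PASSES times; return compile count."
  (setq toy-cache nil
        toy-compile-count 0)
  (dotimes (_ passes)
    (dotimes (i distinct)
      (toy-lookup (format "pattern-%d" i))))
  toy-compile-count)
```

In this model, cycling through 20 patterns twice compiles each pattern once, while cycling through 21 patterns twice compiles on every lookup, because each miss evicts exactly the entry that is needed next. Real workloads are rarely that cyclic, which may be one reason regexp compilation did not show up in the profiles mentioned above.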
* Re: rx.el sexp regexp syntax 2018-05-27 16:56 ` Tom Tromey @ 2018-05-27 20:16 ` Alan Mackenzie 0 siblings, 0 replies; 54+ messages in thread From: Alan Mackenzie @ 2018-05-27 20:16 UTC (permalink / raw) To: Tom Tromey; +Cc: rms, Pierre Neidhardt, Noam Postavsky, emacs-devel, van, eliz Hello, Tom. On Sun, May 27, 2018 at 10:56:36 -0600, Tom Tromey wrote: > >>>>> "Alan" == Alan Mackenzie <acm@muc.de> writes: > >> Building the automaton is costly. In C, we build it once and save the > >> result in a variable so that every regexp match does not rebuild the > >> automaton each time. > Alan> Emacs has a (moderately large) cache of regexps, so that building the > Alan> automatons is done very rarely. Possibly just once each for each > Alan> session of Emacs. > I wonder about both of these statements. > On the one hand, AFAICT the regex cache is 20 items. From search.c: > #define REGEXP_CACHE_SIZE 20 > That seems pretty small to me, given how prevalent regexps are in elisp. Hmm. I must have misremembered. I thought the cache size was 60, for some reason. Now that RAM is measured in gigabytes, we could probably increase that 20 (if there's any need). > On the other hand, in the past when I have tried to profile Emacs, I > haven't seen regexp compilation show up too much. IIRC I did see regexp > matching and the GC. Maybe this just points out the efficacy of the > cache -- maybe 20 items is plenty. Maybe. I just don't know. > Perhaps the regexp matcher could use some micro-optimizations, like the > token-threading the bytecode interpreter does. > Alan> Are you suggesting here building an interpreter in Lisp directly to > Alan> execute rx expressions? > It's interesting, IMO, to consider compiling rx (or regexps generally) > to lisp bytecode. Perhaps with the JIT, it would boost performance in > some cases. (It may be slower, but it's worthwhile to do the > experiment.) > For other work in this area see Stefan's lex-parse-re package. 
I think > it includes a regexp matcher in elisp. I'll need to have a look at that. > Tom -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-25 15:51 ` Alan Mackenzie 2018-05-25 16:47 ` Pierre Neidhardt 2018-05-27 16:56 ` Tom Tromey @ 2018-05-27 20:23 ` Stefan Monnier 2 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-05-27 20:23 UTC (permalink / raw) To: emacs-devel >> It would be interesting to compare the performance. This also means >> that there would be no need for caching on behalf of the supporting >> language. > > I will predict that an rx interpreter built in Lisp will be two orders > of magnitude slower than the current regexp machine, where both the > construction of an automaton, and the byte-code interpreter which runs > it are written in C (and probably quite optimised C at that). The lex.el package in GNU ELPA has a matcher written in Elisp. Its performance is actually pretty good compared to Emacs's builtin regexp engine. But that's because lex.el builds a DFA, so the slow evaluation of Elisp is compensated by a more efficient algorithm. And of course, building the DFA takes a lot more time than the regexp-compilation of regexp.c (both because of the language used and the algorithm). Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
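To make the DFA point concrete, here is a minimal hand-written sketch of a table-driven matcher in Elisp. It is not lex.el's code or API; the table, the `toy-` names and the pattern (one or more "a" followed by a single "b", matched against the whole string) are assumptions chosen for illustration.

```elisp
(defconst toy-dfa-table
  '((0 (?a . 1))              ; start state: needs at least one "a"
    (1 (?a . 1) (?b . 2))     ; seen some "a"s: more "a"s, or the final "b"
    (2))                      ; accepting state: no outgoing transitions
  "Alist mapping each state to its alist of (CHAR . NEXT-STATE).")

(defun toy-dfa-match-p (string)
  "Return non-nil if the whole of STRING is one or more a's then one b."
  (let ((state 0)
        (i 0)
        (n (length string)))
    ;; One table lookup per character and no backtracking: STATE becomes
    ;; nil as soon as there is no transition for the current character.
    (while (and state (< i n))
      (setq state (cdr (assq (aref string i)
                             (cdr (assq state toy-dfa-table)))))
      (setq i (1+ i)))
    (eq state 2)))
```

Each input character is examined exactly once, which is the algorithmic advantage described above. The expensive part in a real package is constructing such a table from an arbitrary regexp, and this sketch skips that step entirely by writing the table by hand.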
* Re: rx.el sexp regexp syntax 2018-05-25 8:52 ` Pierre Neidhardt 2018-05-25 15:51 ` Alan Mackenzie @ 2018-05-27 20:16 ` Stefan Monnier 2018-05-28 16:36 ` Pierre Neidhardt 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-05-27 20:16 UTC (permalink / raw) To: emacs-devel > rx.el is one of the best concepts I've discovered in a long time. > It's another instance of "Don't come up with a new (mini)language when > Lisp can do better": it's easier to learn, more flexible, easier to > write, much easier to read and as a consequence much more maintainable. FWIW, I find it's cumbersome in RX to define regexps piecewise. E.g. with strings I can do things like: (let* ((word-re "\\(?:\\sw\\|\\s_\\)+") (spc-re "[ \t\n]*") (re1 (concat spc-re "\\(" word-re "\\)" spc-re)) (re2 (concat spc-re "\\(" word-re "\\)(" word-re)))) but to do the same with RX you need something like: (let* ((word-re (rx ...)) (spc-re (rx ...)) (re1 (rx-to-string `(... ,spc-re ... ,word-re ...))) (re2 (rx-to-string `(... ,spc-re ... ,word-re ...)))) I think `rx` would benefit from allowing to refer to variables. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
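For concreteness, here is one possible runnable rendering of the piecewise approach sketched above. The forms chosen are a guess at an rx equivalent of the string version, not necessarily what was intended by the elided "..." parts, and `toy-first-word` is a hypothetical helper. It relies on the rx form `(regexp STRING)`, which splices an already built regexp string into a larger rx expression.

```elisp
(require 'rx)

(defun toy-first-word (string)
  "Return the first word-or-symbol run in STRING, ignoring blanks around it."
  (let* ((word-re (rx (one-or-more (or (syntax word) (syntax symbol)))))
         (spc-re  (rx (zero-or-more (any " \t\n"))))
         ;; The pieces are ordinary strings, so they are spliced into
         ;; the backquoted form with `,' and wrapped in (regexp ...).
         (re1 (rx-to-string
               `(seq (regexp ,spc-re)
                     (group (regexp ,word-re))
                     (regexp ,spc-re)))))
    (when (string-match re1 string)
      (match-string 1 string))))
```

A possible use is `(toy-first-word "  foo_bar  ")`, which should return "foo_bar" when "_" has symbol syntax in the current syntax table, as it does in the standard one. The extra `rx-to-string`, backquote and `regexp` wrapping is the overhead being complained about. Later Emacs releases (27.1, as far as I know) added `rx-let` and let the `regexp` and `literal` forms take arbitrary Lisp expressions, which targets this kind of composition.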
* Re: rx.el sexp regexp syntax 2018-05-27 20:16 ` Stefan Monnier @ 2018-05-28 16:36 ` Pierre Neidhardt 2018-05-28 17:04 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Pierre Neidhardt @ 2018-05-28 16:36 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 805 bytes --] Stefan Monnier <monnier@iro.umontreal.ca> writes: > FWIW, I find it's cumbersome in RX to define regexps piecewise. > E.g. with strings I can do things like: > > (let* ((word-re "\\(?:\\sw\\|\\s_\\)+") > (spc-re "[ \t\n]*") > (re1 (concat spc-re "\\(" word-re "\\)" spc-re)) > (re2 (concat spc-re "\\(" word-re "\\)(" word-re)))) > > but to do the same with RX you need something like: > > (let* ((word-re (rx ...)) > (spc-re (rx ...)) > (re1 (rx-to-string `(... ,spc-re ... ,word-re ...))) > (re2 (rx-to-string `(... ,spc-re ... ,word-re ...)))) > > I think `rx` would benefit from allowing to refer to variables. Not sure what you are trying to do, but doesn't it work with (rx (... (eval VARIABLE) ...)) ? -- Pierre Neidhardt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 487 bytes --] ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: rx.el sexp regexp syntax 2018-05-28 16:36 ` Pierre Neidhardt @ 2018-05-28 17:04 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-05-28 17:04 UTC (permalink / raw) To: Pierre Neidhardt; +Cc: emacs-devel > Not sure what you are trying to do, but doesn't it work with > > (rx (... (eval VARIABLE) ...)) This evaluates VARIABLE during the macro-expansion, so it only works if VARIABLE exists during the macroexpansion. So it works if you can bind your variables with the new `let-when-compile`, but it won't work with lexically-scoped vars. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
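The macro-expansion limitation can be sidestepped by building the regexp at run time. A minimal sketch, assuming a hypothetical helper `toy-call-re` whose argument is an ordinary function parameter: since `rx-to-string` is a function, the backquoted form is constructed when the function is called, and the parameter's value is available at that point.

```elisp
;;; -*- lexical-binding: t -*-
(require 'rx)

(defun toy-call-re (name)
  "Return a regexp matching NAME as a whole symbol followed by \"(\".
NAME is a string and is matched literally, so regexp-special
characters in it need no escaping by the caller."
  ;; `rx-to-string' runs at call time, after NAME has been bound.
  (rx-to-string
   `(seq symbol-start ,name symbol-end
         (zero-or-more (any " \t"))
         "(")))
```

For example, `(string-match-p (toy-call-re "foo") "x = foo (1)")` should find a match, while "foobar(1)" should not match because of the `symbol-end` assertion. The price is that the rx form is translated on every call rather than once at compile time.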