* Ugly regexps
@ 2021-03-03  0:32 Stefan Monnier
  2021-03-03  1:32 ` Stefan Kangas
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Stefan Monnier @ 2021-03-03  0:32 UTC (permalink / raw)
  To: emacs-devel

BTW, while this theme of ugly regexps keeps coming up, how 'bout we add
a new function `ere` which converts between the ERE style of regexps
where grouping parens are not escaped (and plain chars meant to match
an actual paren need to be escaped instead) to ELisp-style regexps?

So you can do

    (string-match (ere "\\(def(macro|un|subst) .{1,}"))

instead of

    (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}")

?


        Stefan


(defun ere (re)
  "Convert an ERE-style regexp RE to an Emacs-style regexp."
  (let ((pos 0)
        (last 0)
        (chunks '()))
    (while (string-match "\\\\.\\|[{}()|]" re pos)
      (let ((beg (match-beginning 0))
            (end (match-end 0)))
        (when (subregexp-context-p re beg)
          (cond
           ;; A normal paren: add a backslash.
           ((= (1+ beg) end)
            (push (substring re last beg) chunks)
            (setq last beg)
            (push "\\" chunks))
           ;; A grouping paren: skip the backslash.
           ((memq (aref re (1+ beg)) '(?\( ?\) ?\{ ?\} ?\|))
            (push (substring re last beg) chunks)
            (setq last (1+ beg)))))
        (setq pos end)))
    (mapconcat #'identity (nreverse (cons (substring re last) chunks)) "")))

^ permalink raw reply	[flat|nested] 42+ messages in thread
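The conversion proposed in this message boils down to swapping which spelling of ( ) { } | carries the backslash. The following is a simplified sketch of that swap, not the posted function: `my-ere-sketch` is a hypothetical name, and it assumes the input contains no bracket expressions, since it does not consult `subregexp-context-p` and would therefore also rewrite a paren inside something like "[(|)]".

```elisp
;; Hypothetical helper for illustration; not the function posted in the thread.
;; Assumption: RE contains no bracket expressions, which this sketch
;; does not special-case.
(defun my-ere-sketch (re)
  "Swap the backslash-escaping of ( ) { } | in the regexp string RE."
  (replace-regexp-in-string
   "\\\\.\\|[(){}|]"
   (lambda (m)
     (cond
      ;; A bare operator char: Emacs syntax wants it backslashed.
      ((= (length m) 1) (concat "\\" m))
      ;; A backslashed ( ) { } |: a literal char in ERE, so drop the backslash.
      ((memq (aref m 1) '(?\( ?\) ?\{ ?\} ?\|)) (substring m 1))
      ;; Any other escape, such as \. or \w, is left untouched.
      (t m)))
   ;; FIXEDCASE and LITERAL are both t so the backslashes in the
   ;; replacement are inserted as-is.
   re t t))

(my-ere-sketch "\\(def(macro|un|subst) .{1,}")
;; => "(def\\(macro\\|un\\|subst\\) .\\{1,\\}"

(string-match (my-ere-sketch "\\(def(macro|un|subst) ") "(defun foo ()")
;; => 0
```

The first result is the Emacs-style string from the message, which suggests the two spellings in the proposal are indeed meant to be equivalent.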
* Re: Ugly regexps 2021-03-03 0:32 Ugly regexps Stefan Monnier @ 2021-03-03 1:32 ` Stefan Kangas 2021-03-03 2:08 ` Stefan Kangas 2021-03-03 20:46 ` Alan Mackenzie 2021-03-03 6:00 ` Eli Zaretskii ` (3 subsequent siblings) 4 siblings, 2 replies; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 1:32 UTC (permalink / raw) To: Stefan Monnier, emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: > BTW, while this theme of ugly regexps keeps coming up, how 'bout we add > a new function `ere` which converts between the ERE style of regexps > where grouping parens are not escaped (and plain chars meant to match > an actual paren need to be escaped instead) to ELisp-style regexps? > > So you can do > > (string-match (ere "\\(def(macro|un|subst) .{1,}")) > > instead of > > (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}") > > ? Sounds good to me. I was going to ask why not just do PCRE, but then I realized I'm not exactly sure what the syntactical differences are. (We obviously lack some features.) AFAIR, Emacs regexps don't exactly match GNU grep, egrep, Perl, or anything else really. So I cranked out my dusty old copy of Mastering Regular Expressions and found this overview: grep egrep Emacs Perl \? \+ \| ? + | ? + \| ? + | \( \) ( ) \( \) ( ) \< \> \< \> \b \B \b \B (Excerpt from Mastering Regular Expressions: Table 3-3: A (Very) Superficial Look at the Flavor of a Few Common Tools) This shows the differences that most commonly bites you, in my experience. While we're at it, has it ever been discussed to add support for the pcre library side-by-side with our homegrown regexp.c? It would give us sane (standard) syntax and some useful features "for free" (e.g. lookaround). I didn't test but a priori I would also assume the code to be much more performant than anything we could ever cook up ourselves. It is used by several high-profile projects. I would imagine we'd introduce entirely new function names for it. 
Perhaps even a completely new and improved API like Lars suggested a while back. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 1:32 ` Stefan Kangas @ 2021-03-03 2:08 ` Stefan Kangas 2021-03-03 6:19 ` Eli Zaretskii 2021-03-03 20:46 ` Alan Mackenzie 1 sibling, 1 reply; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 2:08 UTC (permalink / raw) To: Stefan Monnier, emacs-devel Stefan Kangas <stefankangas@gmail.com> writes: > While we're at it, has it ever been discussed to add support for the > pcre library side-by-side with our homegrown regexp.c? It would give us > sane (standard) syntax and some useful features "for free" > (e.g. lookaround). I didn't test but a priori I would also assume the > code to be much more performant than anything we could ever cook up > ourselves. It is used by several high-profile projects. Of course this had already been discussed. I found this interesting thread from 2012: https://lists.gnu.org/archive/html/emacs-devel/2012-01/msg00736.html Long story short, it may be a non-trivial job. In particular supporting the \s and \c operators seems like a hard nut to crack. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 2:08 ` Stefan Kangas @ 2021-03-03 6:19 ` Eli Zaretskii 0 siblings, 0 replies; 42+ messages in thread From: Eli Zaretskii @ 2021-03-03 6:19 UTC (permalink / raw) To: Stefan Kangas; +Cc: monnier, emacs-devel > From: Stefan Kangas <stefankangas@gmail.com> > Date: Tue, 2 Mar 2021 20:08:53 -0600 > > https://lists.gnu.org/archive/html/emacs-devel/2012-01/msg00736.html > > Long story short, it may be a non-trivial job. In particular supporting > the \s and \c operators seems like a hard nut to crack. Yes. But I think supporting non-ASCII characters is also not easy, and the main reason we still don't use Gnulib's regexp code in Emacs. Another worthy goal, if we are talking about this, is to support more of the Unicode Regular Expressions, at least at the functional level, if not syntactically. See https://unicode.org/reports/tr18/ for the details. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 1:32 ` Stefan Kangas 2021-03-03 2:08 ` Stefan Kangas @ 2021-03-03 20:46 ` Alan Mackenzie 2021-03-04 18:35 ` Stefan Kangas 1 sibling, 1 reply; 42+ messages in thread From: Alan Mackenzie @ 2021-03-03 20:46 UTC (permalink / raw) To: Stefan Kangas; +Cc: Stefan Monnier, emacs-devel Hello, Stefan. On Tue, Mar 02, 2021 at 19:32:23 -0600, Stefan Kangas wrote: > Stefan Monnier <monnier@iro.umontreal.ca> writes: > > BTW, while this theme of ugly regexps keeps coming up, how 'bout we add > > a new function `ere` which converts between the ERE style of regexps > > where grouping parens are not escaped (and plain chars meant to match > > an actual paren need to be escaped instead) to ELisp-style regexps? > > So you can do > > (string-match (ere "\\(def(macro|un|subst) .{1,}")) > > instead of > > (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}") > > ? > Sounds good to me. > I was going to ask why not just do PCRE, but then I realized I'm not > exactly sure what the syntactical differences are. (We obviously lack > some features.) AFAIR, Emacs regexps don't exactly match GNU grep, > egrep, Perl, or anything else really. These things don't exactly match eachother, do they? > So I cranked out my dusty old copy of Mastering Regular Expressions and > found this overview: > grep egrep Emacs Perl > \? \+ \| ? + | ? + \| ? + | > \( \) ( ) \( \) ( ) > \< \> \< \> \b \B \b \B > (Excerpt from Mastering Regular Expressions: Table 3-3: A (Very) > Superficial Look at the Flavor of a Few Common Tools) > This shows the differences that most commonly bites you, in my > experience. The "biting" effect is surely small. I have little difficulty using grep, egrep and awk, all of whose regexp notations differ somewhat. > While we're at it, has it ever been discussed to add support for the > pcre library side-by-side with our homegrown regexp.c? It would give us > sane (standard) syntax and some useful features "for free" > (e.g. lookaround). 
I didn't test but a priori I would also assume the > code to be much more performant than anything we could ever cook up > ourselves. It is used by several high-profile projects. > I would imagine we'd introduce entirely new function names for it. > Perhaps even a completely new and improved API like Lars suggested a > while back. No, No, No, No! All these tools have one overarching thing in common, and that is they each have a single variety of regexp. That is, with the exception of Emacs, which also has a radically different source form, namely rx. Somebody pointed out the relatively small use of rx, and the same might happen for a new regexp notation. Or it might not, and we'd have two different notations side by side. This is surely something to avoid. There's not a lot wrong with Emacs's regexp notation. It works, works well, and we're all familiar with it. And there are many thousands of lines of lisp containing regexps, all of which are in the same variety. With the exception of those written with rx. To introduce a second (string) variety alongside Emacs regexps would cause confusion, and suck up effort better used for productive work. Just how is one meant to search for a regexp using grep, when one doesn't even know whether it follows Emacs conventions or some foreign set of conventions? -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 20:46 ` Alan Mackenzie @ 2021-03-04 18:35 ` Stefan Kangas 0 siblings, 0 replies; 42+ messages in thread From: Stefan Kangas @ 2021-03-04 18:35 UTC (permalink / raw) To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel Alan Mackenzie <acm@muc.de> writes: >> I was going to ask why not just do PCRE, but then I realized I'm not >> exactly sure what the syntactical differences are. (We obviously lack >> some features.) AFAIR, Emacs regexps don't exactly match GNU grep, >> egrep, Perl, or anything else really. > > These things don't exactly match eachother, do they? There is also a POSIX standard for BRE and ERE that we don't follow. My point is that we could match one of the above, even if they don't match each other. > The "biting" effect is surely small. I have little difficulty using > grep, egrep and awk, all of whose regexp notations differ somewhat. I am happy to hear that this works well for you. Two decades after writing my first regexp, I still tend to forget sometimes (oh wait is it \+ in sed again?). Then I have to look these stupid details up for the Nth time. > There's not a lot wrong with Emacs's regexp notation. It works, works > well, and we're all familiar with it. Of course it gets the job done in the sense that you can write a regexp that will match what you want. But it is overly verbose in common cases, making regexps harder than they need to be to read, understand and modify. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 0:32 Ugly regexps Stefan Monnier 2021-03-03 1:32 ` Stefan Kangas @ 2021-03-03 6:00 ` Eli Zaretskii 2021-03-03 15:46 ` Stefan Monnier 2021-03-03 7:09 ` Helmut Eller ` (2 subsequent siblings) 4 siblings, 1 reply; 42+ messages in thread From: Eli Zaretskii @ 2021-03-03 6:00 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Tue, 02 Mar 2021 19:32:20 -0500 > > So you can do > > (string-match (ere "\\(def(macro|un|subst) .{1,}")) > > instead of > > (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}") Why not use 'rx' in those cases? IMO it makes the regexp even more easy to write and read. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 6:00 ` Eli Zaretskii @ 2021-03-03 15:46 ` Stefan Monnier 2021-03-03 16:30 ` Eli Zaretskii 0 siblings, 1 reply; 42+ messages in thread From: Stefan Monnier @ 2021-03-03 15:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel >> So you can do >> >> (string-match (ere "\\(def(macro|un|subst) .{1,}")) >> >> instead of >> >> (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}") > > Why not use 'rx' in those cases? Not sure what you mean by "those cases". I'm thinking this `ere` would be useful for the cases where the author finds `rx` unpalatable for some reason. > IMO it makes the regexp even more easy to write and read. I believe this depends on taste and circumstances. Experience shows that while some packages use `rx` extensively, most ELisp code doesn't. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 15:46 ` Stefan Monnier @ 2021-03-03 16:30 ` Eli Zaretskii 2021-03-03 17:44 ` Stefan Monnier 0 siblings, 1 reply; 42+ messages in thread From: Eli Zaretskii @ 2021-03-03 16:30 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: emacs-devel@gnu.org > Date: Wed, 03 Mar 2021 10:46:20 -0500 > > >> (string-match (ere "\\(def(macro|un|subst) .{1,}")) > >> > >> instead of > >> > >> (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}") > > > > Why not use 'rx' in those cases? > > Not sure what you mean by "those cases". I'm thinking this `ere` would > be useful for the cases where the author finds `rx` unpalatable for > some reason. Why would someone find rx unpalatable? > > IMO it makes the regexp even more easy to write and read. > > I believe this depends on taste and circumstances. Experience shows > that while some packages use `rx` extensively, most ELisp code doesn't. If this is about personal preferences and tastes, then I think having 3 different flavors of regexps in our sources due to personal preferences is not necessarily a good idea. We have coding conventions for a reason. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps
  2021-03-03 16:30       ` Eli Zaretskii
@ 2021-03-03 17:44         ` Stefan Monnier
  2021-03-03 18:46           ` Stefan Kangas
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2021-03-03 17:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

>> Not sure what you mean by "those cases".  I'm thinking this `ere` would
>> be useful for the cases where the author finds `rx` unpalatable for
>> some reason.
> Why would someone find rx unpalatable?

Maybe just because of habit, but I think the main downside of `rx` is
that it's very verbose, which ends up hiding the "text".  For example in

    (rx "(def" (or "macro" "un" "subst")))

I find the `or` to get a bit in the way of my visual cortex recognizing
the "defmacro" pattern above.

> If this is about personal preferences and tastes, then I think having
> 3 different flavors of regexps in our sources due to personal
> preferences is not necessarily a good idea.

Yes, it's the downside.


        Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread
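To make the readability comparison concrete, here is a small sketch that puts the plain string spelling next to the `rx` form discussed in this message and checks that both match the same input. The variable names are made up for the example, and the exact regexp string that `rx` produces may differ between Emacs versions, so only the match result is shown.

```elisp
(require 'rx)

(let ((input "(defun foo ()")
      ;; Plain Emacs string syntax: grouping and alternation are backslashed.
      (plain "(def\\(macro\\|un\\|subst\\)")
      ;; The rx form from the message; it evaluates to a regexp string.
      (via-rx (rx "(def" (or "macro" "un" "subst"))))
  (list (string-match-p plain input)
        (string-match-p via-rx input)))
;; => (0 0)
```

Both spellings describe the same pattern; the disagreement in the thread is only about which one is easier to read and maintain.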
* Re: Ugly regexps 2021-03-03 17:44 ` Stefan Monnier @ 2021-03-03 18:46 ` Stefan Kangas 2021-03-03 19:21 ` Eli Zaretskii 2021-03-03 19:32 ` [External] : " Drew Adams 0 siblings, 2 replies; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 18:46 UTC (permalink / raw) To: Stefan Monnier, Eli Zaretskii; +Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Why would someone find rx unpalatable? > > Maybe just because of habit, but I think the main downside of `rx` is > that it's very verbose, which ends up hiding the "text". For example in > > (rx "(def" (or "macro" "un" "subst"))) > > I find the `or` to get a bit in the way of my visual cortex recognizing > the "defmacro" pattern above. It is also just another thing to learn. If you're just doing some basic ELisp functions for your personal editing you might not want to spend time parsing the docstring of `rx' just to say "^(foo|bar)". This applies also if you're just writing some small package that just needs a regexp or two. Also, `rx' does not translate to most other languages. So if you are learning regexps for the first time or are still struggling with them, you will IMO probably be better off staying away from `rx' for a while. Note also that you can't use `rx' syntax in `query-replace-regexp'. I am not surprised that we don't see `rx' used more, even if I would certainly wish that wasn't the case. Especially in our own sources. (It's too bad that we don't use it in our preloaded code, for example.) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 18:46 ` Stefan Kangas @ 2021-03-03 19:21 ` Eli Zaretskii 2021-03-03 19:50 ` Stefan Kangas ` (2 more replies) 2021-03-03 19:32 ` [External] : " Drew Adams 1 sibling, 3 replies; 42+ messages in thread From: Eli Zaretskii @ 2021-03-03 19:21 UTC (permalink / raw) To: Stefan Kangas; +Cc: monnier, emacs-devel > From: Stefan Kangas <stefankangas@gmail.com> > Date: Wed, 3 Mar 2021 12:46:47 -0600 > Cc: emacs-devel@gnu.org > > Stefan Monnier <monnier@iro.umontreal.ca> writes: > > >> Why would someone find rx unpalatable? > > > > Maybe just because of habit, but I think the main downside of `rx` is > > that it's very verbose, which ends up hiding the "text". For example in > > > > (rx "(def" (or "macro" "un" "subst"))) > > > > I find the `or` to get a bit in the way of my visual cortex recognizing > > the "defmacro" pattern above. > > It is also just another thing to learn. And ERE isn't? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 19:21 ` Eli Zaretskii @ 2021-03-03 19:50 ` Stefan Kangas 2021-03-03 20:16 ` Stefan Kangas 2021-03-03 19:50 ` Stefan Kangas 2021-03-03 19:58 ` Dmitry Gutov 2 siblings, 1 reply; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 19:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> It is also just another thing to learn. > > And ERE isn't? Exactly. I mean, it ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 19:50 ` Stefan Kangas @ 2021-03-03 20:16 ` Stefan Kangas 0 siblings, 0 replies; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 20:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel Stefan Kangas <stefankangas@gmail.com> writes: > Eli Zaretskii <eliz@gnu.org> writes: > >>> It is also just another thing to learn. >> >> And ERE isn't? > > Exactly. I mean, it [ My reply was accidentally sent before it was done, sorry: ] Exactly. It is what is used in most other programming languages. Whereas `rx' is specific for ELisp. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 19:21 ` Eli Zaretskii 2021-03-03 19:50 ` Stefan Kangas @ 2021-03-03 19:50 ` Stefan Kangas 2021-03-03 19:58 ` Dmitry Gutov 2 siblings, 0 replies; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 19:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: monnier, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> It is also just another thing to learn. > > And ERE isn't? Exactly. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 19:21 ` Eli Zaretskii 2021-03-03 19:50 ` Stefan Kangas 2021-03-03 19:50 ` Stefan Kangas @ 2021-03-03 19:58 ` Dmitry Gutov 2021-03-03 20:07 ` [External] : " Drew Adams 2021-03-04 5:47 ` Eli Zaretskii 2 siblings, 2 replies; 42+ messages in thread From: Dmitry Gutov @ 2021-03-03 19:58 UTC (permalink / raw) To: Eli Zaretskii, Stefan Kangas; +Cc: monnier, emacs-devel On 03.03.2021 21:21, Eli Zaretskii wrote: > And ERE isn't? To be fair, extended regular expressions are the regular expressions flavor most commonly used in the contemporary world, recent/popular programming languages, etc. So for a lot of people this won't be +1 thing to learn. ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [External] : Re: Ugly regexps 2021-03-03 19:58 ` Dmitry Gutov @ 2021-03-03 20:07 ` Drew Adams 2021-03-03 20:31 ` Stefan Kangas 2021-03-03 20:32 ` Stefan Monnier 2021-03-04 5:47 ` Eli Zaretskii 1 sibling, 2 replies; 42+ messages in thread From: Drew Adams @ 2021-03-03 20:07 UTC (permalink / raw) To: Dmitry Gutov, Eli Zaretskii, Stefan Kangas Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > > And ERE isn't? > > To be fair, extended regular expressions are the regular expressions > flavor most commonly used in the contemporary world, recent/popular > programming languages, etc. > > So for a lot of people this won't be +1 thing to learn. See my previous message. It _will_ be a +1 to learn, in the context of Emacs, if people have to also learn the Elisp syntax anyway, for interactive use. Any way you look at it, I think, if the Elisp regexp syntax is what is used interactively (modulo extra backslashing), then adding another syntax means that using that other syntax is a +1 - extra learning. Not that extra learning is necessarily bad... ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [External] : Re: Ugly regexps 2021-03-03 20:07 ` [External] : " Drew Adams @ 2021-03-03 20:31 ` Stefan Kangas 2021-03-03 22:17 ` Drew Adams 2021-03-03 20:32 ` Stefan Monnier 1 sibling, 1 reply; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 20:31 UTC (permalink / raw) To: Drew Adams, Dmitry Gutov, Eli Zaretskii Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org Drew Adams <drew.adams@oracle.com> writes: >> To be fair, extended regular expressions are the regular expressions >> flavor most commonly used in the contemporary world, recent/popular >> programming languages, etc. >> >> So for a lot of people this won't be +1 thing to learn. > > See my previous message. It _will_ be a +1 to learn, > in the context of Emacs, if people have to also learn > the Elisp syntax anyway, for interactive use. We could add an option to prefer ERE in interactive use. ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [External] : Re: Ugly regexps 2021-03-03 20:31 ` Stefan Kangas @ 2021-03-03 22:17 ` Drew Adams 2021-03-03 22:32 ` Stefan Monnier 0 siblings, 1 reply; 42+ messages in thread From: Drew Adams @ 2021-03-03 22:17 UTC (permalink / raw) To: Stefan Kangas, Dmitry Gutov, Eli Zaretskii Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > >> for a lot of people this won't be +1 thing to learn. > > > > See my previous message. It _will_ be a +1 to learn, > > in the context of Emacs, if people have to also learn > > the Elisp syntax anyway, for interactive use. sk> We could add an option to prefer ERE in interactive use. Sure, if it were supported generally. sm> WRT interactive regexp syntax, I'm still hoping someone will write sm> a proper package that lets the user choose which regexp syntax to use. sm> Currently `re-builder` has such a thing, but it should really apply sm> "across the board", i.e. in `read-regexp`, in Isearch, and anywhere sm> else we read regexps from the keyboard. Sure, worth hoping. And then have an option to express one's preference. Or several options? Depending on what's implemented, maybe someone will prefer one thing for, say, Isearch query-replace*, and completion, and another thing for some other interactive uses? [But since Emacs (not so wisely, IMO) forbids commands from binding options, code couldn't just bind such a variable when calling `read-regexp'. `read-regexp' could accept another arg for this, of course, but then that too could, in a sense, override a user preference.] > This is largely orthogonal to what we use in ELisp code. In one sense, sure. And especially if we're now talking only about different regexp dialects, and not also about alternative ways, such as RX, to enter/create a regexp. But as I mentioned, I don't think it's orthogonal, in practice, to what people actually use when coding Elisp. 
I think they often code based on what they're used to using, which, at least for now, is mostly the interactive syntax (modulo backslashing, etc. for Elisp). Use of something like RX seems to be less common, so far. And then there's the question, interactively, of choosing one or another. Often you use a very simple regexp for searching or completion matching - even just a substring (no special chars). Less often you need a more complex regexp. Will someone want to use something like RX for simple patterns too? Would going through some kind of interactive RX dialog be cumbersome for something simple? We'd want to make sure that any dialog to be defined keeps the simple simple. `(rx "abc")' is simple enough - it should be possible to type just `abc', like we do now. (At least among regexp dialects, as opposed to something like RX, use of simple-vs-complex patterns shouldn't make any difference, from one dialect to another.) BTW, I see this in (emacs) `Rx Notation': The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in most interactive situations where a regexp is requested, such as when running ‘query-replace-regexp’ or in variable customization. I guess that's just saying that an RX-based dialog isn't available yet, not that it's inconceivable or couldn't be useful. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [External] : Re: Ugly regexps
  2021-03-03 22:17                   ` Drew Adams
@ 2021-03-03 22:32                     ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2021-03-03 22:32 UTC (permalink / raw)
  To: Drew Adams
  Cc: Eli Zaretskii, emacs-devel@gnu.org, Stefan Kangas, Dmitry Gutov

> I guess that's just saying that an RX-based dialog isn't
> available yet, not that it's inconceivable or couldn't
> be useful.

Indeed if we introduce some way to choose which dialect to use for
interactive regexps, I'd fully expect the RX syntax to be one of
the options.  I could also imagine one of the options to be DWIMish
(i.e. use RX if the regexp starts and ends with a paren).


        Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread
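The DWIMish option mentioned here could look roughly like the sketch below. This is only one guess at the heuristic: `my-regexp-dwim` is a hypothetical name, no such option exists in Emacs as far as this thread shows, and the rule "starts and ends with a paren" is taken literally from the message.

```elisp
(require 'rx)

;; Hypothetical helper; the name and the heuristic are assumptions
;; made for illustration, not an existing Emacs API.
(defun my-regexp-dwim (input)
  "Return INPUT as an Emacs regexp, treating \"(...)\" as an rx form."
  (if (and (string-prefix-p "(" input)
           (string-suffix-p ")" input))
      ;; Looks like a Lisp form: read it and translate it via rx.
      (rx-to-string (car (read-from-string input)) t)
    ;; Anything else is taken to be a regexp string already.
    input))

(string-match-p (my-regexp-dwim "(or \"foo\" \"bar\")") "a bar")
;; => 2

(my-regexp-dwim "fo+")
;; => "fo+"
```

One weakness of such a heuristic is that an ordinary regexp like "(a)" also starts and ends with a paren and would be misread as a Lisp form, so a real implementation would probably need a fallback or an explicit toggle.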
* Re: [External] : Re: Ugly regexps 2021-03-03 20:07 ` [External] : " Drew Adams 2021-03-03 20:31 ` Stefan Kangas @ 2021-03-03 20:32 ` Stefan Monnier 1 sibling, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2021-03-03 20:32 UTC (permalink / raw) To: Drew Adams Cc: Eli Zaretskii, emacs-devel@gnu.org, Stefan Kangas, Dmitry Gutov WRT interactive regexp syntax, I'm still hoping someone will write a proper package that lets the user choose which regexp syntax to use. Currently `re-builder` has such a thing, but it should really apply "across the board", i.e. in `read-regexp`, in Isearch, and anywhere else we read regexps from the keyboard. This is largely orthogonal to what we use in ELisp code. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 19:58 ` Dmitry Gutov 2021-03-03 20:07 ` [External] : " Drew Adams @ 2021-03-04 5:47 ` Eli Zaretskii 2021-03-04 10:49 ` Lars Ingebrigtsen 2021-03-04 14:25 ` Dmitry Gutov 1 sibling, 2 replies; 42+ messages in thread From: Eli Zaretskii @ 2021-03-04 5:47 UTC (permalink / raw) To: Dmitry Gutov; +Cc: emacs-devel, stefankangas, monnier > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Wed, 3 Mar 2021 21:58:17 +0200 > Content-Language: en-US > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org > > On 03.03.2021 21:21, Eli Zaretskii wrote: > > And ERE isn't? > > To be fair, extended regular expressions are the regular expressions > flavor most commonly used in the contemporary world, recent/popular > programming languages, etc. > > So for a lot of people this won't be +1 thing to learn. So we are now going to cater to users of other programs more than we cater to Emacs users who are used to the Emacs RE syntaxes? How does that make sense? Do other programs prefer the Emacs RE syntax to their own? We have rx for many years, and just recently enhanced it significantly. I fail to see how it would make sense to introduce yet another RE syntax into Emacs, with all the overhead that brings with it. Maybe it could make sense as an ELPA add-on, but not in core. More generally, I wish we stopped investing so much of our time and energy in cleanups and other support tasks, and more to add significant new applications and editing features. That would make more users happier, I think. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 5:47 ` Eli Zaretskii @ 2021-03-04 10:49 ` Lars Ingebrigtsen 2021-03-04 11:25 ` Mattias Engdegård ` (2 more replies) 2021-03-04 14:25 ` Dmitry Gutov 1 sibling, 3 replies; 42+ messages in thread From: Lars Ingebrigtsen @ 2021-03-04 10:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, stefankangas, monnier, Dmitry Gutov Eli Zaretskii <eliz@gnu.org> writes: > More generally, I wish we stopped investing so much of our time and > energy in cleanups and other support tasks, and more to add > significant new applications and editing features. That would make > more users happier, I think. I think users will be very happy to be able to use the regexp syntax they know (instead of the special Emacs regexp variants) in their .emacs files. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps
  2021-03-04 10:49                 ` Lars Ingebrigtsen
@ 2021-03-04 11:25                   ` Mattias Engdegård
  2021-03-04 11:28                   ` Alan Mackenzie
  2021-03-04 14:11                   ` Eli Zaretskii
  2 siblings, 0 replies; 42+ messages in thread
From: Mattias Engdegård @ 2021-03-04 11:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: Eli Zaretskii, Dmitry Gutov, stefankangas, monnier, emacs-devel

On 4 March 2021 at 11:49, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> I think users will be very happy to be able to use the regexp syntax
> they know (instead of the special Emacs regexp variants) in their .emacs
> files.

Unfortunately it isn't the regexp syntax they know; it's a new variant,
with subtle differences from what they may think it is.  False friends
include [] \d \s \w . ^ $ and so on: constructs that look like something
they know well from other software but that have slightly (or completely)
different meaning in Emacs.

This doesn't mean that `ere` wouldn't be a useful addition, but that it
should not be presented as "Regexps just like in Python (etc)!  No Emacs
quirks to worry about!" but exactly for what it is: a way to toggle the
requirement for backslash-escaping (){}|, no more and no less.

Someone who wants to write or understand an `ere` regexp has to read the
Emacs regexp docs, then the `ere` documentation, and then mentally
combine the two.

^ permalink raw reply	[flat|nested] 42+ messages in thread
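A few of the false friends named in this message can be checked directly. The sketch below uses only built-in functions; the choice of examples is mine, and it covers only \d, . and $ out of the list given.

```elisp
;; "\\d" is not a digit class in Emacs regexps: a backslash before an
;; ordinary character just matches that character.
(string-match-p "\\d" "7")            ; => nil
(string-match-p "\\d" "d")            ; => 0

;; A POSIX character class is one way to ask for a digit instead.
(string-match-p "[[:digit:]]" "a7")   ; => 1

;; "." never matches a newline.
(string-match-p "a.b" "a\nb")         ; => nil

;; "$" is only special at the end of the regexp (or before \) and \|);
;; elsewhere it matches a literal dollar sign.
(string-match-p "a$b" "a$b")          ; => 0
```

None of these behaviours would change under the proposed `ere`, which only affects how ( ) { } | are escaped.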
* Re: Ugly regexps 2021-03-04 10:49 ` Lars Ingebrigtsen 2021-03-04 11:25 ` Mattias Engdegård @ 2021-03-04 11:28 ` Alan Mackenzie 2021-03-04 14:11 ` Eli Zaretskii 2 siblings, 0 replies; 42+ messages in thread From: Alan Mackenzie @ 2021-03-04 11:28 UTC (permalink / raw) To: Lars Ingebrigtsen Cc: Eli Zaretskii, Dmitry Gutov, stefankangas, monnier, emacs-devel Hello, Lars. On Thu, Mar 04, 2021 at 11:49:48 +0100, Lars Ingebrigtsen wrote: > Eli Zaretskii <eliz@gnu.org> writes: > > More generally, I wish we stopped investing so much of our time and > > energy in cleanups and other support tasks, and more to add > > significant new applications and editing features. That would make > > more users happier, I think. > I think users will be very happy to be able to use the regexp syntax > they know (instead of the special Emacs regexp variants) in their .emacs > files. Emacs users know the Emacs regexp syntax. They may also be aware of other variants. There's nothing "special" about Emacs regexps - their makeup is simply one variant amongst several. I very much doubt users will be "very happy" about having to choose between two regexp syntaxes. I expect they are "happy" that each program they use has just one regexp syntax, if they even think about that at all. Introducing an alternative regexp syntax would cause bloat (which Emacs isn't short of), and impose extra work on Emacs hackers everywhere, who at the very least would need to put something like (let (alternative-regexp-syntax) ....) around all their entry points. I don't want that extra hassle, that extra bug source. In short, this proposal is a proposal to increase complexity, increase the workload on all hackers, and a source of future bugs. What we've got already works well enough. Why change it? > -- > (domestic pets only, the antidote for overdose, milk.) > bloggy blog: http://lars.ingebrigtsen.no -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 10:49 ` Lars Ingebrigtsen 2021-03-04 11:25 ` Mattias Engdegård 2021-03-04 11:28 ` Alan Mackenzie @ 2021-03-04 14:11 ` Eli Zaretskii 2 siblings, 0 replies; 42+ messages in thread From: Eli Zaretskii @ 2021-03-04 14:11 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: dgutov, stefankangas, monnier, emacs-devel > From: Lars Ingebrigtsen <larsi@gnus.org> > Date: Thu, 04 Mar 2021 11:49:48 +0100 > Cc: emacs-devel@gnu.org, stefankangas@gmail.com, monnier@iro.umontreal.ca, > Dmitry Gutov <dgutov@yandex.ru> > > Eli Zaretskii <eliz@gnu.org> writes: > > > More generally, I wish we stopped investing so much of our time and > > energy in cleanups and other support tasks, and more to add > > significant new applications and editing features. That would make > > more users happier, I think. > > I think users will be very happy to be able to use the regexp syntax > they know (instead of the special Emacs regexp variants) in their .emacs > files. The suggestion was to introduce this for general use, not just for user-private files. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 5:47 ` Eli Zaretskii 2021-03-04 10:49 ` Lars Ingebrigtsen @ 2021-03-04 14:25 ` Dmitry Gutov 2021-03-04 14:50 ` tomas 2021-03-04 15:11 ` Eli Zaretskii 1 sibling, 2 replies; 42+ messages in thread From: Dmitry Gutov @ 2021-03-04 14:25 UTC To: Eli Zaretskii; +Cc: emacs-devel, stefankangas, monnier

On 04.03.2021 07:47, Eli Zaretskii wrote:
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 3 Mar 2021 21:58:17 +0200
>> Content-Language: en-US
>> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
>>
>> On 03.03.2021 21:21, Eli Zaretskii wrote:
>>> And ERE isn't?
>>
>> To be fair, extended regular expressions are the regular expressions
>> flavor most commonly used in the contemporary world, recent/popular
>> programming languages, etc.
>>
>> So for a lot of people this won't be +1 thing to learn.
>
> So we are now going to cater to users of other programs more than we
> cater to Emacs users who are used to the Emacs RE syntaxes? How does
> that make sense? Do other programs prefer the Emacs RE syntax to
> their own?

Not "more". Just make an extra (fairly small) effort to accommodate them.

> We have rx for many years, and just recently enhanced it
> significantly. I fail to see how it would make sense to introduce yet
> another RE syntax into Emacs, with all the overhead that brings with
> it. Maybe it could make sense as an ELPA add-on, but not in core.

The 'ere' function would be helpful to have in the core either way. As already mentioned in this thread, I have been using an equivalent of it for 5 years now.

> More generally, I wish we stopped investing so much of our time and
> energy in cleanups and other support tasks, and more to add
> significant new applications and editing features. That would make
> more users happier, I think.

Perhaps if contributors didn't have to fight you about every little thing they need to change or fix (or if you responded to arguments, at least), we'll get more features over time. This is especially discouraging when the disagreement is over a minor change in a package I supposedly maintain (bug#44611 is the most glaring example).

I can't threaten to slam the door and leave every time this happens, but this kind of malarkey sucks out a significant amount of time and enthusiasm.
* Re: Ugly regexps 2021-03-04 14:25 ` Dmitry Gutov @ 2021-03-04 14:50 ` tomas 2021-03-04 15:04 ` Dmitry Gutov 2021-03-04 15:05 ` Dmitry Gutov 2021-03-04 15:11 ` Eli Zaretskii 1 sibling, 2 replies; 42+ messages in thread From: tomas @ 2021-03-04 14:50 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Eli Zaretskii, stefankangas, monnier, emacs-devel [-- Attachment #1: Type: text/plain, Size: 577 bytes --] On Thu, Mar 04, 2021 at 04:25:58PM +0200, Dmitry Gutov wrote: [...] > Perhaps if contributors didn't have to fight you about every little > thing they need to change or fix (or if you responded to arguments, > at least) [...] Please, be gentle. I think this is an unfair depiction of what Eli is doing here. Granted, he's not always easy to convince -- but I think that's part of his job as a maintainer. And he has nearly infinite patience in following up on discussions. I'm sure you can bring on your (I think founded) criticism in a more constructive way. Cheers - t [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 14:50 ` tomas @ 2021-03-04 15:04 ` Dmitry Gutov 2021-03-05 5:45 ` Richard Stallman 2021-03-04 15:05 ` Dmitry Gutov 1 sibling, 1 reply; 42+ messages in thread From: Dmitry Gutov @ 2021-03-04 15:04 UTC To: tomas; +Cc: Eli Zaretskii, stefankangas, monnier, emacs-devel

On 04.03.2021 16:50, tomas@tuxteam.de wrote:
> Please, be gentle. I think this is an unfair depiction of what Eli is
> doing here.

Yes, it was harsh, and it doesn't happen every single time, but it happens enough that I can feel overwhelmed and have to choose something else to do with my time. And it's not like it's easy to forget all the previous times it happened (unresolved bugs have a way of keeping one's attention returning to them).

> Granted, he's not always easy to convince -- but I think
> that's part of his job as a maintainer.

If only it didn't step, repeatedly, on my job as a maintainer. And feeling powerless (while still bearing responsibility) is not a great position to be in.

> And he has nearly infinite
> patience in following up on discussions.

Just read the comments in that bug report.
* Re: Ugly regexps 2021-03-04 15:04 ` Dmitry Gutov @ 2021-03-05 5:45 ` Richard Stallman 2021-03-05 11:47 ` Dmitry Gutov 0 siblings, 1 reply; 42+ messages in thread From: Richard Stallman @ 2021-03-05 5:45 UTC (permalink / raw) To: Dmitry Gutov; +Cc: eliz, tomas, emacs-devel, stefankangas, monnier [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Eli has to be firm in order to avoid being bullied by insistent contributors. He is not supposed to give the maintainer of an individual Lisp package carte blanche. The maintainers of Emacs are in charge of all of Emacs. -- Dr Richard Stallman Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-05 5:45 ` Richard Stallman @ 2021-03-05 11:47 ` Dmitry Gutov 2021-03-06 5:11 ` Richard Stallman 0 siblings, 1 reply; 42+ messages in thread From: Dmitry Gutov @ 2021-03-05 11:47 UTC (permalink / raw) To: rms; +Cc: eliz, tomas, emacs-devel, stefankangas, monnier On 05.03.2021 07:45, Richard Stallman wrote: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Eli has to be firm in order to avoid being bullied by insistent > contributors. He is not supposed to give the maintainer of an > individual Lisp package carte blanche. > > The maintainers of Emacs are in charge of all of Emacs. That's how it should work, yes. But then one should exercise their better judgment to avoid bullying the maintainers of individual Lisp packages over lesser disagreements. As well as recognize where their own proficiency ends and where it is more appropriate to delegate to others' technical opinion. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-05 11:47 ` Dmitry Gutov @ 2021-03-06 5:11 ` Richard Stallman 0 siblings, 0 replies; 42+ messages in thread From: Richard Stallman @ 2021-03-06 5:11 UTC (permalink / raw) To: Dmitry Gutov; +Cc: eliz, tomas, stefankangas, monnier, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > But then one should exercise their better judgment to avoid bullying the > maintainers of individual Lisp packages over lesser disagreements. Perhaps they are exercising their better judgment already. -- Dr Richard Stallman Chief GNUisance of the GNU Project (https://gnu.org) Founder, Free Software Foundation (https://fsf.org) Internet Hall-of-Famer (https://internethalloffame.org) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 14:50 ` tomas 2021-03-04 15:04 ` Dmitry Gutov @ 2021-03-04 15:05 ` Dmitry Gutov 1 sibling, 0 replies; 42+ messages in thread From: Dmitry Gutov @ 2021-03-04 15:05 UTC (permalink / raw) To: tomas; +Cc: Eli Zaretskii, stefankangas, monnier, emacs-devel On 04.03.2021 16:50, tomas@tuxteam.de wrote: > I'm sure you can bring on your (I think founded) criticism in a more > constructive way. I think I have tried many different ways of doing that by now. Including emailing Eli directly, of course. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-04 14:25 ` Dmitry Gutov 2021-03-04 14:50 ` tomas @ 2021-03-04 15:11 ` Eli Zaretskii 1 sibling, 0 replies; 42+ messages in thread From: Eli Zaretskii @ 2021-03-04 15:11 UTC (permalink / raw) To: Dmitry Gutov; +Cc: emacs-devel, stefankangas, monnier > Cc: stefankangas@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org > From: Dmitry Gutov <dgutov@yandex.ru> > Date: Thu, 4 Mar 2021 16:25:58 +0200 > > > More generally, I wish we stopped investing so much of our time and > > energy in cleanups and other support tasks, and more to add > > significant new applications and editing features. That would make > > more users happier, I think. > > Perhaps if contributors didn't have to fight you about every little > thing they need to change or fix (or if you responded to arguments, at > least), we'll get more features over time. This is especially > discouraging when the disagreement is over a minor change in a package I > supposedly maintain (bug#44611 is the most glaring example). > > I can't threaten to slam the door and leave every time this happens, but > this kind of malarkey sucks out a significant amount of time and enthusiasm. Ah, okay. So I'm the culprit. Thanks, noted. ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [External] : Re: Ugly regexps 2021-03-03 18:46 ` Stefan Kangas 2021-03-03 19:21 ` Eli Zaretskii @ 2021-03-03 19:32 ` Drew Adams 1 sibling, 0 replies; 42+ messages in thread From: Drew Adams @ 2021-03-03 19:32 UTC (permalink / raw) To: Stefan Kangas, Stefan Monnier, Eli Zaretskii; +Cc: emacs-devel@gnu.org To add to what some others have said - Is RX usable as part of our interactive use of regexps? If so, great (assuming the UI for that is well done). If not, I'd say that's another reason that at least some of us might not bother with (or aren't in the habit of using) RX. I think that interactive use of regexps is the most important for Emacs - more important than what is used for Elisp. And if that means (as it does now) Elisp regexps, then that's what people will and should learn: Elisp regexp syntax. Of course, for interactive use, we already remove the need for double backslashing etc. But the regexp dialect that's available interactively is (so far) the Elisp one, not some other. I think that alone may explain limited use of RX in code. (Just a thought.) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 0:32 Ugly regexps Stefan Monnier 2021-03-03 1:32 ` Stefan Kangas 2021-03-03 6:00 ` Eli Zaretskii @ 2021-03-03 7:09 ` Helmut Eller 2021-03-03 14:11 ` Stefan Kangas 2021-03-03 15:49 ` Stefan Monnier 2021-03-03 12:17 ` Dmitry Gutov 2021-03-03 13:57 ` Lars Ingebrigtsen 4 siblings, 2 replies; 42+ messages in thread From: Helmut Eller @ 2021-03-03 7:09 UTC (permalink / raw) To: emacs-devel On Tue, Mar 02 2021, Stefan Monnier wrote: > BTW, while this theme of ugly regexps keeps coming up, how 'bout we add > a new function `ere` which converts between the ERE style of regexps > where grouping parens are not escaped (and plain chars meant to match > an actual paren need to be escaped instead) to ELisp-style regexps? > > So you can do > > (string-match (ere "\\(def(macro|un|subst) .{1,}")) Better call it (rx (ere STRING)). Namespace pollution may not be prohibited by the law but mother Emacs may thank you anyway :-) Helmut ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 7:09 ` Helmut Eller @ 2021-03-03 14:11 ` Stefan Kangas 2021-03-03 16:40 ` Stefan Monnier 2021-03-03 15:49 ` Stefan Monnier 1 sibling, 1 reply; 42+ messages in thread From: Stefan Kangas @ 2021-03-03 14:11 UTC (permalink / raw) To: Helmut Eller, emacs-devel Helmut Eller <eller.helmut@gmail.com> writes: >> (string-match (ere "\\(def(macro|un|subst) .{1,}")) > > Better call it (rx (ere STRING)). Namespace pollution may not be > prohibited by the law but mother Emacs may thank you anyway :-) FWIW, I'd strongly prefer `ere'. Some of us will be using this macro *a lot*. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 14:11 ` Stefan Kangas @ 2021-03-03 16:40 ` Stefan Monnier 0 siblings, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2021-03-03 16:40 UTC (permalink / raw) To: Stefan Kangas; +Cc: Helmut Eller, emacs-devel > Some of us will be using this macro *a lot*. Side note: it's a *function* Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 7:09 ` Helmut Eller 2021-03-03 14:11 ` Stefan Kangas @ 2021-03-03 15:49 ` Stefan Monnier 1 sibling, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2021-03-03 15:49 UTC (permalink / raw) To: Helmut Eller; +Cc: emacs-devel >> (string-match (ere "\\(def(macro|un|subst) .{1,}")) > Better call it (rx (ere STRING)). I think "every character counts" here. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Ugly regexps 2021-03-03 0:32 Ugly regexps Stefan Monnier ` (2 preceding siblings ...) 2021-03-03 7:09 ` Helmut Eller @ 2021-03-03 12:17 ` Dmitry Gutov 2021-03-03 15:48 ` Stefan Monnier 2021-03-03 13:57 ` Lars Ingebrigtsen 4 siblings, 1 reply; 42+ messages in thread From: Dmitry Gutov @ 2021-03-03 12:17 UTC To: Stefan Monnier, emacs-devel

On 03.03.2021 02:32, Stefan Monnier wrote:
> (defun ere (re)
>   "Convert an ERE-style regexp RE to an Emacs-style regexp."
>   (let ((pos 0)
>         (last 0)
>         (chunks '()))
>     (while (string-match "\\\\.\\|[{}()|]" re pos)
>       (let ((beg (match-beginning 0))
>             (end (match-end 0)))
>         (when (subregexp-context-p re beg)
>           (cond
>            ;; A normal paren: add a backslash.
>            ((= (1+ beg) end)
>             (push (substring re last beg) chunks)
>             (setq last beg)
>             (push "\\" chunks))
>            ;; A grouping paren: skip the backslash.
>            ((memq (aref re (1+ beg)) '(?\( ?\) ?\{ ?\} ?\|))
>             (push (substring re last beg) chunks)
>             (setq last (1+ beg)))))
>         (setq pos end)))
>     (mapconcat #'identity (nreverse (cons (substring re last) chunks)) "")))

See also xref--regexp-to-extended, my last attempt at RE->ERE conversion, though woefully lacking in tests.

Its goal was to move in the other direction, but (unless I'm missing something about the syntax differences) this function is reversible.
* Re: Ugly regexps 2021-03-03 12:17 ` Dmitry Gutov @ 2021-03-03 15:48 ` Stefan Monnier 0 siblings, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2021-03-03 15:48 UTC To: Dmitry Gutov; +Cc: emacs-devel

>> (defun ere (re)
>>   "Convert an ERE-style regexp RE to an Emacs-style regexp."
>>   (let ((pos 0)
>>         (last 0)
>>         (chunks '()))
>>     (while (string-match "\\\\.\\|[{}()|]" re pos)
>>       (let ((beg (match-beginning 0))
>>             (end (match-end 0)))
>>         (when (subregexp-context-p re beg)
>>           (cond
>>            ;; A normal paren: add a backslash.
>>            ((= (1+ beg) end)
>>             (push (substring re last beg) chunks)
>>             (setq last beg)
>>             (push "\\" chunks))
>>            ;; A grouping paren: skip the backslash.
>>            ((memq (aref re (1+ beg)) '(?\( ?\) ?\{ ?\} ?\|))
>>             (push (substring re last beg) chunks)
>>             (setq last (1+ beg)))))
>>         (setq pos end)))
>>     (mapconcat #'identity (nreverse (cons (substring re last) chunks)) "")))
>
> See also xref--regexp-to-extended, my last attempt at RE->ERE conversion,
> though woefully lacking in tests.

Oh, thanks for the pointer.

> Its goal was to move in the other direction, but (unless I'm missing
> something about the syntax differences) this function is reversible.

Indeed it is,

Stefan
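For readers who want to try the conversion discussed above, here is a minimal sketch that evaluates the posted `ere` on the example from the first message. The function body is copied from the thread so the snippet can be evaluated on its own; the sample string "(defun foo ())" is only an illustration and does not come from the thread.

```emacs-lisp
;; `ere' as posted at the start of the thread, reproduced unchanged
;; apart from layout, so that this snippet is self-contained.
(defun ere (re)
  "Convert an ERE-style regexp RE to an Emacs-style regexp."
  (let ((pos 0)
        (last 0)
        (chunks '()))
    ;; Visit every backslash-escaped char and every bare { } ( ) |.
    (while (string-match "\\\\.\\|[{}()|]" re pos)
      (let ((beg (match-beginning 0))
            (end (match-end 0)))
        (when (subregexp-context-p re beg)
          (cond
           ;; Bare operator char in the input: emit it with a backslash.
           ((= (1+ beg) end)
            (push (substring re last beg) chunks)
            (setq last beg)
            (push "\\" chunks))
           ;; Backslash-escaped operator char: drop the backslash.
           ((memq (aref re (1+ beg)) '(?\( ?\) ?\{ ?\} ?\|))
            (push (substring re last beg) chunks)
            (setq last (1+ beg)))))
        (setq pos end)))
    (mapconcat #'identity (nreverse (cons (substring re last) chunks)) "")))

;; The example from the first message, plus a match against a sample form.
(let* ((converted (ere "\\(def(macro|un|subst) .{1,}"))
       (sample "(defun foo ())")          ; illustrative input, not from the thread
       (start (string-match converted sample)))
  (list converted start (match-string 1 sample)))
;; The first element should be the Emacs-style regexp quoted in the thread,
;; "(def\\(macro\\|un\\|subst\\) .\\{1,\\}", followed by 0 and "un".
```

Only the five characters ( ) { } | are swapped between escaped and unescaped form, so ERE features with no Emacs counterpart are outside what this sketch exercises.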
* Re: Ugly regexps 2021-03-03 0:32 Ugly regexps Stefan Monnier ` (3 preceding siblings ...) 2021-03-03 12:17 ` Dmitry Gutov @ 2021-03-03 13:57 ` Lars Ingebrigtsen 4 siblings, 0 replies; 42+ messages in thread From: Lars Ingebrigtsen @ 2021-03-03 13:57 UTC To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> So you can do
>
> (string-match (ere "\\(def(macro|un|subst) .{1,}"))
>
> instead of
>
> (string-match "(def\\(macro\\|un\\|subst\\) .\\{1,\\}")
>
> ?

Sounds good to me. In some cases, introducing an alternative syntax can create confusion, but I don't think that's really the case here -- I think everybody knows this syntax, perhaps better than the Emacs regexp syntax.

The byte compiler can do the transformation, I guess? (When it's a string literal, which it usually is.) So there should be no performance impact.

And when Emacs finally grows support for a regexp object, then `ere' can return one of those.

--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
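On the point about doing the transformation at compile time: the thread does not say how the byte compiler would be taught to do this, so the following is only a sketch of one way to get a similar effect with existing facilities. It assumes the posted `ere` definition is wrapped in `eval-and-compile` so it is available while the file is being compiled; `my-def-form-regexp` is a hypothetical name used for illustration.

```emacs-lisp
;; Make `ere' (copied from the first message) available both when this
;; file is byte-compiled and when it is loaded.
(eval-and-compile
  (defun ere (re)
    "Convert an ERE-style regexp RE to an Emacs-style regexp."
    (let ((pos 0)
          (last 0)
          (chunks '()))
      (while (string-match "\\\\.\\|[{}()|]" re pos)
        (let ((beg (match-beginning 0))
              (end (match-end 0)))
          (when (subregexp-context-p re beg)
            (cond
             ((= (1+ beg) end)
              (push (substring re last beg) chunks)
              (setq last beg)
              (push "\\" chunks))
             ((memq (aref re (1+ beg)) '(?\( ?\) ?\{ ?\} ?\|))
              (push (substring re last beg) chunks)
              (setq last (1+ beg)))))
          (setq pos end)))
      (mapconcat #'identity (nreverse (cons (substring re last) chunks)) ""))))

;; Hypothetical constant.  When the file is byte-compiled,
;; `eval-when-compile' evaluates the call during compilation and the
;; compiled code contains only the resulting string; when the code is
;; merely evaluated, the call runs once as usual.
(defconst my-def-form-regexp
  (eval-when-compile (ere "\\(def(macro|un|subst) .{1,}"))
  "Emacs-style regexp produced from an ERE-style literal.")

(string-match my-def-form-regexp "(defun foo ())")
```

This only helps when the argument is a literal, which is the case the message above has in mind; a regexp assembled at run time would still go through `ere` on every call.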