* How to grok a complicated regex?
From: Marcin Borkowski @ 2015-03-13 21:35 UTC
To: Help Gnu Emacs mailing list

Hi all,

so I have this monstrosity [note: I know, there are much worse ones,
too!]:

"\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"

(it's in the org-latex--script-size function in ox-latex.el, if you're
curious).

I'm not asking “what does this match” – I can read it myself. But it
comes with a considerable effort. Are you aware of any tools that might
help to understand such regexen?

I know about re-builder, but it’s well suited for constructing a regex
matching a given string, not the other way round.

For instance, show-paren-mode does not really help here, since it seems
to pair “\\(” with an unescaped “)”.

Any ideas?

(Note: if there are no such tools, I might be tempted to craft one. Two
things that come to my mind are proper highlighting of matching parens
of various kinds and eldoc-like hints for all the regex constructs – I
never seem to remember what “\\`” does, for instance. Also, displaying
the string with single backslashes, and not the way it is actually
typed in Elisp with all the backslash escaping, might be helpful. Would
there be demand for such a tool from more than one person?)

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
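The eldoc-like hint idea above can be sketched in a few lines. This is a hypothetical helper, not an existing package: the names `my-regexp-construct-docs` and `my-regexp-construct-doc` are made up for illustration, and the table covers only a handful of constructs. Given the regexp itself (single backslashes) and a character index, it returns a one-line description of the backslash construct starting there. It does not track bracket expressions, where a backslash has no special meaning, so treat its answers as rough hints.

```elisp
(defconst my-regexp-construct-docs
  ;; Raw constructs (as they look after string unescaping) and their meaning.
  ;; Longer entries come before shorter ones sharing a prefix, e.g. \(?: before \(.
  '(("\\`"   . "matches the empty string at the start of the buffer or string")
    ("\\'"   . "matches the empty string at the end of the buffer or string")
    ("\\(?:" . "opens a shy (non-capturing) group")
    ("\\("   . "opens a numbered (capturing) group")
    ("\\)"   . "closes a group")
    ("\\|"   . "separates alternatives")
    ("\\\\"  . "matches a literal backslash"))
  "Hypothetical lookup table for a few Emacs regexp backslash constructs.")

(defun my-regexp-construct-doc (regexp pos)
  "Describe the backslash construct starting at index POS of REGEXP.
REGEXP is the regexp itself (single backslashes), not its Lisp source.
Return nil if no entry of `my-regexp-construct-docs' starts at POS.
Bracket expressions are not tracked, so this is only a rough hint."
  (catch 'found
    (dolist (entry my-regexp-construct-docs)
      (let ((end (+ pos (length (car entry)))))
        ;; Compare only when the candidate construct fits inside REGEXP.
        (when (and (<= end (length regexp))
                   (string= (car entry) (substring regexp pos end)))
          (throw 'found (cdr entry)))))
    nil))
```

For the regexp above, index 0 is described as the start-of-string anchor and index 2 as a shy group; a hover or eldoc function could call this with the index under point.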
* Re: How to grok a complicated regex?
From: Marcin Borkowski @ 2015-03-13 21:45 UTC
To: Help Gnu Emacs mailing list

On 2015-03-13, at 22:35, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:

> Hi all,
>
> so I have this monstrosity [note: I know, there are much worse ones,
> too!]:
>
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>
> (it's in the org-latex--script-size function in ox-latex.el, if you're
> curious).
>
> I'm not asking “what does this match” – I can read it myself. But it
> comes with a considerable effort. Are you aware of any tools that might
> help to understand such regexen?

BTW, it turned out to be fairly simple after all, but I could see this
only after passing it through (insert ...) in a temporary buffer, so
that all the double backslashes stopped looking like a drunkard's
nightmare. So even such a rudimentary "tool" (basically, a temp buffer
and `insert') did help a lot.

Best,
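The temp-buffer trick is easy to wrap in a small helper. A minimal sketch using only core Emacs Lisp; the names `my-script-size-re` and `my-show-raw-regexp` are hypothetical. Inserting a string puts its actual characters into the buffer, so each doubled backslash of the Lisp source shows up as a single one.

```elisp
;; The regexp from `org-latex--script-size', as it is written in Lisp source.
(defconst my-script-size-re
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")

(defun my-show-raw-regexp (regexp)
  "Insert REGEXP into the buffer *raw regexp* and return that buffer.
The buffer then holds the regexp itself, without Lisp string escaping."
  (with-current-buffer (get-buffer-create "*raw regexp*")
    (erase-buffer)
    (insert regexp)
    (current-buffer)))
```

Evaluating M-: (pop-to-buffer (my-show-raw-regexp my-script-size-re)) should display

    \`\(?:\\[([]\|\$+\)?\(.*?\)\(?:\\[])]\|\$+\)?\'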
* Re: How to grok a complicated regex?
From: Alexis @ 2015-03-13 21:47 UTC
To: help-gnu-emacs

On 2015-03-14T08:35:36+1100, Marcin Borkowski <mbork@wmi.amu.edu.pl> said:

MB> I'm not asking “what does this match” – I can read it myself.
MB> But it comes with a considerable effort. Are you aware of any
MB> tools that might help to understand such regexen?

`rxt-explain-elisp`?

Alexis.
* Re: How to grok a complicated regex?
From: Marcin Borkowski @ 2015-03-13 21:57 UTC
To: help-gnu-emacs

On 2015-03-13, at 22:47, Alexis <flexibeast@gmail.com> wrote:

> On 2015-03-14T08:35:36+1100, Marcin Borkowski
> <mbork@wmi.amu.edu.pl> said:
>
> MB> I'm not asking “what does this match” – I can read it myself.
> MB> But it comes with a considerable effort. Are you aware of any
> MB> tools that might help to understand such regexen?
>
> `rxt-explain-elisp`?

Interesting, I didn't know about this one. Thanks a lot, I'll take a
look!

> Alexis.

Best,
* Re: How to grok a complicated regex?
From: Vaidheeswaran C @ 2015-03-23 12:18 UTC
To: help-gnu-emacs

On Saturday 14 March 2015 03:05 AM, Marcin Borkowski wrote:
>
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>
> (it's in the org-latex--script-size function in ox-latex.el, if you're
> curious).
>
> I'm not asking “what does this match” – I can read it myself. But it
> comes with a considerable effort. Are you aware of any tools that might
> help to understand such regexen?

Get xr.el from
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=40;filename=xr.el;att=1;bug=13369

M-x load-library xr.el
M-x pp-eval-expression RET
(xr "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'") RET

(seq bos
     (opt (or (seq "\\" (any "[" "("))
              (one-or-more "$")))
     (group (minimal-match (zero-or-more nonl)))
     (opt (or (seq "\\" (any ")" "]"))
              (one-or-more "$")))
     eos)

There is also lex (see http://elpa.gnu.org/packages/lex.html) which
provides similar functionality. FWIW, my edit window "disappears" if I
do

(lex-parse-re "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")
* Re: How to grok a complicated regex?
From: Emanuel Berg @ 2015-03-13 22:46 UTC
To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> so I have this monstrosity [note: I know, there are
> much worse ones, too!]:
>
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>
> (it's in the org-latex--script-size function in
> ox-latex.el, if you're curious).
>
> I'm not asking “what does this match” – I can read
> it myself. But it comes with a considerable effort.

I dare say most people (even programmers) cannot read
that so if you can that's great. As a math
professional you are of course aware of the discipline
called automata theory that deals with such things.
Perhaps relational algebra might help too, if the data
in the sets are strings. But automata theory should
help even more.

Also, remember you don't have to understand those
expressions. Often they are set up incrementally. They
only need to be correct. The computer understands them
- the programmer only understands the purpose, and the
latest edition. Kind of risky, perhaps not what a math
person would find appealing, but I've constructed many
that way so I know that method works.

> Are you aware of any tools that might help to
> understand such regexen?

I have seen tools with which you can construct such
expressions and they output figures, states,
transitions, and so on. I wonder how advanced
expressions they can deal with? But if you get the
basics right, it should be just basic building blocks
that stick together and from there on the sky is the
limit.
Instead the problem is, as I see it: will those
figures, balls and arrows, tagged with preconditions,
postconditions, everything you can think of, will that
actually be *clearer*?

If I were to do it (which I am not, thank god) my
answer would be *no*. The only way I could do it would
instead be the opposite. Train the brain with such
expressions - exactly as they are - day in, day out,
until they are second nature.

Example: a C++ OO project with classes and everything.
Silly inheritance and interfaces. Some people would
consider those pretty darn difficult to understand.
But to the seasoned C++ programmer (no exaggerating
here, a few years of focused training is enough) those
programs are clear. For those guys, giving up writing
C++ code and instead using some other representation
(be it graphical or not) would in one stroke cripple
their skills.

So no, I think that representation is the best there
is. To translate it back and forth would not only be
very difficult to do - and even if possible, which of
course it is, because a representation is just a
representation of I don't know how many possible - I
don't see the end result being any more clear: on the
contrary, most likely.

What I would do - try to get it more readable by using
classes, string classes (do they exist?), and even
more advanced constructs if necessary - as in this
simple example:

(defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")

How do you define those? Can you identify any which
aren't there, but could/should be?

Example: say there is a class called "delimiters"
which contains [, (, {, <, >, }, ), and ]. Can you
split that up into "opening-delimiters" and closing
ditto?

Second, exactly what you mentioned - the font lock
issue - work on that.

You do know, of course, of

font-lock-regexp-grouping-construct
font-lock-regexp-grouping-backslash

Are there more of those, that you can identify, and
add?
--
underground experts united
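The "named classes" suggestion, with the delimiters split into opening and closing sets, can be sketched with `defconst` and `rx-to-string` from the standard rx.el. The constant names below are hypothetical; the point is only that each piece gets a descriptive name and the combined regexp is built from those pieces instead of being typed out by hand.

```elisp
(require 'rx)

(defconst my-opening-delimiters "([{<"
  "Hypothetical set of opening delimiter characters.")
(defconst my-closing-delimiters ")]}>"
  "Hypothetical set of closing delimiter characters.")

;; rx takes care of placing `]' correctly inside the bracket expression.
(defconst my-opening-delimiter-re
  (rx-to-string `(any ,my-opening-delimiters) t)
  "Regexp matching one opening delimiter.")
(defconst my-closing-delimiter-re
  (rx-to-string `(any ,my-closing-delimiters) t)
  "Regexp matching one closing delimiter.")

;; The combined class is assembled from the two named halves.
(defconst my-delimiter-re
  (concat my-opening-delimiter-re "\\|" my-closing-delimiter-re)
  "Regexp matching any delimiter.")
```

With these, (string-match-p my-delimiter-re "foo>") returns 3, and code that only cares about closers can use `my-closing-delimiter-re` directly.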
* Re: How to grok a complicated regex?
From: Marcin Borkowski @ 2015-03-13 23:16 UTC
To: help-gnu-emacs

On 2015-03-13, at 23:46, Emanuel Berg <embe8573@student.uu.se> wrote:

> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> so I have this monstrosity [note: I know, there are
>> much worse ones, too!]:
>>
>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>
>> (it's in the org-latex--script-size function in
>> ox-latex.el, if you're curious).
>>
>> I'm not asking “what does this match” – I can read
>> it myself. But it comes with a considerable effort.
>
> I dare say most people (even programmers) cannot read
> that so if you can that's great. As a math

Really? It's not /that/ difficult. You only need enough coffee (or
tea, in my case), time and motivation. You don’t need a genius, or even
IQ higher than, say, 90 or so. It's not really /difficult/.
Intimidating, yes. Boring, possibly. Laborious (and mechanical), yes.
But not /difficult/.

> professional you are of course aware of the discipline
> called automata theory that deals with such things.

Well, as an analyst working in metric fixed point theory, that's just
it. I'm /aware/ of automata theory – (almost) nothing more. ;-)

> Perhaps relational algebra might help too, if the data
> in the sets are strings. But automata theory should
> help even more.
>
> Also, remember you don't have to understand those
> expressions. Often they are set up incrementally. They
> only need to be correct. The computer understands them
> - the programmer only understands the purpose, and the
> latest edition. Kind of risky, perhaps not what a math
> person would find appealing, but I've constructed many
> that way so I know that method works.
That reminds me of the von Neumann quote: “In mathematics, you don’t
/understand/ things – you just /get used/ to them.”

>> Are you aware of any tools that might help to
>> understand such regexen?
>
> I have seen tools with which you can construct such
> expressions and they output figures, states,
> transitions, and so on. I wonder how advanced
> expressions they can deal with? But if you get the
> basics right, it should be just basic building blocks
> that stick together and from there on the sky is the
> limit.
>
> Instead the problem is, as I see it: will those
> figures, balls and arrows, tagged with preconditions,
> postconditions, everything you can think of, will that
> actually be *clearer*?

As we both point out, I’m not talking about changing the representation,
but about making the existing one (which I agree is not /that/ bad) more
comprehensible. Font lock, grouping and unescaping backslashes would be
definitely helpful.

OTOH, I can imagine that some kind of diagrams might be helpful for
someone. The point is, in the end you have to read/write these regexen
in their normal form anyway, so why not train yourself to understand
their “default” representation instead of adding the burden of
translating between representations?

> If I were to do it (which I am not, thank god) my
> answer would be *no*. The only way I could do it would
> instead be the opposite. Train the brain with such
> expressions - exactly as they are - day in, day out,
> until they are second nature.
>
> Example: a C++ OO project with classes and everything.
> Silly inheritance and interfaces. Some people would
> consider those pretty darn difficult to understand.
> But to the seasoned C++ programmer (no exaggerating
> here, a few years of focused training is enough) those
> programs are clear. For those guys, giving up writing
> C++ code and instead using some other representation
> (be it graphical or not) would be to in one stroke
> cripple their skills.
>
> So no, I think that representation is the best there
> is. To translate it back and forth would not only be

I’m not sure whether it’s the best – but it’s a standard (more or less;
Emacs’ regexen are not really “standard” by today’s, well, standards –
but hardly anything about Emacs is “standard” or “typical”, so who
cares ;-)).

> very difficult to do - and even if possible, which of

I disagree. I don’t think that such a translator would be a difficult
one to write. If only I was a student again, with plenty of spare time,
I might have taken the challenge and tried to write one in TeX, so that
some TeX macro, given an (Emacs) regex, would produce a nicely typeset
diagram. Wow, what a nice project for a bachelor’s thesis. Wait a
minute. Ohboyohboyohboy. I have to put this in my faculty’s database of
potential topics. Poor students... ;-)

(BTW, I did once write a poor man’s parser in pure TeX; since there was
no regex engine written in TeX back then (now there is one!), I had to
craft a simple automaton myself. Not extremely pleasant work...)

> course it is, because a representation is just a
> representation of I don't know how many possible - I
> don't see the end result being any more clear: on the
> contrary, most likely.
>
> What I would do - try to get it more readable by using
> classes, string classes (do they exist?), and even
> more advanced constructs if necessary - as in this
> simple example:
>
> (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")
>
> How do you define those? Can you identify any which
> aren't there, but could/should be?
>
> Example: say there is a class called "delimiters"
> which contain [, (, {, <, >, }, ), and ]. Can you
> split that up, in "opening-delimiters" and closing
> ditto?
>
> Second, exactly what you mentioned - the font lock issue -
> work on that.
>
> You do know, of course, of
>
> font-lock-regexp-grouping-construct
> font-lock-regexp-grouping-backslash
>
> Are there more of those, that you can identify, and
> add?

There could be quite a few.

(As Alexis pointed out, a tool like the one I was writing about seems
to exist – if it’s not satisfactory, I could think about extending it
somehow. Not very probable, though – I’m too busy now. If only someone
would pay me for goofing around and playing with Emacs hacks...)

Thanks for your input, and best regards!
* Re: How to grok a complicated regex?
From: Rasmus @ 2015-03-14 0:12 UTC
To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> On 2015-03-13, at 23:46, Emanuel Berg <embe8573@student.uu.se> wrote:
>
>> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>>
>>> so I have this monstrosity [note: I know, there are
>>> much worse ones, too!]:
>>>
>>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>>
>>> (it's in the org-latex--script-size function in
>>> ox-latex.el, if you're curious).
>>>
>>> I'm not asking “what does this match” – I can read
>>> it myself. But it comes with a considerable effort.
>>
>> I dare say most people (even programmers) cannot read
>> that so if you can that's great.
>
> Really? It's not /that/ difficult. You only need enough coffee (or
> tea, in my case), time and motivation.
> You don’t need a genius, or even IQ higher than, say, 90 or so.

Damn. At least I know why I don't understand it now...

To grok REs I sometimes prefer visual-regexp¹ over re-builder, though
re-builder has the advantage that it understands \\ out of the box. You
may also find highlight-regexp useful, since it would color the
different parenthesized matches.

Here's another project (for your students): adding lookaround to Emacs
regexp /and/ have it merged. It would be *insanely(!)* at times.

—Rasmus

Footnotes:
¹ https://github.com/benma/visual-regexp.el

--
A clever person solves a problem. A wise person avoids it
* Re: How to grok a complicated regex?
From: Stefan Monnier @ 2015-03-14 13:18 UTC
To: help-gnu-emacs

> Here's another project (for your students): adding lookaround to Emacs
> regexp /and/ have it merged. It would be *insanely(!)* at times.

A better project: replace the regexp engine with one that does not
backtrack all the time.


        Stefan
* Re: How to grok a complicated regex?
From: Rusi @ 2015-03-15 4:31 UTC
To: help-gnu-emacs

On Saturday, March 14, 2015 at 6:48:41 PM UTC+5:30, Stefan Monnier wrote:
> > Here's another project (for your students): adding lookaround to Emacs
> > regexp /and/ have it merged. It would be *insanely(!)* at times.
>
> A better project: replace the regexp engine with one that does not
> backtrack all the time.
>
>
>         Stefan

http://www.colm.net/open-source/ragel/ already exists.
It would be neat if it were part of Emacs' core.
* Re: How to grok a complicated regex?
From: Tom Tromey @ 2015-03-22 2:29 UTC
To: Rasmus
Cc: help-gnu-emacs

Rasmus> Here's another project (for your students): adding lookaround to Emacs
Rasmus> regexp /and/ have it merged. It would be *insanely(!)* at times.

It was done once already and either rejected or never merged in :-(

Tom
* Re: How to grok a complicated regex?
From: Rasmus @ 2015-03-22 2:44 UTC
To: tom
Cc: help-gnu-emacs

Tom Tromey <tom@tromey.com> writes:

> It was done once already and either rejected or never merged in :-(

That's a real shame. Any particular reason?

I need it for some Gnus settings that only take regexps (I manage
"public" mailing lists and my own catch-all email on one domain). I'm
sure it could be useful in e.g. Org as well, though speed is of course
an issue.

—Rasmus

--
When the facts change, I change my mind. What do you do, sir?
* Re: How to grok a complicated regex?
From: Yuri Khan @ 2015-03-14 5:14 UTC
To: Marcin Borkowski
Cc: help-gnu-emacs@gnu.org

On Sat, Mar 14, 2015 at 5:16 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:

>>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>>
> It's not really /difficult/.
> Intimidating, yes. Boring, possibly. Laborious (and mechanical), yes.
> But not /difficult/.

I tried it and it’s not very intimidating or boring or laborious or
difficult. Here’s my thought process:

First I unescape all backslashes, by global-replacing “\\” with “\”.

Then I insert spaces at key points to separate the syntactic
constructs. (Any literal spaces in the regexp need to be made explicit
first, e.g. by replacing them with <space>.)

    \` \(?: \\ [([] \| \$+ \)? \(.*?\) \(?: \\ [])] \| \$+ \)? \'

Imagining the parentheses and alternatives as nested boxes might help,
too:

       ┌─────────┬─────┐  ╔═══╗ ┌─────────┬─────┐
    \` │ \\ [([] │ \$+ │? ║.*?║ │ \\ [])] │ \$+ │? \'
       └─────────┴─────┘  ╚═══╝ └─────────┴─────┘

(Here the nesting level is just 1, so I didn’t actually need to draw
it, just match.)

Now I can read it:

1. start-of-string
2. optionally followed by either
   * a backslash and either an opening parenthesis or bracket
   * or one or more dollar signs
3. followed by any (newline-free) string, which is extracted as group 1
4. optionally followed by either
   * a backslash and either a closing bracket or parenthesis
   * or one or more dollar signs
5. followed by end-of-string

I can further grok it as matching a valid (La)TeX math formula: $…$,
$$…$$, \(…\), \[…\]; as well as some invalid markup such as $$$$…$$$,
$…\], \(…\], $$…, etc.

As for the bigger picture, I think, if a regular expression ends up
difficult to read, it needs to be decomposed into small, easily
digestible chunks, each with a descriptive name.
Elisp has the let* form and the rx macro for this purpose.
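Yuri's suggestion can be sketched by rebuilding the regexp from named pieces with `let*` and `rx-to-string`. This is a hypothetical rewrite, not the code in ox-latex.el: the names `my-tex-math-re` and `my-tex-math-body` are made up, and the rx forms mirror the xr output quoted in this thread. The generated string may differ cosmetically from the original, but it should match the same inputs.

```elisp
(require 'rx)

(defconst my-tex-math-re
  (let* (;; `\(', `\[', or a run of dollar signs.
         (open-delim  '(or (seq "\\" (any "([")) (one-or-more "$")))
         ;; `\)', `\]', or a run of dollar signs.
         (close-delim '(or (seq "\\" (any "])")) (one-or-more "$")))
         ;; The shortest newline-free text that lets the rest match: group 1.
         (body        '(group (minimal-match (zero-or-more nonl)))))
    (rx-to-string `(seq bos (opt ,open-delim) ,body (opt ,close-delim) eos)
                  t))
  "Hypothetical rx rewrite of the regexp in `org-latex--script-size'.")

(defun my-tex-math-body (string)
  "Return group 1 of `my-tex-math-re' matched against STRING, or nil."
  (and (string-match my-tex-math-re string)
       (match-string 1 string)))
```

(my-tex-math-body "$x^2$") should return "x^2", and, as Yuri points out, mismatched delimiters are accepted as well: (my-tex-math-body "$z\\]") returns "z".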
* RE: How to grok a complicated regex?
From: Drew Adams @ 2015-03-14 7:03 UTC
To: Marcin Borkowski, help-gnu-emacs

> I’m not talking about changing the representation, but about making the
> existing one (which I agree is not /that/ bad) more comprehensible.
> Font lock, grouping and unescaping backslashes would be definitely helpful.
>
> OTOH, I can imagine that some kind of diagrams might be helpful for
> someone. The point is, in the end you have to read/write these regexen
> in their normal form anyway, so why not train yourself to understand
> their “default” representation instead of adding the burden of
> translating between representations?

I agree that a visual aid can help with learning - about regexps in
general and about Emacs regexp syntax in particular.

The Emacs Wiki page about regexps provides suggestions about learning
regexp syntax: http://www.emacswiki.org/emacs/RegularExpression.

Incremental regexp searching (`C-M-s') is one good tool for learning.
What it does not help so much with is subgroup matching - keeping the
different groups straight when there are several possibilities.

Rasmus mentioned that `visual-regexp.el' can help with that.

Likewise, Icicles search: it highlights different subgroup matches
differently. Here is a screenshot that shows a complex regexp (5
groups) and a diagram that maps each group to its highlighting:
http://www.emacswiki.org/emacs/RegularExpression#RegexpsInIcicles

The regexp: "(\([-a-z*]+\) *\((\(([-a-z]+ *\([^)]*\))\))\).*".

A left paren, a name, possibly some whitespace, two left parens, a
name, possibly some whitespace, possibly non right-paren chars, two
right parens, and any chars other than newline. But grouped in a
particular way.
I find that it is more often the case, for a complicated regexp, that
you encounter it readymade (in some existing code), and you want to see
what it is all about and perhaps make a modification to it. That use
case is more typical than creating a complex regexp from scratch. As
Emanuel said, such regexps are often arrived at incrementally - they
start simpler and evolve.

I recommend playing with existing regexps this way, seeing what they
match by using them with a visual tool such as Icicles search,
`visual-regexp.el', or even `C-M-s'. A tour through the Emacs source
code will show you plenty of interesting regexps you can play with -
font-lock keywords and patterns defining Emacs pages, sentences, etc.
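For the subgroup-tracking problem Drew describes, even a plain-text report of what each group captured can help. A minimal sketch using only core functions (`string-match`, `match-string`, and `regexp-opt-depth` from regexp-opt.el); the names `my-regexp-groups` and `my-icicles-example-re` are hypothetical. The example is the regexp Drew quotes, rewritten as a Lisp string; as quoted it has four numbered groups, so with group 0 (the whole match) the report has five entries.

```elisp
(require 'regexp-opt)   ; for `regexp-opt-depth'

(defconst my-icicles-example-re
  "(\\([-a-z*]+\\) *\\((\\(([-a-z]+ *\\([^)]*\\))\\))\\).*"
  "The regexp quoted above, written as a Lisp string.")

(defun my-regexp-groups (regexp string)
  "Match REGEXP against STRING and list what each group captured.
Return an alist of (N . TEXT) for groups 0 through the number of
numbered groups in REGEXP; TEXT is nil for a group that did not take
part in the match.  Return nil if REGEXP does not match at all."
  (when (string-match regexp string)
    (let ((groups '()))
      ;; `regexp-opt-depth' counts \( groups but not shy \(?: groups.
      (dotimes (n (1+ (regexp-opt-depth regexp)))
        (push (cons n (match-string n string)) groups))
      (nreverse groups))))
```

For example, (my-regexp-groups my-icicles-example-re "(let ((x 1)) x)") returns ((0 . "(let ((x 1)) x)") (1 . "let") (2 . "((x 1))") (3 . "(x 1)") (4 . "1")).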
* Re: How to grok a complicated regex?
From: Emanuel Berg @ 2015-03-14 3:58 UTC
To: help-gnu-emacs

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> Really? It's not /that/ difficult. You only need
> enough coffee (or tea, in my case), time and
> motivation. You don’t need a genius, or even IQ
> higher than, say, 90 or so. It's not really
> /difficult/. Intimidating, yes. Boring, possibly.
> Laborious (and mechanical), yes. But not
> /difficult/.

I mean to be able to read it like you read the code
of a programming language. What that takes is
training like everything else.

Instead of deconstructing and reconstructing
complicated expressions like your example I would
recommend starting small - the most basic building
blocks over and over, then make them gradually more
complicated by combinations, then combinations of
combinations, ... It is the way a machine would
process it (only the other way around), and it is the
way a foreign natural language is acquired (almost
always).

"IQ" is a joke and has nothing to do with it unless IQ
is defined by the ability to understand regular
expressions, which by the way I think isn't far away
from how they test "IQ" (which says a lot).

> I disagree. I don’t think that such a translator
> would be a difficult one to write.

The compiler itself is perhaps not extremely difficult
tho certainly not trivial. But that's only the first
step. Then comes presenting it graphically, and making
an editor. To get that to actually work, polished, and
work better than just mastering and typing that form
of code - I'm not convinced.

> Wow, what a nice project for a bachelor’s thesis.
> Wait a minute. Ohboyohboyohboy. I have to put this
> in my faculty’s database of potential topics. Poor
> students... ;-)

That kind of autistic-genius, single-sided crazy stuff
doesn't appeal to me (in fact I think it is
destructive). I'm into execution and combinations -
i.e. not focusing on the technique per se. As an
example, when I did my Master in CS I had Lisp, C++,
zsh, and LaTeX (and more), everything working together
like glued to each other. I don't like one scientist
doing all the thinking; I like one engineer who does
everything and thinks at the same time.
* Re: How to grok a complicated regex?
From: Emanuel Berg @ 2015-03-14 4:44 UTC
To: help-gnu-emacs

I don't understand this discussion anymore or what
anyone are saying.

The representation is difficult to read, but not that
difficult, so there shouldn't be another
representation tool, a tool which isn't that difficult
to do, so it should be a Bachelor degree project.

The show must go on!
* Re: How to grok a complicated regex?
From: Emanuel Berg @ 2015-03-14 4:58 UTC
To: help-gnu-emacs

Emanuel Berg <embe8573@student.uu.se> writes:

> I don't understand this discussion anymore or what
> anyone are saying.
>
> The representation is difficult to read, but not that
> difficult, so there shouldn't be another
> representation tool, a tool which isn't that difficult
> to do, so it should be a Bachelor degree project.
>
> The show must go on!

OK, sorry about that. This discussion was interesting.
The whole session was good. J'ai confiance (I have
faith). Long live techno-techno-totalitarianism!

Now I'm too light-headed, so I'm hitting the
paleo-sack: when I wake up in a week or so I'll read
the latest messages and offer all the assimilated
insights that have struck me like a lightning bolt on
Terra Prima.
* Re: How to grok a complicated regex?
From: Thien-Thi Nguyen @ 2015-03-14 8:43 UTC
To: help-gnu-emacs

() Emanuel Berg <embe8573@student.uu.se>
() Sat, 14 Mar 2015 05:44:02 +0100

   I don't understand this discussion anymore or what
   anyone are saying.

I'm sorry we don't support backtracking in REader Generated
EXPressions. Ha ha, just kidding. :-D

M-x M-explore RET:

I notice many times what people say, you respond with your personal
preferences, without acknowledging in some way the validity of other
people's pov. Maybe that method somehow interferes w/ your
understanding of other people and their concerns.

--
Thien-Thi Nguyen ------------------------------------------
 (if you're human and you know it) read my lisp:
   (defun responsep (type via)
     (case type
       (technical (eq 'mailing-list via))
       ...))
----------------------------------------- GPG key: 4C807502
* Re: How to grok a complicated regex?
From: Emanuel Berg @ 2015-03-20 1:05 UTC
To: help-gnu-emacs

Thien-Thi Nguyen <ttn@gnu.org> writes:

> I notice many times what people say, you respond
> with your personal preferences, without
> acknowledging in some way the validity of other
> people's pov. Maybe that method somehow interferes
> w/ your understanding of other people and
> their concerns.

What do you mean?

Aaanyway...

For a person to be able to read those regexps that
look like comic book insults is not to be expected. If
someone is still able to do that, congratulations to
him/her, unless such an unusual talent comes with
drawbacks in other areas of life...

For a person who writes and reads such regexps every
day, if such a person exists, he or she should acquire
the skill to do so seamlessly, like I write, and you
read, this English paragraph and ditto Elisp form:

(setq fill-nobreak-predicate
      '(fill-single-char-nobreak-p fill-single-word-nobreak-p))

There should be no need at all of a thought process
but instead instant recognition. How will such a
person arrive at that skill level? Simple, he/she does
it every day! There will be no need for a second
representation or even illustrative tools. Such will
be at best fun toys (very soon) as the actual
representation will be the only one ever considered.

For everyone else who perhaps does it now and then the
(de/re)construction method like picking apart a math
formula or a French MAB Model B pistol is nothing to
be ashamed of. Or, for that matter the incremental
method of understanding the general purpose and
inserting the missing char whenever a problem appears.
If anyone is very fond of the regexps and wishes to do them all the time and for this reason thinks of tools and toys as to be able to do that, that's fine, as long as one is aware why it is done (well, maybe that's not necessary come think of it). But if so, then I have an even better idea, namely an Emacs wiki page to which you can e-mail desired regexps, and then the group of regexp lovers can provide those after getting instruction either exactly what it should be, or the general problem to be solved, and then the can deliver it, stainless steel, and everyone is happy. -- underground experts united ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: How to grok a complicated regex? [not found] <mailman.1979.1426282552.31049.help-gnu-emacs@gnu.org> 2015-03-13 22:46 ` Emanuel Berg @ 2015-03-18 16:40 ` Alan Mackenzie 2015-03-19 8:15 ` Tassilo Horn 2015-04-25 4:23 ` Rusi 2 siblings, 1 reply; 24+ messages in thread From: Alan Mackenzie @ 2015-03-18 16:40 UTC To: help-gnu-emacs Hi, Marcin. Sorry if I'm a bit late to this discussion. Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote: > Hi all, > so I have this monstrosity [note: I know, there are much worse ones, > too!]: > "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'" > (it's in the org-latex--script-size function in ox-latex.el, if you're > curious). > I'm not asking “what does this match” – I can read it myself. But it > comes with a considerable effort. Are you aware of any tools that might > help to understand such regexen? > I know about re-builder, but it’s well suited for constructing a regex > matching a given string, not the other way round. > For instance, show-paren-mode does not really help here, since it seems > to pair “\\(“ with unescaped “)”. > Any ideas? I wrote myself the following tool. It's not production quality, but you might find it useful nonetheless. To use it, Type M-: (pp-regexp re-horror). It displays the regexp at the end of the *scratch* buffer, dropping the contents of any \(..\) construct by one line. I find it useful. So might you. Feel free to adapt it, or pass it on to other people.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun pp-regexp (regexp)
  "Pretty print a regexp.
This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp) (error "parameter is not a string."))
  (let ((depth 0)
        (re (replace-regexp-in-string
             "[\t\n\r\f]"
             (lambda (s)
               (or (cdr (assoc s '(("\t" . "??")
                                   ("\n" . "??")
                                   ("\r" . "??"))))
                   "??"))
             regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line, varies between min-depth and max-depth.
        ch
        )
    ;(translate-rnt re)

    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match
                 ;; "\\\\\\((\\(\\?:\\)?\\||\\|)\\)"
                 "\\\\\\(\\\\\\|(\\(\\?:\\)?\\||\\|)\\)"
                 re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\\)
          (put-text-property pos (match-end 1) 'acm-depth depth re))
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))
         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))
         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))

    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max))
        (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\ ))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (Note: if there are no such tools, I might be tempted to craft one. Two > things that come to my mind are proper highlighting of matching parens > of various kinds and eldoc-like hints for all the regex constructs – > I never seem to remember what does “\\`” do, for instance.
Also, > displaying the string with single backslashes and not in the way it is > actually typed in in Elisp, with all the backslash escaping, might be > helpful. Would there be a demand for such a tool larger than one > person?) > Best, > -- > Marcin Borkowski > http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski > Faculty of Mathematics and Computer Science > Adam Mickiewicz University -- Alan Mackenzie (Nuremberg, Germany).
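Marcin's idea of eldoc-like hints for regexp constructs (quoted in the message above) can be prototyped in a few lines of Emacs Lisp. What follows is a minimal sketch under stated assumptions, not an existing package: the names my-regexp-construct-docs and my-regexp-explain are hypothetical, the table covers only the constructs that occur in the ox-latex regexp, and the scanner knows nothing about bracket expressions such as [([], so a ".", "*", "+" or "?" inside brackets would be misreported as an operator. It works on the regexp's actual contents, i.e. with single backslashes.

```elisp
;; Hypothetical table: regexp construct (actual string contents) -> short hint.
;; Only the constructs used in the ox-latex regexp are covered.
(defconst my-regexp-construct-docs
  '(("\\`"   . "empty string at the start of the buffer or string")
    ("\\'"   . "empty string at the end of the buffer or string")
    ("\\(?:" . "start of a shy (non-capturing) group")
    ("\\("   . "start of a capturing group")
    ("\\)"   . "end of a group")
    ("\\|"   . "alternative")
    ("\\\\"  . "a literal backslash")
    ("\\$"   . "a literal dollar sign")
    ("."     . "any character except newline")
    ("*?"    . "zero or more of the preceding item, non-greedy")
    ("*"     . "zero or more of the preceding item")
    ("+"     . "one or more of the preceding item")
    ("?"     . "zero or one of the preceding item"))
  "Hypothetical table mapping regexp constructs to short descriptions.")

(defun my-regexp-explain (re)
  "Return (CONSTRUCT . DESCRIPTION) pairs for the constructs found in RE.
At each position the longest matching table entry wins; characters
with no entry (including bracket expressions) are skipped."
  (let ((pos 0)
        (result '()))
    (while (< pos (length re))
      (let ((rest (substring re pos))
            (best nil))
        ;; Pick the longest entry that is a prefix of the remaining text,
        ;; so that "\\(?:" beats "\\(" and "*?" beats "*".
        (dolist (entry my-regexp-construct-docs)
          (when (and (string-prefix-p (car entry) rest)
                     (or (null best)
                         (> (length (car entry)) (length (car best)))))
            (setq best entry)))
        (if best
            (progn
              (push best result)
              (setq pos (+ pos (length (car best)))))
          (setq pos (1+ pos)))))
    (nreverse result)))
```

For instance, (my-regexp-explain "\\`a\\(?:b\\|c\\)*?\\'") yields the entries for \`, \(?:, \|, \), *? and \', in that order. A real eldoc-style tool would presumably look up only the construct at point instead of scanning the whole string.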
* Re: How to grok a complicated regex? 2015-03-18 16:40 ` Alan Mackenzie @ 2015-03-19 8:15 ` Tassilo Horn 0 siblings, 0 replies; 24+ messages in thread From: Tassilo Horn @ 2015-03-19 8:15 UTC To: Alan Mackenzie; +Cc: help-gnu-emacs Alan Mackenzie <acm@muc.de> writes: > I wrote myself the following tool. It's not production quality, but > you might find it useful nonetheless. To use it, Type > > M-: (pp-regexp re-horror). > > It displays the regexp at the end of the *scratch* buffer, dropping > the contents of any \(..\) construct by one line. Interesting idea, and it helps a bit. What would be really cool was a transformation from regexp to rx form. Oh, and that seems to exist already (available from Marmalade and MELPA)! https://github.com/joddie/pcre2el Example:
--8<---------------cut here---------------start------------->8---
(rxt-elisp-to-rx "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")

;; Evals to...
(seq bos
     (\? (or (seq "\\" (any "[" "(")) (+ "$")))
     (submatch (*\? nonl))
     (\? (or (seq "\\" (any ")" "]")) (+ "$")))
     eos)
--8<---------------cut here---------------end--------------->8---
Bye, Tassilo
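Tassilo's rx form can be checked against the original by compiling it back with Emacs's built-in rx macro and running both regexps on a few sample strings. This is a minimal sketch: my-script-size-re and my-strip-math-delimiters are hypothetical names, not part of ox-latex.el or pcre2el, and agreement on a handful of inputs is evidence, not a proof, that the two regexps are equivalent.

```elisp
(require 'rx)

;; Hypothetical name.  The rx form is the one pcre2el printed above.
(defconst my-script-size-re
  (rx (seq bos
           (\? (or (seq "\\" (any "[" "(")) (+ "$")))  ; optional \( or \[ or $...
           (submatch (*\? nonl))                        ; the body, non-greedy
           (\? (or (seq "\\" (any ")" "]")) (+ "$")))  ; optional \) or \] or $...
           eos))
  "The pcre2el rx form, compiled back into an ordinary regexp string.")

(defun my-strip-math-delimiters (s)
  "Return S without surrounding \\(...\\), \\[...\\] or $...$ delimiters."
  (when (string-match my-script-size-re s)
    (match-string 1 s)))
```

With this, (my-strip-math-delimiters "\\(x^2\\)") returns "x^2" and (my-strip-math-delimiters "$x$") returns "x". Reading the rx form top to bottom makes the intent of the original one-liner fairly plain: strip one optional opening delimiter, capture the rest non-greedily, and strip one optional closing delimiter.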
* Re: How to grok a complicated regex? [not found] <mailman.1979.1426282552.31049.help-gnu-emacs@gnu.org> 2015-03-13 22:46 ` Emanuel Berg 2015-03-18 16:40 ` Alan Mackenzie @ 2015-04-25 4:23 ` Rusi 2015-04-27 13:26 ` Julien Cubizolles 2 siblings, 1 reply; 24+ messages in thread From: Rusi @ 2015-04-25 4:23 UTC To: help-gnu-emacs On Saturday, March 14, 2015 at 3:05:55 AM UTC+5:30, Marcin Borkowski wrote: > Hi all, > > so I have this monstrosity [note: I know, there are much worse ones, > too!]: > > "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'" <details snipped> > > Any ideas? Just saw this http://crowding.github.io/blog/2014/09/09/editing-regexes-interactively-in-emacs/
* Re: How to grok a complicated regex? 2015-04-25 4:23 ` Rusi @ 2015-04-27 13:26 ` Julien Cubizolles 0 siblings, 0 replies; 24+ messages in thread From: Julien Cubizolles @ 2015-04-27 13:26 UTC To: help-gnu-emacs Rusi <rustompmody@gmail.com> writes: > Just saw this > http://crowding.github.io/blog/2014/09/09/editing-regexes-interactively-in-emacs/ For helm users, helm-regexp can be useful too: it allows one to save the regexp as a sexp and to run a query-replace-regexp from the builder.
* How to grok a complicated regex? @ 2015-03-14 8:16 martin rudalics 0 siblings, 0 replies; 24+ messages in thread From: martin rudalics @ 2015-03-14 8:16 UTC To: mbork; +Cc: help-gnu-emacs > so I have this monstrosity [note: I know, there are much worse ones, > too!]: > > "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'" > > (it's in the org-latex--script-size function in ox-latex.el, if you're > curious). > > I'm not asking “what does this match” – I can read it myself. But it > comes with a considerable effort. Are you aware of any tools that might > help to understand such regexen? You might want to try regexp-lock.el which you can find here: https://lists.gnu.org/archive/html/emacs-devel/2014-10/msg00688.html Eventually it should also appear on ELPA but I have to polish up some things first. Sincerely, martin
end of thread
Thread overview: 24+ messages
2015-03-13 21:35 How to grok a complicated regex? Marcin Borkowski
2015-03-13 21:45 ` Marcin Borkowski
2015-03-13 21:47 ` Alexis
2015-03-13 21:57 ` Marcin Borkowski
2015-03-23 12:18 ` Vaidheeswaran C
[not found] <mailman.1979.1426282552.31049.help-gnu-emacs@gnu.org>
2015-03-13 22:46 ` Emanuel Berg
2015-03-13 23:16 ` Marcin Borkowski
2015-03-14 0:12 ` Rasmus
2015-03-14 13:18 ` Stefan Monnier
[not found] ` <mailman.2003.1426339118.31049.help-gnu-emacs@gnu.org>
2015-03-15 4:31 ` Rusi
2015-03-22 2:29 ` Tom Tromey
2015-03-22 2:44 ` Rasmus
2015-03-14 5:14 ` Yuri Khan
2015-03-14 7:03 ` Drew Adams
[not found] ` <mailman.1984.1426288628.31049.help-gnu-emacs@gnu.org>
2015-03-14 3:58 ` Emanuel Berg
2015-03-14 4:44 ` Emanuel Berg
2015-03-14 4:58 ` Emanuel Berg
2015-03-14 8:43 ` Thien-Thi Nguyen
[not found] ` <mailman.1997.1426324089.31049.help-gnu-emacs@gnu.org>
2015-03-20 1:05 ` Emanuel Berg
2015-03-18 16:40 ` Alan Mackenzie
2015-03-19 8:15 ` Tassilo Horn
2015-04-25 4:23 ` Rusi
2015-04-27 13:26 ` Julien Cubizolles
-- strict thread matches above, loose matches on Subject: below --
2015-03-14 8:16 martin rudalics