From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
To: help-gnu-emacs@gnu.org
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 00:16:50 +0100
Message-ID: <87egosa3od.fsf@wmi.amu.edu.pl>
In-Reply-To: <87twxo1pnr.fsf@debian.uxu>
On 2015-03-13, at 23:46, Emanuel Berg <embe8573@student.uu.se> wrote:
> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> so I have this monstrosity [note: I know, there are
>> much worse ones, too!]:
>>
>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>
>> (it's in the org-latex--script-size function in
>> ox-latex.el, if you're curious).
>>
>> I'm not asking “what does this match” – I can read
>> it myself. But it comes with a considerable effort.
>
> I dare say most people (even programmers) cannot read
> that so if you can that's great. As a math

Really? It's not /that/ difficult. You only need enough coffee (or
tea, in my case), time and motivation. You don’t need to be a genius,
or even to have an IQ higher than, say, 90 or so. It's not really
/difficult/. Intimidating, yes. Boring, possibly. Laborious (and
mechanical), yes. But not /difficult/.

> professional you are of course aware of the discipline
> called automata theory that deals with such things.

Well, as an analyst working in metric fixed point theory, that's just
it. I'm /aware/ of automata theory – (almost) nothing more. ;-)

> Perhaps relational algebra might help too, if the data
> in the sets are strings. But automata theory should be
> it even more.
>
> Also, remember you don't have to understand those
> expressions. Often they are setup incrementally. They
> only need to be correct. The computer understands them
> - the programmer only understands the purpose, and the
> latest edition. Kind of risky, perhaps not what a math
> person would find appealing, but I've constructed many
> that way so I know that method works.

That reminds me of the von Neumann quote: “In mathematics, you don’t
/understand/ things – you just /get used/ to them.”

>> Are you aware of any tools that might help to
>> understand such regexen?
>
> I have seen tools with which you can construct such
> expressions and they output figures, states,
> transitions, and so on. I wonder how advanced
> expression they can deal with? But if you get the
> basics right, it should be just basic building blocks
> that stick together and from there on the sky is the
> limit.
>
> Instead the problem is, as I see it: will those
> figures, balls and arrows, tagged with preconditions,
> postconditions, everything you can think of, will that
> actually be *clearer*?

As we both point out, I’m not talking about changing the representation,
but about making the existing one (which I agree is not /that/ bad) more
comprehensible. Font lock, grouping and unescaping backslashes would
definitely be helpful.

OTOH, I can imagine that some kind of diagrams might be helpful for
someone. The point is, in the end you have to read/write these regexen
in their normal form anyway, so why not train yourself to understand
their “default” representation instead of adding the burden of
translating between representations?
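
To make the unescaping point concrete, here is a minimal sketch. The
names my-script-size-re, my-script-size-rx and my-strip-math-delimiters
are made up for illustration and are not part of ox-latex.el. It prints
the regexp with the Lisp string escaping removed, and then spells out
what appears to be the same pattern in rx notation, which needs no
backslashes for grouping or alternation:

```elisp
;; Sketch only: every `my-' name below is hypothetical.
(require 'rx)

(defconst my-script-size-re
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
  "The regexp from the original post, copied verbatim.")

;; Printing the string removes the Lisp reader's layer of backslashes,
;; so what is shown is the regexp as the regexp engine sees it.
(message "%s" my-script-size-re)

(defconst my-script-size-rx
  (rx string-start
      ;; optional opening delimiter: \( or \[ or a run of $
      (opt (or (seq "\\" (any "([")) (+ "$")))
      ;; group 1: the contents, matched non-greedily
      (group (*? nonl))
      ;; optional closing delimiter: \) or \] or a run of $
      (opt (or (seq "\\" (any "])")) (+ "$")))
      string-end)
  "An rx rendering intended to be equivalent to `my-script-size-re'.")

(defun my-strip-math-delimiters (s)
  "Return S without surrounding math delimiters, or nil if no match."
  (and (string-match my-script-size-rx s)
       (match-string 1 s)))

(my-strip-math-delimiters "\\(x^2\\)")   ; => "x^2"
(my-strip-math-delimiters "$$a+b$$")     ; => "a+b"
```

The rx form is longer, but each piece can carry its own comment, which
is arguably the kind of "grouping" that helps most when reading.
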
> If I were to do it (which I am not thanks god) my
> answer would be *no*. The only way I could do it would
> instead be the opposite. Train the brain with such
> expressions - exactly as they are - day in, day out,
> until they are second nature.
>
> Example: a C++ OO project with classes and everything.
> Silly inheritance and interfaces. Some people would
> consider those pretty darn difficult to understand.
> But to the seasoned C++ programmer (no exaggerating
> here, a few years of focused training is enough) those
> programs are clear. For those guys, giving up writing
> C++ code and instead using some other representation
> (be it graphical or not) would be to in one stroke
> cripple their skills.
>
> So no, I think that representation is the best there
> is. To translate it back and forth would not only be

I’m not sure whether it’s the best – but it’s a standard (more or less,
Emacs’ regexen are not really “standard” by today’s, well, standards –
but hardly anything about Emacs is “standard” or “typical”, so who
cares;-)).

> very difficult to do - and even if possible, which of

I disagree. I don’t think that such a translator would be a difficult
one to write.

If only I were a student again, with plenty of spare time, I might have
taken the challenge and tried to write one in TeX, so that some TeX
macro, given an (Emacs) regex, would produce a nicely typeset diagram.
Wow, what a nice project for a bachelor’s thesis. Wait a minute.
Ohboyohboyohboy. I have to put this in my faculty’s database of
potential topics. Poor students... ;-)

(BTW, I did once write a poor man’s parser in pure TeX; since there was
no regex engine written in TeX back then (now there is one!), I had to
craft a simple automaton myself. Not extremely pleasant work...)
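
As a rough indication of how such a translator might start, the sketch
below only splits a regexp into its lexical pieces; turning those
pieces into a diagram is left open. The names my-regexp-token-rx and
my-regexp-tokens are hypothetical, and the tokenizer is deliberately
incomplete: constructs such as \{m,n\} or \_< are not treated as single
units.

```elisp
;; Sketch only: a tokenizer for Emacs regexps, not a full parser.
(require 'rx)

(defconst my-regexp-token-rx
  (rx (or
       ;; shy or explicitly numbered group opener: \(?: or \(?N:
       (seq "\\(?" (* digit) ":")
       ;; any other backslash construct: \( \) \| \` \' \$ \\ ...
       (seq "\\" anything)
       ;; bracket expression; a ] right after [ or [^ is literal
       (seq "[" (opt "^") (opt "]")
            (* (or (seq "[:" (+ alpha) ":]") (not (any "]"))))
            "]")
       ;; postfix operator, optionally non-greedy: * + ? *? +? ??
       (seq (any "*+?") (opt "?"))
       ;; anything else is a one-character token
       anything))
  "Regexp matching one lexical token of an Emacs regexp.")

(defun my-regexp-tokens (re)
  "Split the regexp string RE into a list of token strings."
  (let ((pos 0)
        (tokens '()))
    (while (< pos (length re))
      ;; The last alternative matches any single character, so the
      ;; leftmost match always begins exactly at POS.
      (string-match my-regexp-token-rx re pos)
      (push (match-string 0 re) tokens)
      (setq pos (match-end 0)))
    (nreverse tokens)))

;; Show the tokens of a small regexp separated by spaces.
(mapconcat #'identity (my-regexp-tokens "\\(?:\\\\[([]\\|\\$+\\)?") " ")
;; => "\\(?: \\\\ [([] \\| \\$ + \\) ?"
```

Concatenating the returned tokens gives back the original string, so a
later stage could annotate or colour each piece without losing
anything.
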
> course it is, because a representation is just a
> representation of I don't know how many possible - I
> don't see the end result being any more clear: on the
> contrary, most likely.
>
> What I would do - try to get it more readable by using
> classes, string classes (do they exist?), and even
> more advanced constructs if necessary - as in this
> simple example:
>
> (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")
>
> How do you define those? Can you identify any which
> aren't there, but could/should be?
>
> Example: say there is a class called "delimiters"
> which contain [, (, {, <, >, }, ), and ]. Can you
> split that up, in "opening-delimiters" and closing
> ditto?
>
> Second, exactly you mentioned - the font lock issue -
> work on that.
>
> You do know, of course, of
>
> font-lock-regexp-grouping-construct
> font-lock-regexp-grouping-backslash
>
> Are there more of those, that you can identify, and
> add?

There could be quite a few. (As Alexis pointed out, the kind of tool I
was writing about seems to exist – if it’s not satisfactory, I could
think about extending it somehow. Not very probable, though – I’m too
busy now. If only someone would pay me for goofing around and playing
with Emacs hacks...)

Thanks for your input, and best regards!

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University