unofficial mirror of help-gnu-emacs@gnu.org
From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
To: help-gnu-emacs@gnu.org
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 00:16:50 +0100
Message-ID: <87egosa3od.fsf@wmi.amu.edu.pl>
In-Reply-To: <87twxo1pnr.fsf@debian.uxu>


On 2015-03-13, at 23:46, Emanuel Berg <embe8573@student.uu.se> wrote:

> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> so I have this monstrosity [note: I know, there are
>> much worse ones, too!]:
>>
>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>
>> (it's in the org-latex--script-size function in
>> ox-latex.el, if you're curious).
>>
>> I'm not asking “what does this match” – I can read
>> it myself. But it comes with a considerable effort.
>
> I dare say most people (even programmers) cannot read
> that so if you can that's great. As a math

Really?  It's not /that/ difficult.  You only need enough coffee (or
tea, in my case), time and motivation.  You don’t need to be a genius,
or even to have an IQ higher than, say, 90 or so.  It's not really
/difficult/.
Intimidating, yes.  Boring, possibly.  Laborious (and mechanical), yes.
But not /difficult/.

> professional you are of course aware of the discipline
> called automata theory that deals with such things.

Well, as an analyst working in metric fixed point theory, that's just
it.  I'm /aware/ of automata theory – (almost) nothing more. ;-)

> Perhaps relational algebra might help to, if the data
> in the sets are strings. But automata theory should be
> it even more.
>
> Also, remember you don't have to understand those
> expressions. Often they are setup incrementally. They
> only need to be correct. The computer understands them
> - the programmer only understands the purpose, and the
> latest edition. Kind of risky, perhaps not what I math
> person would be appealed by, but I've constructed many
> that way so I know that method works.

That reminds me of the von Neumann quote: “In mathematics, you don’t
/understand/ things – you just /get used/ to them.”

>> Are you aware of any tools that might help to
>> understand such regexen?
>
> I have seen tools with which you can construct such
> expressions and they output figures, states,
> transitions, and so on. I wonder how advanced
> expression they can deal with? But if you get the
> basics right, it should be just basic building blocks
> that stick together and from there on the sky is the
> limit.
>
> Instead the problem is, as I see it: will those
> figures, balls and arrows, tagged with preconditions,
> postconditions, everything you can think of, will that
> actually be *clearer*?

As we both point out, I’m not talking about changing the representation,
but about making the existing one (which I agree is not /that/ bad) more
comprehensible.  Font lock, grouping and unescaping backslashes would be
definitely helpful.
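
To make the "grouping and unescaping" idea concrete, here is a rough
sketch of such a helper.  The name my/regexp-outline is made up for
illustration and is not part of Emacs.  It takes the regexp as the
engine sees it (the Lisp reader has already removed the string-level
backslash doubling) and puts one construct per line, indented by group
depth.  It only knows groups, alternation, bracket expressions,
backslash escapes and postfix operators; back-references, \{m,n\} and
the like are treated as plain tokens.

```elisp
;; Sketch only: `my/regexp-outline' is a hypothetical helper, not an
;; Emacs built-in.
(defun my/regexp-outline (regexp)
  "Return REGEXP split into one construct per line, indented by group depth."
  (let ((depth 0)
        (lines '()))
    (with-temp-buffer
      (insert regexp)
      (goto-char (point-min))
      (while (not (eobp))
        (let ((start (point))
              (indent depth))
          (cond
           ;; Group opener: \( , \(?: or \(?N:  (contents go one level in).
           ((looking-at "\\\\(\\(?:\\?[0-9]*:\\)?")
            (setq depth (1+ depth)))
           ;; Group closer: printed at the level of its opener.
           ((looking-at "\\\\)")
            (setq depth (max 0 (1- depth))
                  indent depth))
           ;; Alternation bar: also printed at the opener's level.
           ((looking-at "\\\\|")
            (setq indent (max 0 (1- depth))))
           ;; Any other backslash construct, e.g. \` \' \\ \$
           ((looking-at "\\\\\\(?:.\\|\n\\)"))
           ;; Bracket expression, possibly containing [:classes:].
           ((looking-at "\\[\\^?]?\\(?:\\[:[a-z]+:]\\|[^]]\\)*]"))
           ;; Anything else: a single ordinary or special character.
           (t (looking-at "\\(?:.\\|\n\\)")))
          (goto-char (match-end 0))
          ;; Keep a postfix operator (*, +, ?, or a lazy variant) with
          ;; the construct it applies to.
          (when (looking-at "[*+?]\\??")
            (goto-char (match-end 0)))
          (push (concat (make-string (* 2 indent) ?\s)
                        (buffer-substring start (point)))
                lines))))
    (mapconcat #'identity (nreverse lines) "\n")))
```

Evaluating (princ (my/regexp-outline "...")) with the regexp string
from the top of the thread prints the outline into the echo area or
*Messages*, with single backslashes and the two optional groups
visibly separated from the capturing group in the middle.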

OTOH, I can imagine that some kind of diagrams might be helpful for
someone.  The point is, in the end you have to read/write these regexen
in their normal form anyway, so why not train yourself to understand
their “default” representation instead of adding the burden of
translating between representations?
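
For comparison, the other textual representation that already ships
with Emacs is rx.  Below is a sketch of how the regexp from the top of
the thread might be written in it.  The rx form is one reading of the
string, not something taken from ox-latex.el, and the names
my/script-size-re, my/script-size-rx and my/inner-text are invented for
this example.  The helper only compares what group 1 captures; it does
not claim that rx generates the identical string.

```elisp
(require 'rx)

;; The regexp quoted at the top of the thread, copied verbatim.
(defconst my/script-size-re
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")

;; A reading of the same pattern in rx notation (an assumption, checked
;; below only by comparing matches on a few sample strings).
(defconst my/script-size-rx
  (rx string-start
      ;; Optional opener: backslash + ( or [, or a run of dollar signs.
      (opt (or (seq "\\" (any "([")) (+ "$")))
      ;; Group 1: the shortest possible run of non-newline characters.
      (group (*? nonl))
      ;; Optional closer: backslash + ] or ), or a run of dollar signs.
      (opt (or (seq "\\" (any "])")) (+ "$")))
      string-end))

(defun my/inner-text (regexp string)
  "Return what group 1 of REGEXP captures in STRING, or nil if no match."
  (save-match-data
    (when (string-match regexp string)
      (match-string 1 string))))
```

Calling (my/inner-text my/script-size-rx "$x^2$") and the same with
my/script-size-re is a quick way to check that both forms agree on a
given input.  Whether the rx form is actually easier to read than the
string is exactly the question debated here.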

> If I were to do it (which I am not thanks god) my
> answer would be *no*. The only way I could do it would
> instead be the opposite. Train the brain with such
> expressions - exactly as they are - day in, day out,
> until they are second nature.
>
> Example: a C++ OO project with classes and everything.
> Silly inheritance and interfaces. Some people would
> consider those pretty darn difficult to understand.
> But to the seasoned C++ programmer (no exaggerating
> here, a few years of focused training is enough) those
> programs are clear. For those guys, giving up writing
> C++ code and instead using some other representation
> (be it graphical or not) would be to in one stroke
> cripple their skills.
>
> So no, I think that representation is the best there
> is. To translate it back and forth would not only be

I’m not sure whether it’s the best – but it’s a standard (more or less,
Emacs’ regexen are not really “standard” by today’s, well, standards –
but hardly anything about Emacs is “standard” or “typical”, so who
cares;-)).

> very difficult to do - and even if possible, which of

I disagree.  I don’t think that such a translator would be a difficult
one to write.

If only I was a student again, with plenty of spare time, I might have
taken the challenge and tried to write one in TeX, so that some TeX
macro, given an (Emacs) regex would produce a nicely typeset diagram.

Wow, what a nice project for a bachelor’s thesis.  Wait a minute.
Ohboyohboyohboy.  I have to put this in my faculty’s database of
potential topics.  Poor students... ;-)

(BTW, I did once write a poor man’s parser in pure TeX; since there was
no regex engine written in TeX back then (now there is one!), I had to
craft a simple automaton myself.  Not extremely pleasant work...)

> course it is, because a representation is just a
> representation of I don't know how many possible - I
> don't see the end result being any more clear: on the
> contrary, most likely.
>
> What I would do - try to get it more readable by using
> classes, string classes (do they exist?), and even
> more advanced constructs if necessary - as in this
> simple example:
>
>     (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")
>
> How do you define those? Can you identify any which
> aren't there, but could/should be?
>
> Example: say there is a class called "delimiters"
> which contain [, (, {, <, >, }, ), and ]. Can you
> split that up, in "opening-delimiters" and closing
> ditto?
>
> Second, exactly you mentioned - the font lock issue -
> work on that.
>
> You do know, of course, of
>
>     font-lock-regexp-grouping-construct
>     font-lock-regexp-grouping-backslash
>
> Are there more of those, that you can identify, and
> add?

There could be quite a few.  (As Alexis pointed out, a tool I was
writing about seems to exist – if it’s not satisfactory, I could think
about extending it somehow.  Not very probable, though – I’m too busy
now.  If only someone could be paying me for goofing around and playing
with Emacs hacks...)

Thanks for your input, and best regards!

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University


