From: Alan Mackenzie <acm@muc.de>
To: help-gnu-emacs@gnu.org
Subject: Re: How to grok a complicated regex?
Date: Wed, 18 Mar 2015 16:40:35 +0000 (UTC)
Message-ID: <mec9q3$g8o$1@colin.muc.de>
In-Reply-To: mailman.1979.1426282552.31049.help-gnu-emacs@gnu.org
Hi, Marcin.
Sorry if I'm a bit late to this discussion.
Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
> Hi all,
> so I have this monstrosity [note: I know, there are much worse ones,
> too!]:
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
> (it's in the org-latex--script-size function in ox-latex.el, if you're
> curious).
> I'm not asking "what does this match?" -- I can read it myself. But it
> comes with a considerable effort. Are you aware of any tools that might
> help to understand such regexen?
> I know about re-builder, but it's well suited for constructing a regex
> matching a given string, not the other way round.
> For instance, show-paren-mode does not really help here, since it seems
> to pair "\\(" with unescaped ")".
> Any ideas?
I wrote myself the following tool. It's not production quality, but you
might find it useful nonetheless. To use it, type
M-: (pp-regexp re-horror).
It displays the regexp at the end of the *scratch* buffer, dropping the
contents of any \(..\) construct by one line. I find it useful. So might
you. Feel free to adapt it, or pass it on to other people.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun pp-regexp (regexp)
  "Pretty print a regexp. This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp) (error "parameter is not a string."))
  (let ((depth 0)
        (re (replace-regexp-in-string
             "[\t\n\r\f]"
             (lambda (s)
               (or (cdr (assoc s '(("\t" . "??")
                                   ("\n" . "??")
                                   ("\r" . "??"))))
                   "??"))
             regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line, varies between min-depth and max-depth.
        ch
        )
    ;(translate-rnt re)
    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match ;; "\\\\\\((\\(\\?:\\)?\\||\\|)\\)"
                 "\\\\\\(\\\\\\|(\\(\\?:\\)?\\||\\|)\\)"
                 re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\\)
          (put-text-property pos (match-end 1) 'acm-depth depth re))
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))
         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))
         (t ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))
    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max)) (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\ ))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
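As a rough usage sketch, here is how the tool might be called on the regexp
from your message. It assumes the defun above has already been evaluated;
`my-re-horror' is only an illustrative variable name, not something defined
anywhere else, and the `fboundp' guard is there so the snippet does nothing
if pp-regexp is missing.

```elisp
;; Usage sketch. `my-re-horror' is a hypothetical name used for illustration.
(defvar my-re-horror
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
  "The regexp quoted earlier in this message.")

(when (fboundp 'pp-regexp)
  ;; pp-regexp writes into *scratch*, so make sure that buffer exists first.
  (get-buffer-create "*scratch*")
  (pp-regexp my-re-horror))
```

For this regexp, whose groups are not nested, the output should be two lines
at the end of *scratch*: the anchors and the group delimiters on the first,
and the contents of the groups, aligned beneath them, on the second.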
> (Note: if there are no such tools, I might be tempted to craft one. Two
> things that come to my mind are proper highlighting of matching parens
> of various kinds and eldoc-like hints for all the regex constructs --
> I never seem to remember what does "\\`" do, for instance. Also,
> displaying the string with single backslashes and not in the way it is
> actually typed in in Elisp, with all the backslash escaping, might be
> helpful. Would there be a demand for such a tool larger than one
> person?)
> Best,
> --
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University
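On the single-backslash idea: the doubled backslashes exist only in the Lisp
read syntax, so the string itself already holds the single-backslash form.
A minimal sketch of a helper that shows both forms side by side follows;
`my-regexp-show-raw' is a made-up name and is not part of pp-regexp.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-regexp-show-raw (regexp)
  "Return a two-line string showing REGEXP as typed and as matched.
The first line uses the Lisp read syntax (quotes, doubled backslashes);
the second line is the string's actual contents."
  ;; %S formats like `prin1' (escaped); %s formats like `princ' (raw).
  (format "source: %S\nraw:    %s" regexp regexp))

;; Example: show both forms in the echo area.
(message "%s" (my-regexp-show-raw "\\`a\\|b\\'"))
```

The same raw form is what you get from inserting the string into a buffer
with `insert', which is why pp-regexp's output already uses single
backslashes.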
--
Alan Mackenzie (Nuremberg, Germany).