unofficial mirror of help-gnu-emacs@gnu.org
From: Alan Mackenzie <acm@muc.de>
To: help-gnu-emacs@gnu.org
Subject: Re: How to grok a complicated regex?
Date: Wed, 18 Mar 2015 16:40:35 +0000 (UTC)
Message-ID: <mec9q3$g8o$1@colin.muc.de>
In-Reply-To: <mailman.1979.1426282552.31049.help-gnu-emacs@gnu.org>

Hi, Marcin.

Sorry if I'm a bit late to this discussion.

Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:
> Hi all,

> so I have this monstrosity [note: I know, there are much worse ones,
> too!]:

> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"

> (it's in the org-latex--script-size function in ox-latex.el, if you're
> curious).

> I'm not asking "what does this match?" - I can read it myself.  But it
> comes with a considerable effort.  Are you aware of any tools that might
> help to understand such regexen?

> I know about re-builder, but it's well suited for constructing a regex
> matching a given string, not the other way round.

> For instance, show-paren-mode does not really help here, since it seems
> to pair "\\(" with unescaped ")".

> Any ideas?

I wrote myself the following tool.  It's not production quality, but you
might find it useful nonetheless.  To use it, type

     M-: (pp-regexp re-horror)

where re-horror is a variable holding the regexp.

It displays the regexp at the end of the *scratch* buffer, dropping the
contents of any \(..\) construct by one line.  I find it useful.  So might
you.  Feel free to adapt it, or pass it on to other people.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun pp-regexp (regexp)
  "Pretty print REGEXP at the end of the *scratch* buffer.
The contents of each \\\\( ... \\\\) group are lowered by one line."
  (or (stringp regexp) (error "pp-regexp: REGEXP must be a string"))
  (let ((depth 0)
        (re (replace-regexp-in-string
             "[\t\n\r\f]"
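             (lambda (s)
               ;; Show each control character as one visible glyph.  The
               ;; glyphs originally used here were lost to an encoding
               ;; problem; the Unicode control pictures are stand-ins.
               (or (cdr (assoc s '(("\t" . "␉")
                                   ("\n" . "␊")
                                   ("\r" . "␍"))))
                   "␌"))               ; form feed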
             regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no ; line number of pr-line, varies between min-depth and max-depth.
        ch
        )
    ;(translate-rnt re)
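    ;; Apply acm-depth properties to the whole string.  Each iteration
    ;; finds the next backslash construct that matters: an escaped
    ;; backslash, "\(" (optionally followed by "?:"), "\|" or "\)".
    (while (< start (length re))
      (setq pos (string-match
                 "\\\\\\(\\\\\\|(\\(\\?:\\)?\\||\\|)\\)"
                 re start))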
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\\)
          (put-text-property pos (match-end 1) 'acm-depth depth re))
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))

         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))

         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))

    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer (get-buffer-create "*scratch*")
        (goto-char (point-max)) (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\s))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
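
For anyone who would rather get the layers back as Lisp data than have
them appended to *scratch*, the same idea can be sketched without text
properties.  This is only an illustration, not a drop-in replacement:
the name my-regexp-depth-lines is made up, and unlike pp-regexp it
clamps the depth at 0 instead of printing extra lines for "negative
depth" input.

```elisp
;; Hypothetical helper (the name is invented for this sketch).
(defun my-regexp-depth-lines (regexp)
  "Return REGEXP as a list of strings, one per \\\\( ... \\\\) nesting depth.
Each string is as long as REGEXP; characters that belong to another
depth are replaced by spaces."
  (let* ((len (length regexp))
         (depths (make-vector len 0))   ; depth assigned to each character
         (depth 0)
         (max-depth 0)
         (i 0))
    ;; Pass 1: walk the string token by token and record depths.
    (while (< i len)
      (let ((end (1+ i))                ; one past the current token
            (d depth))                  ; depth the token is shown at
        (when (and (eq (aref regexp i) ?\\) (< (1+ i) len))
          (setq end (+ i 2))            ; backslash plus following char
          (let ((next (aref regexp (1+ i))))
            (cond
             ((eq next ?\()
              ;; Keep a shy-group marker "?:" on the line of its opener.
              (when (eq (string-match "\\?:" regexp end) end)
                (setq end (+ end 2)))
              (setq depth (1+ depth)
                    max-depth (max max-depth depth)))
             ((eq next ?\))
              ;; The closer goes back on the outer line.
              (setq depth (max 0 (1- depth))
                    d depth))
             ((eq next ?|)
              ;; The alternation bar sits one level above its branches.
              (setq d (max 0 (1- depth)))))))
        (while (< i end)
          (aset depths i d)
          (setq i (1+ i)))))
    ;; Pass 2: build one output string per depth.
    (let (lines)
      (dotimes (line (1+ max-depth))
        (let (chars)
          (dotimes (j len)
            (push (if (= (aref depths j) line) (aref regexp j) ?\s)
                  chars))
          (push (concat (nreverse chars)) lines)))
      (nreverse lines))))
```

Evaluating (my-regexp-depth-lines "a\\(b\\|c\\)d") returns the two
strings "a\\( \\| \\)d" and "   b  c   ", i.e. the group delimiters on
the first line and the group contents on the second.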

> (Note: if there are no such tools, I might be tempted to craft one.  Two
> things that come to my mind are proper highlighting of matching parens
> of various kinds and eldoc-like hints for all the regex constructs -
> I never seem to remember what does "\\`" do, for instance.  Also,
> displaying the string with single backslashes and not in the way it is
> actually typed in in Elisp, with all the backslash escaping, might be
> helpful.  Would there be a demand for such a tool larger than one
> person?)
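
On the single-backslash point, no special tool is strictly needed:
printing the string with %s rather than %S already shows what the regexp
engine sees.  A minimal sketch, in which the function name is invented
for illustration:

```elisp
;; Hypothetical helper (the name is made up for this sketch).
(defun my-regexp-as-typed-and-as-seen (regexp)
  "Return a cons (READ-SYNTAX . CONTENTS) for the string REGEXP.
READ-SYNTAX is how REGEXP is written in Lisp source, with doubled
backslashes and surrounding quotes; CONTENTS is the raw text that
the regexp engine actually sees."
  (cons (format "%S" regexp)    ; %S prints like `prin1', with escapes
        (format "%s" regexp)))  ; %s prints like `princ', raw characters

;; Example call; the cdr of the result contains single backslashes only.
(my-regexp-as-typed-and-as-seen "\\`\\(a\\|b\\)\\'")
```

Interactively, M-: (message "%s" re-horror) shows the single-backslash
form in the echo area.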

> Best,

> -- 
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University

-- 
Alan Mackenzie (Nuremberg, Germany).


