all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Miguel V. S. Frasson" <mvsfrasson@gmail.com>
To: emacs-devel@gnu.org
Subject: Improving regexp-opt
Date: Thu, 7 Feb 2019 14:41:04 -0200	[thread overview]
Message-ID: <CAARdmY1m2+ROr+orDMoUK7zXkFbf94o-89z6bATjzPuoVt=nBA@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 2041 bytes --]

Hi

I had an idea to improve regexp-opt. (I use Emacs 25.3.2).

In a regexp when you have a group with alternatives, sometimes all
alternatives *finish* with one or more common atom regexps. You could take
the common part out of the group and try to improve the remaining smaller
group, splitting all strings that match and recurse regexp-opt.

Example 1: we read from regexp-opt.el:
;; One possible improvement would be to compile '("aa" "ab" "ba" "bb")
;; into "[ab][ab]" rather than "a[ab]\\|b[ab]".  I'm not sure it's worth
;; it but if someone knows how to do it without going through too many
;; contortions, I'm all ears.

Evaluating,
(regexp-opt '("aa" "ab"  "ba" "bb"))
 -> "\\(?:a[ab]\\|b[ab]\\)"

All alternatives finish with "[ab]", so it is equivalent to
  "\\(?:a\\|b\\)[ab]"

The remaining group is "\\(?:a\\|b\\)". We can further improve making a
list of all strings that match it and recourse regexp-opt:
"\\(?:a\\|b\\)"
all-matchs = ("a" "b")
(regexp-opt '("a" "b")) -> "[ab]"

Finally join the two regexps: "[ab][ab]"
... and the wish of the developer is fulfilled :)

Example 2:
(regexp-opt '("car" "cdr" "caar" "cadr" "cdar" "cddr"))
-> "\\(?:c\\(?:\\(?:a[ad]\\|d[ad]\\|[ad]\\)r\\)\\)"

First of all, there is an (apparently) unnecessary group around the result.

Look the inner group of the result:
  "\\(?:a[ad]\\|d[ad]\\|[ad]\\)"
Notice that all alternatives finish with "[ad]", so this group is
equivalent to
 "\\(?:a\\|d\\|\\)[ad]"

The smaller group matches the strings ("a" "d" "") and
  (regexp-opt '("a" "d" "")) ->"[ad]?"
and the inner group is equivalent to
 "[ad]?[ad]"

So,
  (regexp-opt '("car" "cdr" "caar" "cadr" "cdar" "cddr"))
could return (eliminating outer group)
  "c[ad]?[ad]r"

Of course the splitting and recurse not always improve smaller group. For
example,
(regexp-opt '("master" "monster" "mister"))
-> "\\(?:m\\(?:\\(?:on\\|[ai]\\)ster\\)\\)"

It would be better (imo) eliminate unnecessary groups (those without "\\|")
that the result was
  "m\\(?:on\\|[ai]\\)ster"

Cheers

Miguel Frasson

[-- Attachment #2: Type: text/html, Size: 3971 bytes --]

             reply	other threads:[~2019-02-07 16:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-07 16:41 Miguel V. S. Frasson [this message]
2019-02-08  3:48 ` Improving regexp-opt Stefan Monnier
2019-04-12 15:06   ` Miguel V. S. Frasson
2019-04-12 15:40     ` Miguel V. S. Frasson
2019-04-12 16:53     ` Stefan Monnier
2019-04-18  0:59       ` Miguel V. S. Frasson
2019-04-18  3:49         ` Stefan Monnier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAARdmY1m2+ROr+orDMoUK7zXkFbf94o-89z6bATjzPuoVt=nBA@mail.gmail.com' \
    --to=mvsfrasson@gmail.com \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.