From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Daniel Colascione <dancol@dancol.org>
Cc: emacs-devel@gnu.org, Colin Fraizer <emacs-devel@cfraizer.com>,
t.matsuyama.pub@gmail.com
Subject: Re: Patch for lookaround assertion in regexp
Date: Tue, 14 Feb 2012 13:36:32 -0500
Message-ID: <jwvaa4ls24n.fsf-monnier+emacs@gnu.org>
In-Reply-To: <4F3A9F83.8060307@dancol.org> (Daniel Colascione's message of "Tue, 14 Feb 2012 09:53:07 -0800")
> Implementing a fully general NFA-based regular expression matching
> engine that supports submatches is hard. The only two useful
> implementations of which I'm aware are RE2 and Ville Laurikari's TRE,
> both of which are two-clause BSD licensed. Laurikari wrote his thesis
> [2] on the latter. TRE is the better of the two libraries, IMHO,
> because it's single-pass and can work over arbitrary kinds of input
> stream (like characters in a gap buffer). TRE's approximate matching
> support is occasionally useful as well.
I'm familiar with the work, yes. TRE seemed like the best option last
time I looked around.
> That said, I'd actually prefer to head in the other direction and
> allow users to express arbitrarily rich grammars using an extended rx
> syntax.
I think that would be orthogonal: we want regexp support because it's
efficient (yes, our current implementation is super slow in some cases,
but it's also efficient in many important cases).
I also would like a new regexp engine to fix the "backward matching"
problem so that looking-back can work the way most people would expect,
and doesn't need a `greedy' hack. The fact that regexps are symmetric
is a very neat property (operator precedence grammars enjoy the same
property, which is one of the reasons why I chose them as the basis for
SMIE).
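
To make the backward-matching problem concrete, here is a minimal Python
sketch (not Emacs code). It assumes a backward match is emulated by
trying forward matches that must end at point, starting from positions
ever further back. The function name looking_back and its greedy
parameter are only illustrative stand-ins for the Emacs function, and
the loop is a rough model of the behaviour, not the actual implementation.

```python
import re

def looking_back(pattern, text, point, greedy=False):
    """Return the start of a match of `pattern` ending exactly at `point`,
    or None if there is none. Hypothetical model, not Emacs's code."""
    rx = re.compile(pattern)
    start = None
    # Scan backward from point: the first start position that yields a
    # match ending at point is the one nearest to point, i.e. the
    # shortest match, not the longest.
    for i in range(point, -1, -1):
        if rx.fullmatch(text, i, point):
            start = i
            break
    if start is None or not greedy:
        return start
    # The "greedy" hack: keep stepping back one character at a time for
    # as long as the longer span still matches up to point.
    while start > 0 and rx.fullmatch(text, start - 1, point):
        start -= 1
    return start

text = "foo123"
print(looking_back(r"[0-9]+", text, len(text)))               # 5
print(looking_back(r"[0-9]+", text, len(text), greedy=True))  # 3
```

Without the extra loop, "[0-9]+" matches only the final "3", which is
not what most people expect from a greedy operator. An engine that can
run the regexp backward from point would find "123" directly.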
> The idea is basically to support parser combinator grammars,
> which can be efficiently matched using a scannerless GLR parser, which
> is O(N^3) time, or a memoizing and backtracking "packrat" parser, which
> is O(N) time and O(N) space. The end result would look a bit like Perl
> 6's rules.
While these are algorithmically reasonably efficient, it can be
difficult to make them as efficient as a regexp-only matcher for many
typical regexps. Also it can be difficult to make them work backwards.
IOW I don't think that can replace regexps, given the number of existing
regexps we have to support.
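
For readers unfamiliar with the "packrat" approach mentioned in the
quoted text, here is a minimal Python sketch under stated assumptions.
The toy grammar and the names make_parser and matches are invented for
illustration and come from neither message. Each rule is a function from
an input position to the end position of its match (or None), and
memoizing on the position means each (rule, position) pair is computed
at most once, which is where the linear time and linear space come from.

```python
from functools import lru_cache

DIGITS = "0123456789"

def make_parser(text):
    # Toy PEG grammar (an assumption for this sketch):
    #   Expr   <- Term ('+' Term)*
    #   Term   <- Number / '(' Expr ')'
    #   Number <- [0-9]+

    @lru_cache(maxsize=None)
    def expr(pos):
        end = term(pos)
        if end is None:
            return None
        # Repetition: consume as many "+ Term" pairs as possible.
        while end < len(text) and text[end] == "+":
            nxt = term(end + 1)
            if nxt is None:
                break
            end = nxt
        return end

    @lru_cache(maxsize=None)
    def term(pos):
        # Ordered choice: try Number first, then a parenthesized Expr.
        end = number(pos)
        if end is not None:
            return end
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner < len(text) and text[inner] == ")":
                return inner + 1
        return None

    @lru_cache(maxsize=None)
    def number(pos):
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return end if end > pos else None

    return expr

def matches(text):
    """True if the whole of `text` is an Expr."""
    return make_parser(text)(0) == len(text)

print(matches("1+(2+3)"))  # True
print(matches("1+"))       # False
```

The memo tables hold one entry per rule per position, so the space cost
grows with the input, and the parser only ever runs forward from a
given position.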
Stefan