From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Daniel Colascione <dancol@dancol.org>
Cc: emacs-devel@gnu.org, Colin Fraizer <emacs-devel@cfraizer.com>,
	t.matsuyama.pub@gmail.com
Subject: Re: Patch for lookaround assertion in regexp
Date: Tue, 14 Feb 2012 13:36:32 -0500
Message-ID: <jwvaa4ls24n.fsf-monnier+emacs@gnu.org>
In-Reply-To: <4F3A9F83.8060307@dancol.org> (Daniel Colascione's message of "Tue, 14 Feb 2012 09:53:07 -0800")

> Implementing a fully general NFA-based regular expression matching
> engine that supports submatches is hard. The only two useful
> implementations of which I'm aware are RE2 and Ville Laurikari's TRE,
> both of which are two-clause BSD licensed. Laurikari wrote his thesis
> [2] on the latter. TRE is the better of the two libraries, IMHO,
> because it's single-pass and can work over arbitrary kinds of input
> stream (like characters in a gap buffer). TRE's approximate matching
> support is occasionally useful as well.

I'm familiar with the work, yes.  TRE seemed like the best option last
time I looked around.

> That said, I'd actually prefer to head in the other direction and
> allow users to express arbitrarily rich grammars using an extended rx
> syntax.

I think that would be orthogonal: we want regexp support because it's
efficient (yes, our current implementation is super slow in some cases,
but it's also efficient in many important cases).

I also would like a new regexp engine to fix the "backward matching"
problem so that looking-back can work the way most people would expect,
and doesn't need a `greedy' hack.  The fact that regexps are symmetric
is a very neat property (operator precedence grammars enjoy the same
property, which is one of the reasons why I chose them as the basis for
SMIE).
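
To make the "backward matching" problem concrete, here is a rough sketch in
Python rather than Emacs Lisp. It only emulates what a forward-only engine
has to do to answer "what matches just before point": try start positions
going backward and accept the first one from which a forward match ends
exactly at point. The function name `looking_back` and its `greedy` flag are
modelled on the Emacs function, but this is an illustration under those
assumptions, not the actual implementation.

```python
import re

def looking_back(pattern, text, pos, greedy=False):
    """Emulate backward matching with a forward-only regexp engine.

    Hypothetical helper for illustration: returns the text matched
    immediately before `pos`, or None if nothing matches there.
    """
    rx = re.compile(pattern)
    start = pos
    # Walk backward until a forward match from `start` ends exactly at `pos`.
    while start >= 0 and not rx.fullmatch(text, start, pos):
        start -= 1
    if start < 0:
        return None
    if greedy:
        # The "greedy" workaround: keep extending the match backward
        # for as long as the extended text still matches.
        while start > 0 and rx.fullmatch(text, start - 1, pos):
            start -= 1
    return text[start:pos]

print(looking_back(r"[0-9]+", "abc123", 6))               # 3
print(looking_back(r"[0-9]+", "abc123", 6, greedy=True))  # 123
```

Without the extra pass, `[0-9]+` "matches" only the last digit, which is not
what most people expect from a `+`.  An engine that can run the regexp
backward from point would give the longest match directly.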

> The idea is basically to support parser combinator grammars,
> which can be efficiently matched using a scannerless GLR parser, which
> is O(N^3) time, or a memoizing and backtracking "packrat" parser, which
> is O(N) time and O(N) space. The end result would look a bit like Perl
> 6's rules.

While these are algorithmically reasonably efficient, it can be
difficult to make them as efficient as a regexp-only matcher for many
typical regexps.  Also it can be difficult to make them work backwards.
IOW I don't think they can replace regexps given the number of regexps
out there we have to support.
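
For reference, the memoizing "packrat" approach mentioned in the quoted text
can be sketched roughly as follows. This is a toy example in Python for a
made-up grammar (Expr <- Term '+' Expr / Term; Term <- '(' Expr ')' /
digits), and the names `make_parser` and `matches` are hypothetical. The
point is only that caching each (rule, position) result means a backtracking
alternative never re-parses the same input twice, which is where the linear
time and linear space come from.

```python
from functools import lru_cache

def make_parser(text):
    """Build memoized rule functions for one input string (toy grammar)."""
    n = len(text)

    @lru_cache(maxsize=None)          # memo table for the Expr rule
    def expr(i):
        # First alternative: Term '+' Expr
        j = term(i)
        if j is not None and j < n and text[j] == "+":
            k = expr(j + 1)
            if k is not None:
                return k
        # Second alternative: Term.  term(i) is a cache hit here,
        # so backtracking does not redo the work.
        return term(i)

    @lru_cache(maxsize=None)          # memo table for the Term rule
    def term(i):
        # First alternative: '(' Expr ')'
        if i < n and text[i] == "(":
            j = expr(i + 1)
            if j is not None and j < n and text[j] == ")":
                return j + 1
            return None
        # Second alternative: one or more digits
        j = i
        while j < n and text[j].isdigit():
            j += 1
        return j if j > i else None

    return expr

def matches(text):
    """True if the whole string is an Expr in the toy grammar."""
    return make_parser(text)(0) == len(text)

print(matches("1+(2+3)"))  # True
print(matches("1+"))       # False
```

Note that each rule function only ever moves forward through the text; there
is no obvious way to run the same grammar from the end of the match toward
its start.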


        Stefan


