all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: emacs-devel@gnu.org
Subject: Re: Regular expression libraries
Date: Fri, 16 Dec 2016 00:15:18 -0500	[thread overview]
Message-ID: <jwvfulolb18.fsf-monnier+gmane.emacs.devel@gnu.org> (raw)
In-Reply-To: 01d7e608-04d2-84a4-6143-e954bc9d569f@mit.edu

> * Emacs has special regexp features based on syntax classes, or the position of the point.  These features don't seem to be supported in gnulib nor glibc
> * Emacs uses a gap buffer, whereas gnulib and glibc expects a char* for the subject string

The gnulib regexp code used to be a derivative of the code we have in
Emacs (or rather they both derived from the same codebase).
Since then it's been replaced with a completely different codebase.

But IIRC the two did merge back (or close enough) before gnulib' code
was replaced with something else.

> Stefan also expressed a desire to move to a better regular expression
> engine — ideally one that uses a non-backtracking implementation when
> possible, to avoid the exponential blowups that Emacs currently
> suffers from on certain regular expressions.

From my point of view, the main problem with the current code is the
blowup in pathological cases, and it's the only one that justifies moving
to another implementation.

Some users may want to change the engine so as to get new features, but
FWIW, I think it'd be much easier to just add the features to the
current engine.  The main reason such features are currently lacking is
that I've been resisting their addition because I don't want such
features to later make it harder to switch to a non-backtracking engine.

> * There are only two usable, non-backtracking regexp libraries:

What about the one in glibc?

>   - TRE (http://laurikari.net/tre/) uses backtracking if needed, but
>     otherwise uses a finite automaton.  It's stable, can be relatively
>     easily linked into Emacs, and performance is similar to what we
>     have (I benchmarked it).  It does support matching on a gap buffer
>     (it has an interface to use the moral equivalent of a random
>     iterator, reguexec).  It doesn't support Emacs' native encoding
>     out of the box, though, and the library is unmaintained; there are
>     multiple forks which fix various security vulnerabilities.

So, this would be acceptable, but would mean we'd still use "our own
code" and maintain it ourselves.

> If I understand our current implementation, our regexp.c functions
> take the moral equivalent of two strings, and match over that,
> pretending that it's just one large string.  In practice, these two
> strings are the two halves of the gap buffer.  Correct?

That's right.


        Stefan




  parent reply	other threads:[~2016-12-16  5:15 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-15 19:00 Regular expression libraries Clément Pit--Claudel
2016-12-15 20:10 ` Eli Zaretskii
2016-12-15 20:30 ` Paul Eggert
2016-12-15 22:00   ` Andreas Schwab
2016-12-16  7:20     ` Paul Eggert
2016-12-16 14:31       ` Clément Pit--Claudel
2016-12-16 14:54       ` Clément Pit--Claudel
2016-12-16 15:42         ` Lars Ingebrigtsen
2016-12-16 20:06           ` Clément Pit--Claudel
2016-12-16 21:25             ` Eli Zaretskii
2016-12-16 17:43         ` Paul Eggert
2016-12-15 22:16   ` Clément Pit--Claudel
2016-12-16  5:15 ` Stefan Monnier [this message]
2016-12-16 14:41   ` Clément Pit--Claudel
2016-12-16 17:45     ` Paul Eggert
2016-12-16 20:07       ` Clément Pit--Claudel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvfulolb18.fsf-monnier+gmane.emacs.devel@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.