From: Thomas Lord <lord@emf.net>
To: Helmut Eller <eller.helmut@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Compiling Elisp to a native code with a GCC plugin
Date: Wed, 15 Sep 2010 09:28:57 -0700 [thread overview]
Message-ID: <1284568137.3622.22.camel@dell-desktop.example.com> (raw)
In-Reply-To: <m2zkvjf1rs.fsf@gmail.com>
On Wed, 2010-09-15 at 17:46 +0200, Helmut Eller wrote:
> * Stefan Monnier [2010-09-15 14:59] writes:
> > IIRC regexps without back-refs can be matched (and searched) in O(N)
> > where N is the length of the input.
Not quite. Let R be the length of the pattern
and L be the length of the string we seek to match.
Assume that the pattern is a true regular expression
(no back-references, no sub-expression position
reporting, etc.)
The problem itself is O(R * L): no algorithm can
guarantee doing better than the product of the two
lengths.
There is an algorithm (compiling the pattern to a DFA
first) which is O(2^R + L). Formally it is suboptimal.
There is a catch though. For many common patterns,
either R is very small or we don't actually need
anything like 2^R steps to compile the pattern,
and so the *expected* case (for these patterns)
is O(L).
The Thompson "DFA caching algorithm" described briefly
in the Dragon compiler book is quite attractive
because it offers (for true regular expressions) a
worst case complexity of O(R * L) ... so it is optimal ...
yet also delivers much closer to O(L) for many
common patterns. The algorithm is a bit
unattractive because it is tricky to implement
well, very hard to generalize beyond true regular
expressions, and can easily lose to more naive NFA
algorithms by small but annoying constant factors.
> With back-refs, I think (not sure)
> > the theoretical bound is O(N^2), which requires
> > a non-backtracking algorithm.
> > So yes, we'd need to handle back-refs specially. Several regexp engines
> > do that already (they have a few different inner engines and choose
> > which one to use based on the particular regexp at hand).
>
> After googleing a bit I found this page
> http://swtch.com/~rsc/regexp/regexp1.html
> which again links to this
> http://perl.plover.com/NPC/NPC-3SAT.html
> which says that regexp matching with backreferences is NP-complete.
The best known algorithms are not O(N^2) but,
rather, O(2^N) in the worst case. It's dismal
how such an innocent seeming feature can cause
such havoc.
> Cox (the first page) seems to say that backtracking-with-memoization is
> linear time at the expense of O(N) space.
If he said that regarding worst case performance
and without mentioning some specific subset of
true regular expressions that he's talking about,
he must have made a mistake.
For true regular expressions, the simpler case,
you can not beat O(R * L) time as the worst case.
-t
next prev parent reply other threads:[~2010-09-15 16:28 UTC|newest]
Thread overview: 97+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-14 19:12 Compiling Elisp to a native code with a GCC plugin Wojciech Meyer
2010-09-14 19:32 ` Tom Tromey
2010-09-14 19:45 ` Wojciech Meyer
2010-09-14 20:17 ` Lars Magne Ingebrigtsen
2010-09-14 20:52 ` Wojciech Meyer
2010-09-14 20:55 ` Tom Tromey
2010-09-14 21:05 ` Wojciech Meyer
2010-09-14 20:44 ` Tom Tromey
2010-09-14 21:00 ` Wojciech Meyer
2010-09-14 21:16 ` Tom Tromey
2010-09-14 21:29 ` Wojciech Meyer
2010-09-14 21:59 ` Tom Tromey
2010-09-14 22:37 ` Wojciech Meyer
2010-09-14 22:55 ` Tom Tromey
2010-09-14 23:33 ` Wojciech Meyer
2010-09-15 1:38 ` Tom Tromey
2010-09-14 22:49 ` Wojciech Meyer
2010-09-14 23:13 ` Thomas Lord
2010-09-14 23:42 ` Wojciech Meyer
2010-09-15 10:47 ` Leo
2010-09-15 11:41 ` Andreas Schwab
2010-09-15 12:10 ` Wojciech Meyer
2010-09-15 14:07 ` Stefan Monnier
2010-09-15 14:27 ` Helmut Eller
2010-09-15 14:59 ` Stefan Monnier
2010-09-15 15:09 ` Lars Magne Ingebrigtsen
2010-09-15 15:31 ` Andreas Schwab
2010-09-15 15:35 ` Lars Magne Ingebrigtsen
2010-09-15 16:28 ` Andreas Schwab
2010-09-16 16:57 ` Lars Magne Ingebrigtsen
2010-09-15 15:42 ` Stefan Monnier
2010-09-15 15:51 ` Lars Magne Ingebrigtsen
2010-09-15 15:57 ` Leo
2010-09-15 16:01 ` Lars Magne Ingebrigtsen
2010-09-15 16:05 ` David Kastrup
2010-09-15 16:23 ` Leo
2010-09-15 16:37 ` David Kastrup
2010-09-16 16:58 ` Lars Magne Ingebrigtsen
2010-09-16 21:11 ` Andreas Schwab
2010-09-16 23:17 ` Lars Magne Ingebrigtsen
2010-09-17 8:13 ` Eli Zaretskii
2010-09-17 13:17 ` Lars Magne Ingebrigtsen
2010-09-17 13:30 ` Eli Zaretskii
2010-09-17 13:34 ` Lars Magne Ingebrigtsen
2010-09-16 17:35 ` Lars Magne Ingebrigtsen
2010-09-16 2:57 ` Stephen J. Turnbull
2010-09-16 6:54 ` David Kastrup
2010-09-16 8:10 ` Stephen J. Turnbull
2010-09-16 8:31 ` David Kastrup
2010-09-16 17:01 ` Lars Magne Ingebrigtsen
2010-09-17 6:52 ` Stephen J. Turnbull
2010-09-17 13:09 ` Lars Magne Ingebrigtsen
2010-09-17 13:31 ` David Kastrup
2010-09-17 13:39 ` Lars Magne Ingebrigtsen
2010-09-17 13:55 ` David Kastrup
2010-09-17 14:18 ` Lars Magne Ingebrigtsen
2010-09-17 14:57 ` David Kastrup
2010-09-17 15:06 ` Lars Magne Ingebrigtsen
2010-09-17 15:24 ` Lars Magne Ingebrigtsen
2010-09-17 16:11 ` Eli Zaretskii
2010-09-17 16:33 ` David Kastrup
2010-09-17 16:41 ` Andreas Schwab
2010-09-17 17:17 ` David Kastrup
2010-09-17 18:24 ` David Kastrup
2010-09-17 20:30 ` David Kastrup
2010-09-17 20:49 ` Lars Magne Ingebrigtsen
2010-09-18 4:31 ` David Kastrup
2010-09-17 18:53 ` Stephen J. Turnbull
2010-09-17 20:57 ` Eli Zaretskii
2010-09-18 14:19 ` Stephen J. Turnbull
2010-09-18 15:46 ` Eli Zaretskii
2010-09-18 15:58 ` Stefan Monnier
2010-09-17 17:24 ` Lars Magne Ingebrigtsen
2010-09-17 16:11 ` David Kastrup
2010-09-17 16:18 ` Eli Zaretskii
2010-09-17 16:24 ` Lars Magne Ingebrigtsen
2010-09-17 16:39 ` Eli Zaretskii
2010-09-17 17:30 ` Lars Magne Ingebrigtsen
2010-09-17 18:49 ` Eli Zaretskii
2010-09-17 16:39 ` Eli Zaretskii
2010-09-17 13:49 ` Andreas Schwab
2010-09-17 13:55 ` Lars Magne Ingebrigtsen
2010-09-17 14:31 ` Wojciech Meyer
2010-09-17 14:40 ` Andreas Schwab
2010-09-17 14:47 ` Lars Magne Ingebrigtsen
2010-09-17 15:10 ` Andreas Schwab
2010-09-17 15:16 ` Lars Magne Ingebrigtsen
2010-09-17 15:39 ` Andreas Schwab
2010-09-17 15:42 ` Lars Magne Ingebrigtsen
2010-09-17 16:04 ` Andreas Schwab
2010-09-17 16:14 ` Eli Zaretskii
2010-09-17 19:22 ` James Cloos
2010-09-17 17:40 ` Stephen J. Turnbull
2010-09-17 19:40 ` Lars Magne Ingebrigtsen
2010-09-15 15:46 ` Helmut Eller
2010-09-15 16:28 ` Thomas Lord [this message]
2010-09-15 21:04 ` Leo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1284568137.3622.22.camel@dell-desktop.example.com \
--to=lord@emf.net \
--cc=eller.helmut@gmail.com \
--cc=emacs-devel@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.