all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Thomas Lord <lord@emf.net>
To: Helmut Eller <eller.helmut@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Compiling Elisp to a native code with a GCC plugin
Date: Wed, 15 Sep 2010 09:28:57 -0700	[thread overview]
Message-ID: <1284568137.3622.22.camel@dell-desktop.example.com> (raw)
In-Reply-To: <m2zkvjf1rs.fsf@gmail.com>

On Wed, 2010-09-15 at 17:46 +0200, Helmut Eller wrote:
> * Stefan Monnier [2010-09-15 14:59] writes:

> > IIRC regexps without back-refs can be matched (and searched) in O(N)
> > where N is the length of the input.  

Not quite.   Let R be the length of the pattern
and L be the length of the string we seek to match.

Assume that the pattern is a true regular expression
(no back-references, no sub-expression position
reporting, etc.)

The problem itself is O(R * L):  no algorithm can
guarantee doing better than the product of the two
lengths.

There is an algorithm (compiling the pattern to a DFA
first) which is O(2^R + L). Formally it is suboptimal.
There is a catch though.   For many common patterns,
either R is very small or we don't actually need
anything like 2^R steps to compile the pattern,
and so the *expected* case (for these patterns)
is O(L).

The Thompson "DFA caching algorithm" described briefly
in the Dragon compiler book is quite attractive
because it offers (for true regular expressions) a 
worst case complexity of O(R * L) ... so it is optimal ...
yet also delivers much closer to O(L) for many
common patterns.   The algorithm is a bit 
unattractive because it is tricky to implement 
well, very hard to generalize beyond true regular
expressions, and can easily lose to more naive NFA
algorithms by small but annoying constant factors.



> With back-refs, I think (not sure)
> > the theoretical bound is O(N^2), which requires
> > a non-backtracking algorithm.

> > So yes, we'd need to handle back-refs specially.  Several regexp engines
> > do that already (they have a few different inner engines and choose
> > which one to use based on the particular regexp at hand).
> 
> After googleing a bit I found this page 
> http://swtch.com/~rsc/regexp/regexp1.html
> which again links to this
> http://perl.plover.com/NPC/NPC-3SAT.html
> which says that regexp matching with backreferences is NP-complete.

The best known algorithms are not O(N^2) but,
rather, O(2^N) in the worst case.   It's dismal
how such an innocent seeming feature can cause
such havoc.


> Cox (the first page) seems to say that backtracking-with-memoization is
> linear time at the expense of O(N) space.

If he said that regarding worst case performance
and without mentioning some specific subset of 
true regular expressions that he's talking about,
he must have made a mistake.

For true regular expressions, the simpler case,
you can not beat O(R * L) time as the worst case.

-t





  reply	other threads:[~2010-09-15 16:28 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-14 19:12 Compiling Elisp to a native code with a GCC plugin Wojciech Meyer
2010-09-14 19:32 ` Tom Tromey
2010-09-14 19:45   ` Wojciech Meyer
2010-09-14 20:17     ` Lars Magne Ingebrigtsen
2010-09-14 20:52       ` Wojciech Meyer
2010-09-14 20:55       ` Tom Tromey
2010-09-14 21:05         ` Wojciech Meyer
2010-09-14 20:44     ` Tom Tromey
2010-09-14 21:00       ` Wojciech Meyer
2010-09-14 21:16         ` Tom Tromey
2010-09-14 21:29           ` Wojciech Meyer
2010-09-14 21:59             ` Tom Tromey
2010-09-14 22:37               ` Wojciech Meyer
2010-09-14 22:55                 ` Tom Tromey
2010-09-14 23:33                   ` Wojciech Meyer
2010-09-15  1:38                     ` Tom Tromey
2010-09-14 22:49               ` Wojciech Meyer
2010-09-14 23:13           ` Thomas Lord
2010-09-14 23:42             ` Wojciech Meyer
2010-09-15 10:47   ` Leo
2010-09-15 11:41     ` Andreas Schwab
2010-09-15 12:10       ` Wojciech Meyer
2010-09-15 14:07     ` Stefan Monnier
2010-09-15 14:27       ` Helmut Eller
2010-09-15 14:59         ` Stefan Monnier
2010-09-15 15:09           ` Lars Magne Ingebrigtsen
2010-09-15 15:31             ` Andreas Schwab
2010-09-15 15:35               ` Lars Magne Ingebrigtsen
2010-09-15 16:28                 ` Andreas Schwab
2010-09-16 16:57                   ` Lars Magne Ingebrigtsen
2010-09-15 15:42             ` Stefan Monnier
2010-09-15 15:51               ` Lars Magne Ingebrigtsen
2010-09-15 15:57                 ` Leo
2010-09-15 16:01                   ` Lars Magne Ingebrigtsen
2010-09-15 16:05                   ` David Kastrup
2010-09-15 16:23                     ` Leo
2010-09-15 16:37                       ` David Kastrup
2010-09-16 16:58                         ` Lars Magne Ingebrigtsen
2010-09-16 21:11                           ` Andreas Schwab
2010-09-16 23:17                             ` Lars Magne Ingebrigtsen
2010-09-17  8:13                               ` Eli Zaretskii
2010-09-17 13:17                                 ` Lars Magne Ingebrigtsen
2010-09-17 13:30                                   ` Eli Zaretskii
2010-09-17 13:34                                     ` Lars Magne Ingebrigtsen
2010-09-16 17:35                         ` Lars Magne Ingebrigtsen
2010-09-16  2:57                 ` Stephen J. Turnbull
2010-09-16  6:54                   ` David Kastrup
2010-09-16  8:10                     ` Stephen J. Turnbull
2010-09-16  8:31                       ` David Kastrup
2010-09-16 17:01                   ` Lars Magne Ingebrigtsen
2010-09-17  6:52                     ` Stephen J. Turnbull
2010-09-17 13:09                       ` Lars Magne Ingebrigtsen
2010-09-17 13:31                         ` David Kastrup
2010-09-17 13:39                           ` Lars Magne Ingebrigtsen
2010-09-17 13:55                             ` David Kastrup
2010-09-17 14:18                               ` Lars Magne Ingebrigtsen
2010-09-17 14:57                                 ` David Kastrup
2010-09-17 15:06                                   ` Lars Magne Ingebrigtsen
2010-09-17 15:24                                     ` Lars Magne Ingebrigtsen
2010-09-17 16:11                                       ` Eli Zaretskii
2010-09-17 16:33                                       ` David Kastrup
2010-09-17 16:41                                         ` Andreas Schwab
2010-09-17 17:17                                           ` David Kastrup
2010-09-17 18:24                                             ` David Kastrup
2010-09-17 20:30                                               ` David Kastrup
2010-09-17 20:49                                                 ` Lars Magne Ingebrigtsen
2010-09-18  4:31                                                   ` David Kastrup
2010-09-17 18:53                                             ` Stephen J. Turnbull
2010-09-17 20:57                                               ` Eli Zaretskii
2010-09-18 14:19                                                 ` Stephen J. Turnbull
2010-09-18 15:46                                                   ` Eli Zaretskii
2010-09-18 15:58                                                   ` Stefan Monnier
2010-09-17 17:24                                         ` Lars Magne Ingebrigtsen
2010-09-17 16:11                                     ` David Kastrup
2010-09-17 16:18                                 ` Eli Zaretskii
2010-09-17 16:24                                   ` Lars Magne Ingebrigtsen
2010-09-17 16:39                                     ` Eli Zaretskii
2010-09-17 17:30                                       ` Lars Magne Ingebrigtsen
2010-09-17 18:49                                         ` Eli Zaretskii
2010-09-17 16:39                                     ` Eli Zaretskii
2010-09-17 13:49                           ` Andreas Schwab
2010-09-17 13:55                             ` Lars Magne Ingebrigtsen
2010-09-17 14:31                               ` Wojciech Meyer
2010-09-17 14:40                               ` Andreas Schwab
2010-09-17 14:47                                 ` Lars Magne Ingebrigtsen
2010-09-17 15:10                                   ` Andreas Schwab
2010-09-17 15:16                                     ` Lars Magne Ingebrigtsen
2010-09-17 15:39                                       ` Andreas Schwab
2010-09-17 15:42                                         ` Lars Magne Ingebrigtsen
2010-09-17 16:04                                           ` Andreas Schwab
2010-09-17 16:14                                       ` Eli Zaretskii
2010-09-17 19:22                                         ` James Cloos
2010-09-17 17:40                         ` Stephen J. Turnbull
2010-09-17 19:40                           ` Lars Magne Ingebrigtsen
2010-09-15 15:46           ` Helmut Eller
2010-09-15 16:28             ` Thomas Lord [this message]
2010-09-15 21:04       ` Leo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1284568137.3622.22.camel@dell-desktop.example.com \
    --to=lord@emf.net \
    --cc=eller.helmut@gmail.com \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.