From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Thomas Lord Newsgroups: gmane.emacs.devel Subject: Re: Compiling Elisp to a native code with a GCC plugin Date: Wed, 15 Sep 2010 09:28:57 -0700 Message-ID: <1284568137.3622.22.camel@dell-desktop.example.com> References: <87bp805ecr.fsf@gmail.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Trace: dough.gmane.org 1284568443 26850 80.91.229.12 (15 Sep 2010 16:34:03 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Wed, 15 Sep 2010 16:34:03 +0000 (UTC) Cc: emacs-devel@gnu.org To: Helmut Eller Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Sep 15 18:33:58 2010 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Ovuvl-0002SJ-UE for ged-emacs-devel@m.gmane.org; Wed, 15 Sep 2010 18:33:58 +0200 Original-Received: from localhost ([127.0.0.1]:58290 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Ovuvl-0000AW-0L for ged-emacs-devel@m.gmane.org; Wed, 15 Sep 2010 12:33:57 -0400 Original-Received: from [140.186.70.92] (port=47223 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Ovur2-0004PU-Ns for emacs-devel@gnu.org; Wed, 15 Sep 2010 12:29:09 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1Ovuqx-0000SU-HK for emacs-devel@gnu.org; Wed, 15 Sep 2010 12:29:04 -0400 Original-Received: from smtp141.iad.emailsrvr.com ([207.97.245.141]:53840) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Ovuqx-0000SN-Ai for emacs-devel@gnu.org; Wed, 15 Sep 2010 12:28:59 -0400 Original-Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp34.relay.iad1a.emailsrvr.com (SMTP Server) with ESMTP id 16C1F380327; Wed, 15 Sep 2010 12:28:59 -0400 (EDT) X-Virus-Scanned: OK Original-Received: by smtp34.relay.iad1a.emailsrvr.com (Authenticated sender: lord-AT-emf.net) with ESMTPSA id 800D8380347; Wed, 15 Sep 2010 12:28:58 -0400 (EDT) In-Reply-To: X-Mailer: Evolution 2.28.3 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:130225 Archived-At: On Wed, 2010-09-15 at 17:46 +0200, Helmut Eller wrote: > * Stefan Monnier [2010-09-15 14:59] writes: > > IIRC regexps without back-refs can be matched (and searched) in O(N) > > where N is the length of the input. Not quite. Let R be the length of the pattern and L be the length of the string we seek to match. Assume that the pattern is a true regular expression (no back-references, no sub-expression position reporting, etc.) The problem itself is O(R * L): no algorithm can guarantee doing better than the product of the two lengths. There is an algorithm (compiling the pattern to a DFA first) which is O(2^R + L). Formally it is suboptimal. There is a catch though. For many common patterns, either R is very small or we don't actually need anything like 2^R steps to compile the pattern, and so the *expected* case (for these patterns) is O(L). The Thompson "DFA caching algorithm" described briefly in the Dragon compiler book is quite attractive because it offers (for true regular expressions) a worst case complexity of O(R * L) ... so it is optimal ... yet also delivers much closer to O(L) for many common patterns. The algorithm is a bit unattractive because it is tricky to implement well, very hard to generalize beyond true regular expressions, and can easily lose to more naive NFA algorithms by small but annoying constant factors. > With back-refs, I think (not sure) > > the theoretical bound is O(N^2), which requires > > a non-backtracking algorithm. > > So yes, we'd need to handle back-refs specially. Several regexp engines > > do that already (they have a few different inner engines and choose > > which one to use based on the particular regexp at hand). > > After googleing a bit I found this page > http://swtch.com/~rsc/regexp/regexp1.html > which again links to this > http://perl.plover.com/NPC/NPC-3SAT.html > which says that regexp matching with backreferences is NP-complete. The best known algorithms are not O(N^2) but, rather, O(2^N) in the worst case. It's dismal how such an innocent seeming feature can cause such havoc. > Cox (the first page) seems to say that backtracking-with-memoization is > linear time at the expense of O(N) space. If he said that regarding worst case performance and without mentioning some specific subset of true regular expressions that he's talking about, he must have made a mistake. For true regular expressions, the simpler case, you can not beat O(R * L) time as the worst case. -t