From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.bugs Subject: bug#192: regexp does not work as documented Date: Mon, 12 May 2008 09:43:49 -0400 Message-ID: References: <87k5i8ukq8.fsf@stupidchicken.com> <200805061335.11379.bruno@clisp.org> <48204B3D.6000500@gmx.at> <4826A303.3030002@gmx.at> <87abiwoqzd.fsf@stupidchicken.com> <482750F4.2050102@emf.net> <4827B9B8.30406@emf.net> Reply-To: Stefan Monnier , 192@emacsbugs.donarmstrong.com NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1210600333 32646 80.91.229.12 (12 May 2008 13:52:13 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 12 May 2008 13:52:13 +0000 (UTC) Cc: Chong Yidong , 192@emacsbugs.donarmstrong.com, emacs-devel@gnu.org, David Koppelman , Bruno Haible To: Thomas Lord Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Mon May 12 15:52:49 2008 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1JvYSF-0002P8-RH for geb-bug-gnu-emacs@m.gmane.org; Mon, 12 May 2008 15:52:40 +0200 Original-Received: from localhost ([127.0.0.1]:39072 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1JvYRX-0004kB-Ak for geb-bug-gnu-emacs@m.gmane.org; Mon, 12 May 2008 09:51:55 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1JvYRM-0004h8-UN for bug-gnu-emacs@gnu.org; Mon, 12 May 2008 09:51:44 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1JvYRL-0004gB-EN for bug-gnu-emacs@gnu.org; Mon, 12 May 2008 09:51:43 -0400 Original-Received: from [199.232.76.173] (port=56710 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1JvYRL-0004g4-7R for bug-gnu-emacs@gnu.org; Mon, 12 May 2008 09:51:43 -0400 Original-Received: from rzlab.ucr.edu ([138.23.92.77]:36670) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1JvYRK-0001iq-Df for bug-gnu-emacs@gnu.org; Mon, 12 May 2008 09:51:42 -0400 Original-Received: from rzlab.ucr.edu (rzlab.ucr.edu [127.0.0.1]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m4CDpdKH026317; Mon, 12 May 2008 06:51:39 -0700 Original-Received: (from debbugs@localhost) by rzlab.ucr.edu (8.13.8/8.13.8/Submit) id m4CDo3Dj025695; Mon, 12 May 2008 06:50:03 -0700 X-Loop: don@donarmstrong.com Resent-From: Stefan Monnier Resent-To: bug-submit-list@donarmstrong.com Resent-CC: Emacs Bugs Resent-Date: Mon, 12 May 2008 13:50:02 +0000 Resent-Message-ID: Resent-Sender: don@donarmstrong.com X-Emacs-PR-Message: report 192 X-Emacs-PR-Package: emacs X-Emacs-PR-Keywords: Original-Received: via spool by 192-submit@emacsbugs.donarmstrong.com id=B192.121059984024962 (code B ref 192); Mon, 12 May 2008 13:50:02 +0000 Original-Received: (at 192) by emacsbugs.donarmstrong.com; 12 May 2008 13:44:00 +0000 Original-Received: from ironport2-out.teksavvy.com (ironport2-out.pppoe.ca [206.248.154.182]) by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id m4CDhuds024956 for <192@emacsbugs.donarmstrong.com>; Mon, 12 May 2008 06:43:57 -0700 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AocDAH/mJ0jO+JgrdGdsb2JhbACBU5A7ASeYEg X-IronPort-AV: E=Sophos;i="4.27,473,1204520400"; d="scan'208";a="20407483" Original-Received: from smtp.pppoe.ca (HELO smtp.teksavvy.com) ([65.39.196.238]) by ironport2-out.teksavvy.com with ESMTP; 12 May 2008 09:43:50 -0400 Original-Received: from pastel.home ([206.248.152.43]) by smtp.teksavvy.com (Internet Mail Server v1.0) with ESMTP id SSD44550; Mon, 12 May 2008 09:43:50 -0400 Original-Received: by pastel.home (Postfix, from userid 20848) id D59D37F83; Mon, 12 May 2008 09:43:49 -0400 (EDT) In-Reply-To: <4827B9B8.30406@emf.net> (Thomas Lord's message of "Sun, 11 May 2008 20:30:00 -0700") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux) X-detected-kernel: by monty-python.gnu.org: Linux 2.6 (newer, 3) Resent-Date: Mon, 12 May 2008 09:51:43 -0400 X-BeenThere: bug-gnu-emacs@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:17976 Archived-At: > years ago, is to consider *offline* DFA conversion (a la 'lex(1)'). That's what I do in lex.el. > The advantage of offline (batch) conversion is that you can burn a lot > of cycles on DFA minimization and, if your offline converter > terminates, you've got a reliably linear matcher. The disadvantages > for *many* uses of regular expressions in Emacs should be pretty > obvious. For something like font-lock, where the regular expressions > don't change that often, that might be a good approach -- precompile > a minimal DFA and then add support for "regular expression > continuations" when using those tables. I do not intend to replace src/regexp.c with a matcher based on offline DFA conversion. Actually, the need to support backrefs makes it pretty much impossible (tho I'm sure there's a way to adapt an offline DFA so it can be used with backrefs), and most importantly it has too different performance characteristics. More specifically, the compilation step should be made explicit. In any case I think you did answer my question: an offline DFA matcher is fine, the worst case is not that common and can be worked around. This is not that different from the current backtracking matcher. Stefan PS: The original motivation for a DFA-matcher is to extend syntax-tables so they can match match multi-char elements.