From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: regexp does not work as documented
Date: Mon, 12 May 2008 09:43:49 -0400
To: Thomas Lord
Cc: Chong Yidong, 192@emacsbugs.donarmstrong.com, emacs-devel@gnu.org, martin rudalics, David Koppelman, Bruno Haible
In-Reply-To: <4827B9B8.30406@emf.net> (Thomas Lord's message of "Sun, 11 May 2008 20:30:00 -0700")

> years ago, is to consider *offline* DFA conversion (a la 'lex(1)').

That's what I do in lex.el.

> The advantage of offline (batch) conversion is that you can burn a lot
> of cycles on DFA minimization and, if your offline converter
> terminates, you've got a reliably linear matcher. The disadvantages
> for *many* uses of regular expressions in Emacs should be pretty
> obvious. For something like font-lock, where the regular expressions
> don't change that often, that might be a good approach -- precompile
> a minimal DFA and then add support for "regular expression
> continuations" when using those tables.

I do not intend to replace src/regex.c with a matcher based on offline
DFA conversion. Actually, the need to support backrefs makes it pretty
much impossible (tho I'm sure there's a way to adapt an offline DFA so
it can be used with backrefs), and most importantly its performance
characteristics are too different. More specifically, the compilation
step should be made explicit.

In any case I think you did answer my question: an offline DFA matcher
is fine; the worst case is not that common and can be worked around.
This is not that different from the current backtracking matcher.


        Stefan


PS: The original motivation for a DFA-matcher is to extend syntax-tables
so they can match multi-char elements.
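
For readers who have not seen the technique, here is a minimal sketch of what "offline DFA conversion" with an explicit compilation step can look like. It is written in Python purely for illustration and is not the code in lex.el or src/regex.c; the names `compile_dfa`, `dfa_match`, and the hand-written NFA for `(a|b)*abb` are all hypothetical. The subset construction is the potentially expensive batch step; matching afterwards is one table lookup per input character. No minimization and no backrefs are attempted.

```python
from collections import deque

EPS = None  # label used for epsilon (empty) transitions


def eps_closure(nfa, states):
    """Return every NFA state reachable from `states` via epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def compile_dfa(nfa, start, accepting, alphabet):
    """Subset construction: the explicit (and possibly costly) compile step."""
    start_set = eps_closure(nfa, {start})
    ids = {start_set: 0}      # set of NFA states -> DFA state number
    table = []                # table[dfa_state][char] -> dfa_state
    accept = set()
    queue = deque([start_set])
    while queue:
        cur = queue.popleft()  # dequeued in the same order ids were assigned
        row = {}
        for ch in alphabet:
            moved = set()
            for s in cur:
                moved.update(nfa.get((s, ch), ()))
            if not moved:
                continue       # no transition: the matcher will reject
            nxt = eps_closure(nfa, moved)
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            row[ch] = ids[nxt]
        table.append(row)
        if cur & accepting:
            accept.add(ids[cur])
    return table, accept


def dfa_match(table, accept, text):
    """Whole-string match: one table lookup per character, no backtracking."""
    state = 0
    for ch in text:
        state = table[state].get(ch)
        if state is None:
            return False
    return state in accept


# Hypothetical hand-built NFA for (a|b)*abb, keyed by (state, char).
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
TABLE, ACCEPT = compile_dfa(NFA, start=0, accepting={3}, alphabet="ab")

if __name__ == "__main__":
    for sample in ("abb", "babb", "ab", "abba"):
        print(sample, dfa_match(TABLE, ACCEPT, sample))
```

The cost trade-off is visible in the shape of the code: `compile_dfa` may create a DFA state for each distinct subset of NFA states (exponentially many in the worst case), whereas `dfa_match` does a fixed amount of work per character regardless of the pattern.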