From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Alan Mackenzie Newsgroups: gmane.emacs.devel Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept. Date: Mon, 23 Feb 2015 20:21:14 +0000 Message-ID: <20150223202114.GB2861@acm.fritz.box> References: <20150223181205.GA2861@acm.fritz.box> <54EB85AC.1030800@cs.ucla.edu> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1424722932 8460 80.91.229.3 (23 Feb 2015 20:22:12 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 23 Feb 2015 20:22:12 +0000 (UTC) Cc: emacs-devel@gnu.org To: Paul Eggert Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Feb 23 21:22:03 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YPzVz-0006ZG-D5 for ged-emacs-devel@m.gmane.org; Mon, 23 Feb 2015 21:22:03 +0100 Original-Received: from localhost ([::1]:45315 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YPzVy-00079w-Hg for ged-emacs-devel@m.gmane.org; Mon, 23 Feb 2015 15:22:02 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:40196) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YPzVl-00079p-Hu for emacs-devel@gnu.org; Mon, 23 Feb 2015 15:21:50 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YPzVg-0004hK-8H for emacs-devel@gnu.org; Mon, 23 Feb 2015 15:21:49 -0500 Original-Received: from colin.muc.de ([193.149.48.1]:54651 helo=mail.muc.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YPzVf-0004gx-UK for emacs-devel@gnu.org; Mon, 23 Feb 2015 15:21:44 -0500 Original-Received: (qmail 12972 invoked by uid 3782); 23 Feb 2015 20:21:41 -0000 Original-Received: from acm.muc.de (pD951909A.dip0.t-ipconnect.de [217.81.144.154]) by colin.muc.de (tmda-ofmipd) with ESMTP; Mon, 23 Feb 2015 21:21:40 +0100 Original-Received: (qmail 4538 invoked by uid 1000); 23 Feb 2015 20:21:14 -0000 Content-Disposition: inline In-Reply-To: <54EB85AC.1030800@cs.ucla.edu> User-Agent: Mutt/1.5.22 (2013-10-16) X-Delivery-Agent: TMDA/1.1.12 (Macallan) X-Primary-Address: acm@muc.de X-detected-operating-system: by eggs.gnu.org: FreeBSD 8.x X-Received-From: 193.149.48.1 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:183428 Archived-At: Hello, Paul. On Mon, Feb 23, 2015 at 11:55:24AM -0800, Paul Eggert wrote: > Would it be possible to fix the regular expression engine, so that > programs don't have to worry about parsing and reformulating regexps so > that they're "nice"? Anything's possible. Somebody once explained to me that it was all about NFAs and DFAs (which I only understand exceptionally vaguely) and that Emacs has the "wrong" one of these. I talked briefly with Kaz Kylheku on comp.theory a little while ago, and he opined: "I suspect, though, that what you're calling "regexp engine" may actually be some Perl-inspired job which requires a stack for backtracking, and which basically interprets the regular expression, or at least partially." , which, if you take out the disparagement, is what we have, I think. About replacing it with "the other sort" of regexp engine - it might well be that the one we have, complete with backtracking stack and what have you, is generally faster, and that if we were to convert to a regexp engine which could handle awkward regexps gracefully, that could slow Emacs down. It seems a bit like using RISC chips - they can be faster than CISC, but you need an intelligent compiler to drive them. My fix-re.el might be analogous to that intelligent compiler. But, basically, I've got little idea about regexp engines. The one we've got's earliest copyright date is 1993, which seem very late. Perl was first released in 1987. -- Alan Mackenzie (Nuremberg, Germany).