From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept.
Date: Thu, 26 Feb 2015 10:11:37 +0000
Message-ID: <20150226101137.GA19320@acm.fritz.box>
References: <20150223181205.GA2861@acm.fritz.box>
	<54EB85AC.1030800@cs.ucla.edu>
	<20150223202114.GB2861@acm.fritz.box>
	<54EBA757.5030901@cs.ucla.edu>
	<20150223224245.GC2861@acm.fritz.box>
	<54EBB9C4.1020505@cs.ucla.edu>
	<20150225100834.GA3502@acm.fritz.box>
	<54EEDD82.4010502@cs.ucla.edu>
In-Reply-To: <54EEDD82.4010502@cs.ucla.edu>
To: Paul Eggert
Cc: emacs-devel@gnu.org

Hello, Paul.

On Thu, Feb 26, 2015 at 12:46:58AM -0800, Paul Eggert wrote:
> Alan Mackenzie wrote:
> > More to the point, it is sometimes
> > not possible to preserve the numbering of \(...\) constructs while
> > fixing a regexp, which would change the match-data.

> Sure, but you could remember how the \(...\) constructs were
> renumbered, and fix the match data after the underlying regexp call
> returned.  It shouldn't be a big deal.

Unfortunately, it's not that simple.  Consider the RE

    \(R\)+E*\(R\)+
     1  1    2  2

This gets transformed to

    \(R\)+\(?:E+\(R\)+\|\(R\)\)
     1  1        2  2    2  2

What was subexpression 2 in the original has become two subexpressions
straddling an \| sign in the transformation.  I don't think there's a
way of transforming R+E*R+ that preserves the numbering of the
subexpressions.

-- 
Alan Mackenzie (Nuremberg, Germany).
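
The renumbering problem in the example above can be seen by running both forms through a regexp engine. What follows is only a minimal sketch, and it makes an assumption worth stating: it uses Python's `re` module as a stand-in for the Emacs matcher, writing `(...)` for `\(...\)`, `(?:...)` for `\(?:...\)` and `|` for `\|`. Both engines number capturing groups by the order of their opening parentheses, so the two groups labelled "2" in the diagram are counted by the engine as groups 2 and 3. The helper name `groups` is hypothetical.

```python
import re

# Python-syntax equivalents of the two regexps in the message (an assumption:
# Python's re module stands in for the Emacs regexp engine here).
ORIGINAL = r"(R)+E*(R)+"
TRANSFORMED = r"(R)+(?:E+(R)+|(R))"


def groups(pattern, text):
    """Return the tuple of captured groups for a whole-string match, or None."""
    m = re.fullmatch(pattern, text)
    return None if m is None else m.groups()


for text in ("RER", "RR"):
    print(text, groups(ORIGINAL, text), groups(TRANSFORMED, text))

# For "RER" the E+ branch matches, so the old subexpression 2 lands in
# group 2 of the transformed regexp and group 3 stays unset.
# For "RR" the other branch matches, so it lands in group 3 instead and
# group 2 stays unset.
```

Which of the two new groups carries the old subexpression 2 depends on the text being matched, so it is not fixed by the regexp alone.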