From: Tassilo Horn
Newsgroups: gmane.emacs.devel
Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept.
Date: Thu, 26 Feb 2015 12:05:37 +0100
Message-ID: <87fv9tc4qm.fsf@gnu.org>
In-Reply-To: <20150226101137.GA19320@acm.fritz.box> (Alan Mackenzie's message of "Thu, 26 Feb 2015 10:11:37 +0000")
To: Alan Mackenzie
Cc: Paul Eggert, emacs-devel@gnu.org

Alan Mackenzie writes:

Hi Alan,

>> Sure, but you could remember how the \(...\) constructs were
>> renumbered, and fix the match data after the underlying regexp call
>> returned.  It shouldn't be a big deal.
>
> Unfortunately, it's not that simple.  Consider the RE
>
>     \(R\)+E*\(R\)+
>      1  1    2  2
>
> .  This gets transformed to
>
>     \(R\)+\(?:E+\(R\)+\|\(R\)\)
>      1  1        2  2    2  2
>
> .  What was subexpression 2 in the original has become two
> subexpressions straddling an \| sign in the transformation.  I don't
> think there's a way of transforming R+E*R+ that preserves the
> numbering of the subexpressions.

Couldn't you use explicitly numbered groups, i.e., the regex would
translate to

    \(?1:R\)+\(?:E+\(?2:R\)+\|\(?2:R\)\)

?  As long as the groups with the same number are exclusive there
shouldn't be a problem.

Bye,
Tassilo
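
For anyone who wants to try the idea outside Emacs, here is a minimal Python sketch of the match-data fix-up discussed above. Python's re module has no \(?N:...\) construct, so the sketch does not use explicitly numbered groups; instead it records which groups of the transformed regexp stand for each original subexpression and reports whichever one took part in the match. This only works because the candidates for one original group sit in different branches of the alternation, so at most one of them can match. GROUP_MAP and original_span are hypothetical names chosen for illustration; they do not come from Emacs or from the proof of concept in this thread.

```python
import re

# The original regexp and its transformed form, written in Python syntax.
# In TRANSFORMED the groups are implicitly numbered 1, 2 and 3.
ORIGINAL = re.compile(r"(R)+E*(R)+")
TRANSFORMED = re.compile(r"(R)+(?:E+(R)+|(R))")

# Hypothetical renumbering table: original group number -> the groups in
# TRANSFORMED that stand for it.  Groups 2 and 3 are in different branches
# of the alternation, so they are mutually exclusive.
GROUP_MAP = {1: (1,), 2: (2, 3)}


def original_span(match, group):
    """Return the (start, end) span of an ORIGINAL group from a TRANSFORMED match.

    Returns None if none of the candidate groups took part in the match.
    """
    if group == 0:
        return match.span(0)
    for candidate in GROUP_MAP[group]:
        if match.start(candidate) != -1:  # -1 means the group did not participate
            return match.span(candidate)
    return None
```

With this table, matching "RRER" uses the E+ branch and matching "RRR" uses the bare R branch, yet original_span(m, 2) yields the span of the final R in both cases, which is what group 2 of the original regexp reports.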