From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept.
Date: Thu, 26 Feb 2015 14:12:32 -0500
To: Alan Mackenzie
Cc: Paul Eggert, emacs-devel@gnu.org

>> > R*\(\)R*
>> > , but anybody who writes such regexps deserves what she gets.
>> What is it that I deserve to get?
> You deserve, perhaps, to lose (match-beginning 1) and (match-end 1),
> which were ill-defined anyway.

Why do you think so?  They seem perfectly well-defined to me.  They're
just always equal to one another, of course, but to the extent that the
regexp syntax only forces me to put "named positions" in pairs, if
I need a single position, it's fairly natural to just use \(\).
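
To make the "single position" point concrete, here is a minimal sketch.
The function name `my/empty-group-position' is made up for illustration
and does not come from any Emacs source; the regexp deliberately uses two
different repeated characters so that the match is unambiguous.

```elisp
;; Hypothetical helper: report where the empty group \(\) matched.
(defun my/empty-group-position (str)
  "Return (BEG END) of the empty group in \"a*\\(\\)b*\" matched against STR."
  (when (string-match "a*\\(\\)b*" str)
    ;; An empty group spans no text, so both positions are the same.
    (list (match-beginning 1) (match-end 1))))

(my/empty-group-position "aaabbb")   ; => (3 3)
```

Both values are always equal, so the group acts as a marker for one
position inside the match rather than as a span.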

> Have you really written a regexp like this (apart from for testing
> purposes)?  If so, what's it for?

    grep '\\\\(\\\\)' **/*.el

finds 27 matches.  Taking one example from the list:

    lisp/emacs-lisp/smie.el:     ((looking-at "\\s(\\|\\s)\\(\\)")

What this does is let me use (match-beginning 1) to figure out which
of the two alternatives was matched.  I could have written this as

    ((looking-at "\\s(\\|\\(\\s)\\)")

but this would be (marginally) slower, because we'd always push
a "group-start" marker before trying to match "\\s)", whereas with the
other rule, we only do that when we know "\\s)" has matched.

> By the way, how do you see the prospects of this file becoming
> incorporated into Emacs at some stage?

To be honest, I haven't looked at it at all, yet.  The vague
understanding I have of what it might be sounds interesting.  It's just
a patch trying to cover up the worst aspects of the current regexp
engine, but since there doesn't seem to be much interest in
improving/overhauling the regexp engine, maybe it's a good stop-gap.


        Stefan
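
As a rough sketch of the smie.el idiom discussed above (the helper name
`my/paren-kind-at-point' is hypothetical and the surrounding code is not
taken from smie.el), the empty group at the end of the second alternative
tells the caller which branch matched:

```elisp
;; Hypothetical helper illustrating the trailing \(\) marker.
(defun my/paren-kind-at-point ()
  "Return `open', `close', or nil for the character after point."
  (when (looking-at "\\s(\\|\\s)\\(\\)")
    ;; Group 1 only takes part in the match when the second
    ;; alternative (close-paren syntax) matched; otherwise
    ;; (match-beginning 1) is nil.
    (if (match-beginning 1) 'close 'open)))

;; Usage, assuming the standard syntax table where ( and ) are parens:
(with-temp-buffer
  (set-syntax-table (standard-syntax-table))
  (insert "(a)")
  (goto-char (point-min))
  (my/paren-kind-at-point))          ; => open
```

The variant "\\s(\\|\\(\\s)\\)" should give the same answers; as noted
above, the difference is only in when the group-start marker gets pushed.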