From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Fixing ill-conditioned regular expressions. Proof of concept. Date: Fri, 13 Mar 2015 18:53:38 -0400 Message-ID: References: <20150223181205.GA2861@acm.fritz.box> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: ger.gmane.org 1426287247 13042 80.91.229.3 (13 Mar 2015 22:54:07 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 13 Mar 2015 22:54:07 +0000 (UTC) Cc: emacs-devel@gnu.org To: Alan Mackenzie Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Mar 13 23:53:58 2015 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YWYSr-00024S-8Z for ged-emacs-devel@m.gmane.org; Fri, 13 Mar 2015 23:53:57 +0100 Original-Received: from localhost ([::1]:38947 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YWYSq-0002d9-Fj for ged-emacs-devel@m.gmane.org; Fri, 13 Mar 2015 18:53:56 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:45415) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YWYSd-0002d4-VL for emacs-devel@gnu.org; Fri, 13 Mar 2015 18:53:44 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YWYSZ-0006og-VJ for emacs-devel@gnu.org; Fri, 13 Mar 2015 18:53:43 -0400 Original-Received: from ironport2-out.teksavvy.com ([206.248.154.181]:61907) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YWYSZ-0006oc-Rh for emacs-devel@gnu.org; Fri, 13 Mar 2015 18:53:39 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArwTAPOG1lRFxLnr/2dsb2JhbABbgwaDX4VTwGUEAgKBDUQBAQEBAQF8hA0BBAEnLxYKAwULCw4mEhQYDSSIOAjOIwEBAQcCAR+PeAeEKgWFUpQNjhqBeYFFIoQKIoJzAQEB X-IPAS-Result: ArwTAPOG1lRFxLnr/2dsb2JhbABbgwaDX4VTwGUEAgKBDUQBAQEBAQF8hA0BBAEnLxYKAwULCw4mEhQYDSSIOAjOIwEBAQcCAR+PeAeEKgWFUpQNjhqBeYFFIoQKIoJzAQEB X-IronPort-AV: E=Sophos;i="5.09,536,1418101200"; d="scan'208";a="113508824" Original-Received: from 69-196-185-235.dsl.teksavvy.com (HELO pastel.home) ([69.196.185.235]) by ironport2-out.teksavvy.com with ESMTP/TLS/DHE-RSA-AES256-SHA; 13 Mar 2015 18:53:38 -0400 Original-Received: by pastel.home (Postfix, from userid 20848) id 309DFFA0; Fri, 13 Mar 2015 18:53:38 -0400 (EDT) In-Reply-To: <20150223181205.GA2861@acm.fritz.box> (Alan Mackenzie's message of "Mon, 23 Feb 2015 18:12:05 +0000") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 206.248.154.181 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:183860 Archived-At: > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > ;; > ;; f i x - r e . e l This is not using the standard file format. I strongly suggest you use something like auto-insert-mode which will get that right for you. > ;; The AST is an internal representation of the regular expression string. > ;; Elements are represented as follows: Why invent a new format, instead of using rx? I'm pretty sure someone has already provided a parser from string to rx format (and of course we also already have a dumper from rx to string). > ;; PTR and AD > ;; ---------- > ;; Whilst building and transforming trees, what we really need is pointers to > ;; either the car or cdr of a cons cell, to enable us to > ;; overwrite them. I think your code will gain a lot in readability if you made it functional: take an rx expression and return a *new* rx expression without modifying the original one. As it stands, I find it rather impenetrable. > Return 'shortened, t or nil. Our conventions say this should be Return `shortened', t or nil. > (if (eq ad 'cdr) (error "fix-re--R+R*->R+ got 'cdr")) > (let ((e0 (if (eq ad 'car) (car ptr) (cadr ptr))) > (e1 (if (eq ad 'car) (cadr ptr) (caddr ptr)))) > (when (and (consp e0) (consp e1)) > (let ((op0 (car e0)) > (op1 (car e1)) > ;; (link (if (eq ad 'car) (cddr ptr) (cdddr ptr))) > op > ) > (when > (and (memq op0 '(+ * \?)) > (memq op1 '(+ * \?)) > (equal (cdr e0) (cdr e1))) ; Both Rs are the same. > (cond > ((and (eq op0 '\?) (eq op1 '\?)) ; Cant combine R?R? > nil) > ((and (eq op0 '+) (eq op1 '+)) ; R+R+ -> RR+ > (fix-re--chop-+* ptr 'car) > t) > (t > (setq op > (cond > ((or (eq op0 '+) (eq op1 '+)) '+) > ((or (eq op0 '*) (eq op1 '*)) '*))) > (setcar e0 op) > (fix-re--chop (if (eq ad 'car) ptr (cdr ptr)) 'cadr) > 'shortened)) > ))))) For example, I think that if you make it functional rather than imperative, the above could turn into something along the lines of (pcase ptr ;We don't have `ad' any more. (`((,op0 . ,r0) (,op1 . ,r1) . (and ,rest (guard (and (equal r0 r1) (memq op0 '(+ * \?)) (memq op1 '(+ * \?))))))) (cond ((and (eq op0 '\?) (eq op1 '\?)) ; Cant combine R?R? ptr) ((and (eq op0 '+) (eq op1 '+)) ; R+R+ -> RR+ `(,(car r0) ,(car ptr) . ,rest)) (t `((,(cond ((or (eq op0 '+) (eq op1 '+)) '+) ((or (eq op0 '*) (eq op1 '*)) '*)) . r0) . ,rest))))) -- Stefan>