From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Noam Postavsky Newsgroups: gmane.emacs.bugs Subject: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR))) Date: Wed, 19 Jun 2019 20:29:44 -0400 Message-ID: <87o92tw13b.fsf@gmail.com> References: <87fto9yawl.fsf@gmail.com> <385FA4F7-7FB5-43A2-B571-CFBA20B24123@acm.org> <87tvcmwe6b.fsf@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="245756"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux) Cc: Michael Heerdegen , 36237@debbugs.gnu.org, Stefan Monnier , kevin.legouguec@gmail.com To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Thu Jun 20 02:43:10 2019 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1hdlAU-0011mh-29 for geb-bug-gnu-emacs@m.gmane.org; Thu, 20 Jun 2019 02:43:10 +0200 Original-Received: from localhost ([::1]:42726 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hdlAT-0005Ad-3U for geb-bug-gnu-emacs@m.gmane.org; Wed, 19 Jun 2019 20:43:09 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:35488) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hdlA1-0004xs-Nv for bug-gnu-emacs@gnu.org; Wed, 19 Jun 2019 20:42:44 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hdkxo-0000CH-9a for bug-gnu-emacs@gnu.org; Wed, 19 Jun 2019 20:30:06 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:34063) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hdkxm-00007C-G1 for bug-gnu-emacs@gnu.org; Wed, 19 Jun 2019 20:30:04 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hdkxm-0003I3-9h for bug-gnu-emacs@gnu.org; Wed, 19 Jun 2019 20:30:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Noam Postavsky Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 20 Jun 2019 00:30:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 36237 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 36237-submit@debbugs.gnu.org id=B36237.156099059612609 (code B ref 36237); Thu, 20 Jun 2019 00:30:02 +0000 Original-Received: (at 36237) by debbugs.gnu.org; 20 Jun 2019 00:29:56 +0000 Original-Received: from localhost ([127.0.0.1]:47607 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hdkxf-0003HI-6b for submit@debbugs.gnu.org; Wed, 19 Jun 2019 20:29:56 -0400 Original-Received: from mail-io1-f65.google.com ([209.85.166.65]:44047) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hdkxc-0003H2-3o for 36237@debbugs.gnu.org; Wed, 19 Jun 2019 20:29:53 -0400 Original-Received: by mail-io1-f65.google.com with SMTP id s7so758196iob.11 for <36237@debbugs.gnu.org>; Wed, 19 Jun 2019 17:29:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version; bh=rcisLlVVrxNWymEQ/n4frtvh4SnS+wFJ4JEyH9aTLJo=; b=NJeRWOm5wmhQ54QiDbsp3BiZg7I+4+CYFGmgKZEbmELR9bzgUsnB8BBFQQXilmRfJJ 3TCem7jeyoXuodQ/iVe8sBp6XjoDpT4PiOsk/ceWJSmX8yM2deemVHB8vr8/5I1+Wlhu FJZuK9Ii0pGBdNNz7yhrpJjajtpW+VgpXarzDkXffKKqzBJSvOdq3oJFcpkSUxeR8+tU GflI1d2f335DLXBbhHdx+NH+sMTdyk5JCMN1tAygKnreZUw4XOTwDoq0yhyYOJQwEZf6 HvAQ2jTcY0mUA8DmxmTacz6fN99hY2nDHAZPLuDrLFUMP54GjgmNaGtSnMPP7Vx8JFyW v1Kg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version; bh=rcisLlVVrxNWymEQ/n4frtvh4SnS+wFJ4JEyH9aTLJo=; b=dIaUmS8FNVURgnMVgedzLr2ggL/3MuddF0Vez+kVMo8y0ZV1OzfxFz9FdS6KyU+rFY CSRAX9BsKMGRwTdqCJ20BSq/AFd40n+93xFL/z9LREo+vSFlNrPNk2KZspu2XN6A+y65 hgYY22J2kUikVE61dEyOBRQz1bo4Nlo0DUALaXtfgjBkZNxrJl2Sbh8xOdsYTwMnThLg Q385zOEFHgUr0YMRpf+xzKmzFeXTiRFcIY78RR1M6lLJkGso/R5ZaQleLWKmdfGouGeX MUn3y7akVith9w25I1Q0WtSo43puLR5be8UBs0wvSG1xzCOiW3yX/al5vaFyHhTVZ33y s60w== X-Gm-Message-State: APjAAAVsU2+nq2OoqElcOO/jjZw2nuyheKn6EKIQoPLNHHARXdJnCSfJ F1NkLVqjlkYywV0SC7v45rc= X-Google-Smtp-Source: APXvYqzW0ZxwE+oYRZERodMno/5iDdggAQUOwfiQSg2sitzehVzYG2OSGhwQ2R/2liuXdEriYWWHcw== X-Received: by 2002:a6b:ee15:: with SMTP id i21mr2330270ioh.281.1560990586495; Wed, 19 Jun 2019 17:29:46 -0700 (PDT) Original-Received: from minid (cbl-45-2-119-34.yyz.frontiernetworks.ca. [45.2.119.34]) by smtp.gmail.com with ESMTPSA id 15sm21268932ioe.46.2019.06.19.17.29.44 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 19 Jun 2019 17:29:45 -0700 (PDT) In-Reply-To: ("Mattias \=\?utf-8\?Q\?Engdeg\=C3\=A5rd\=22's\?\= message of "Wed, 19 Jun 2019 17:42:57 +0200") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.org gmane.emacs.bugs:160869 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mattias Engdeg=C3=A5rd writes: > +;; (rx (and line-start (regexp something-else))), statically or > +;; (rx-to-string `(and line-start ,something-else)), dynamically. > > With your patch, the rx-to-string example should no longer be > recommended, but eval is still of interest for compile-time > substitution. What about: > > ;; (rx (and line-start (eval something-else))), statically or > ;; (rx (and line-start (regexp something-else))), dynamically. Not sure that we really want to get into the subtleties of static eval in the intro examples. I'm thinking we just drop the rx-to-string example, without replacement. > + ;; Always group non string forms, since we can't be sure they > > "non-string forms" Right. > +(defun rx-literal (form) > + "Parse and produce code from FORM, which is `(literal STRING-EXP)'." > + (cond ((stringp form) > + ;; This is allowed(?), but makes little sense, you could just > + ;; use STRING directly. > > Yes, I did the same in ry. Maybe `literal' should be disallowed > entirely in rx-to-string, since it's more likely to be a > misunderstanding on the user's part. I think disallowing it could potentially be annoying during development, e.g., building incrementally in re-builder. > +(defun rx-subforms (subforms &optional parent regexp-op) > > REGEXP-OP is perhaps better named SEPARATOR? Yeah, especially since it's just the one "\\|" operator. > + (cl-mapcan (lambda (x) > + (cons regexp-op (funcall listify x))) > + (cdr subregexps)))) > + (t (cl-mapcan listify subregexps))))) > > Any reason for using cl-mapcan instead of straight mapcan? > Not that it matters much. I, um, didn't realize mapcan was builtin (when I saw mapcan elsewhere I just assumed it was from cl.el). In my defence, it's new since 26.1 :p (In addition to the above two points, I've renamed this function to rx--subforms, and re-arranged the code a bit) > +`regexp' will not accept non-string arguments (so (literal > +STRING) becomes just a more verbose version of STRING)." > > Try not breaking the line inside (literal STRING). Right. --=-=-= Content-Type: text/plain Content-Disposition: attachment; filename=0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch Content-Description: patch >From 6351f9d8ac0ff5643b849f2c8e3eb44ea1641fc5 Mon Sep 17 00:00:00 2001 From: Noam Postavsky Date: Fri, 14 Jun 2019 08:43:17 -0400 Subject: [PATCH] Support (rx (and (regexp EXPR) (literal EXPR))) (Bug#36237) * lisp/emacs-lisp/rx.el (rx-regexp): Allow non-string forms. (rx-constituents): Add literal constituent, which is like a plain STRING form, but allows arbitrary lisp expressions. (rx-literal): New function. (rx-compile-to-lisp): New variable. (rx--subforms): New helper function for handling subforms, including non-constant case. (rx-group-if, rx-and, rx-or, rx-=, rx->=, rx-repeat, rx-submatch) (rx-submatch-n, rx-kleene, rx-atomic-p): Use it to handle non-constant subforms. (rx): Document new form, wrap non-constant forms with concat call. * test/lisp/emacs-lisp/rx-tests.el (rx-tests--match): New macro. (rx-nonstring-expr, rx-nonstring-expr-non-greedy): New tests. * etc/NEWS: Announce changes. --- etc/NEWS | 6 ++ lisp/emacs-lisp/rx.el | 189 ++++++++++++++++++++++++++------------- test/lisp/emacs-lisp/rx-tests.el | 41 +++++++++ 3 files changed, 173 insertions(+), 63 deletions(-) diff --git a/etc/NEWS b/etc/NEWS index 723f0a0fb0..42958bca36 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -1380,12 +1380,18 @@ when given in a string. Previously, '(any "\x80-\xff")' would match characters U+0080...U+00FF. Now the expression matches raw bytes in the 128...255 range, as expected. +--- *** The rx 'or' and 'seq' forms no longer require any arguments. (or) produces a regexp that never matches anything, while (seq) matches the empty string, each being an identity for the operation. This also works for their aliases: '|' for 'or'; ':', 'and' and 'sequence' for 'seq'. +--- +*** 'regexp' and new 'literal' accept arbitrary lisp as arguments. +In this case, 'rx' will generate code which produces a regexp string +at runtime, instead of a constant string. + ** Frames +++ diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el index 8ef78fd69e..12e33aaded 100644 --- a/lisp/emacs-lisp/rx.el +++ b/lisp/emacs-lisp/rx.el @@ -47,9 +47,11 @@ ;; Rx translates a sexp notation for regular expressions into the ;; usual string notation. The translation can be done at compile-time -;; by using the `rx' macro. It can be done at run-time by calling -;; function `rx-to-string'. See the documentation of `rx' for a -;; complete description of the sexp notation. +;; by using the `rx' macro. The `regexp' and `literal' forms accept +;; non-constant expressions, in which case `rx' will translate to a +;; `concat' expression. Translation can be done fully at run-time by +;; calling function `rx-to-string'. See the documentation of `rx' for +;; a complete description of the sexp notation. ;; ;; Some examples of string regexps and their sexp counterparts: ;; @@ -78,8 +80,7 @@ ;; (+ (? ?\n)) (any " \t")) ;; ;; (concat "^\\(?:" something-else "\\)") -;; (rx (and line-start (eval something-else))), statically or -;; (rx-to-string '(and line-start ,something-else)), dynamically. +;; (rx (seq line-start (regexp something-else))) ;; ;; (regexp-opt '(STRING1 STRING2 ...)) ;; (rx (or STRING1 STRING2 ...)), or in other words, `or' automatically @@ -176,6 +177,7 @@ (defvar rx-constituents ;Not `const' because some modes extend it. (not-syntax . (rx-not-syntax 1 1)) ; sregex (category . (rx-category 1 1 rx-check-category)) (eval . (rx-eval 1 1)) + (literal . (rx-literal 1 1 stringp)) (regexp . (rx-regexp 1 1 stringp)) (regex . regexp) ; sregex (digit . "[[:digit:]]") @@ -302,6 +304,10 @@ (defvar rx-greedy-flag t "Non-nil means produce greedy regular expressions for `zero-or-one', `zero-or-more', and `one-or-more'. Dynamically bound.") +(defvar rx--compile-to-lisp nil + "Nil means return a regexp as a string. +Non-nil means we may return a lisp form which produces a +string (used for `rx' macro).") (defun rx-info (op head) "Return parsing/code generation info for OP. @@ -344,7 +350,7 @@ (defun rx-check (form) (> nargs max-args)) (error "rx form `%s' accepts at most %d args" (car form) max-args)) - (when (not (null type-pred)) + (when type-pred (dolist (sub-form (cdr form)) (unless (funcall type-pred sub-form) (error "rx form `%s' requires args satisfying `%s'" @@ -360,8 +366,9 @@ (defun rx-group-if (regexp group) ;; for concatenation ((eq group ':) (if (rx-atomic-p - (if (string-match - "\\(?:[?*+]\\??\\|\\\\{[0-9]*,?[0-9]*\\\\}\\)\\'" regexp) + (if (and (stringp regexp) + (string-match + "\\(?:[?*+]\\??\\|\\\\{[0-9]*,?[0-9]*\\\\}\\)\\'" regexp)) (substring regexp 0 (match-beginning 0)) regexp)) (setq group nil))) @@ -370,9 +377,10 @@ (defun rx-group-if (regexp group) ;; do anyway ((eq group t)) ((rx-atomic-p regexp t) (setq group nil))) - (if group - (concat "\\(?:" regexp "\\)") - regexp)) + (cond ((and group (stringp regexp)) + (concat "\\(?:" regexp "\\)")) + (group `("\\(?:" ,@regexp "\\)")) + (t regexp))) (defvar rx-parent) @@ -384,7 +392,7 @@ (defun rx-and (form) FORM is of the form `(and FORM1 ...)'." (rx-check form) (rx-group-if - (mapconcat (lambda (x) (rx-form x ':)) (cdr form) nil) + (rx--subforms (cdr form) ':) (and (memq rx-parent '(* t)) rx-parent))) @@ -396,7 +404,7 @@ (defun rx-or (form) ((null (cdr form)) regexp-unmatchable) ((cl-every #'stringp (cdr form)) (regexp-opt (cdr form) nil t)) - (t (mapconcat (lambda (x) (rx-form x '|)) (cdr form) "\\|"))) + (t (rx--subforms (cdr form) '| "\\|"))) (and (memq rx-parent '(: * t)) rx-parent))) @@ -669,7 +677,10 @@ (defun rx-= (form) (unless (and (integerp (nth 1 form)) (> (nth 1 form) 0)) (error "rx `=' requires positive integer first arg")) - (format "%s\\{%d\\}" (rx-form (nth 2 form) '*) (nth 1 form))) + (let ((subform (rx-form (nth 2 form) '*))) + (if (stringp subform) + (format "%s\\{%d\\}" subform (nth 1 form)) + `(,@subform ,(format "\\{%d\\}" (nth 1 form)))))) (defun rx->= (form) @@ -679,7 +690,10 @@ (defun rx->= (form) (unless (and (integerp (nth 1 form)) (> (nth 1 form) 0)) (error "rx `>=' requires positive integer first arg")) - (format "%s\\{%d,\\}" (rx-form (nth 2 form) '*) (nth 1 form))) + (let ((subform (rx-form (nth 2 form) '*))) + (if (stringp subform) + (format "%s\\{%d,\\}" subform (nth 1 form)) + `(,@subform ,(format "\\{%d,\\}" (nth 1 form)))))) (defun rx-** (form) @@ -700,7 +714,10 @@ (defun rx-repeat (form) (unless (and (integerp (nth 1 form)) (> (nth 1 form) 0)) (error "rx `repeat' requires positive integer first arg")) - (format "%s\\{%d\\}" (rx-form (nth 2 form) '*) (nth 1 form))) + (let ((subform (rx-form (nth 2 form) '*))) + (if (stringp subform) + (format "%s\\{%d\\}" subform (nth 1 form)) + `(,@subform ,(format "\\{%d\\}" (nth 1 form)))))) ((or (not (integerp (nth 2 form))) (< (nth 2 form) 0) (not (integerp (nth 1 form))) @@ -708,30 +725,26 @@ (defun rx-repeat (form) (< (nth 2 form) (nth 1 form))) (error "rx `repeat' range error")) (t - (format "%s\\{%d,%d\\}" (rx-form (nth 3 form) '*) - (nth 1 form) (nth 2 form))))) + (let ((subform (rx-form (nth 3 form) '*))) + (if (stringp subform) + (format "%s\\{%d,%d\\}" subform (nth 1 form) (nth 2 form)) + `(,@subform ,(format "\\{%d,%d\\}" (nth 1 form) (nth 2 form)))))))) (defun rx-submatch (form) "Parse and produce code from FORM, which is `(submatch ...)'." - (concat "\\(" - (if (= 2 (length form)) - ;; Only one sub-form. - (rx-form (cadr form)) - ;; Several sub-forms implicitly concatenated. - (mapconcat (lambda (re) (rx-form re ':)) (cdr form) nil)) - "\\)")) + (let ((subforms (rx--subforms (cdr form) ':))) + (if (stringp subforms) + (concat "\\(" subforms "\\)") + `("\\(" ,@subforms "\\)")))) (defun rx-submatch-n (form) "Parse and produce code from FORM, which is `(submatch-n N ...)'." - (let ((n (nth 1 form))) - (concat "\\(?" (number-to-string n) ":" - (if (= 3 (length form)) - ;; Only one sub-form. - (rx-form (nth 2 form)) - ;; Several sub-forms implicitly concatenated. - (mapconcat (lambda (re) (rx-form re ':)) (cddr form) nil)) - "\\)"))) + (let ((n (nth 1 form)) + (subforms (rx--subforms (cddr form) ':))) + (if (stringp subforms) + (concat "\\(?" (number-to-string n) ":" subforms "\\)") + `("\\(?" ,(number-to-string n) ":" ,@subforms "\\)")))) (defun rx-backref (form) "Parse and produce code from FORM, which is `(backref N)'." @@ -759,9 +772,12 @@ (defun rx-kleene (form) (t "?"))) (op (cond ((memq (car form) '(* *? 0+ zero-or-more)) "*") ((memq (car form) '(+ +? 1+ one-or-more)) "+") - (t "?")))) + (t "?"))) + (subform (rx-form (cadr form) '*))) (rx-group-if - (concat (rx-form (cadr form) '*) op suffix) + (if (stringp subform) + (concat subform op suffix) + `(,@subform ,(concat op suffix))) (and (memq rx-parent '(t *)) rx-parent)))) @@ -789,15 +805,18 @@ (defun rx-atomic-p (r &optional lax) be detected without much effort. A guarantee of no false negatives would require a theoretic specification of the set of all atomic regexps." - (let ((l (length r))) - (cond - ((<= l 1)) - ((= l 2) (= (aref r 0) ?\\)) - ((= l 3) (string-match "\\`\\(?:\\\\[cCsS_]\\|\\[[^^]\\]\\)" r)) - ((null lax) + (if (and rx--compile-to-lisp + (not (stringp r))) + nil ;; Runtime value, we must assume non-atomic. + (let ((l (length r))) (cond - ((string-match "\\`\\[\\^?]?\\(?:\\[:[a-z]+:]\\|[^]]\\)*]\\'" r)) - ((string-match "\\`\\\\(\\(?:[^\\]\\|\\\\[^)]\\)*\\\\)\\'" r))))))) + ((<= l 1)) + ((= l 2) (= (aref r 0) ?\\)) + ((= l 3) (string-match "\\`\\(?:\\\\[cCsS_]\\|\\[[^^]\\]\\)" r)) + ((null lax) + (cond + ((string-match "\\`\\[\\^?]?\\(?:\\[:[a-z]+:]\\|[^]]\\)*]\\'" r)) + ((string-match "\\`\\\\(\\(?:[^\\]\\|\\\\[^)]\\)*\\\\)\\'" r)))))))) (defun rx-syntax (form) @@ -853,9 +872,23 @@ (defun rx-greedy (form) (defun rx-regexp (form) "Parse and produce code from FORM, which is `(regexp STRING)'." - (rx-check form) - (rx-group-if (cadr form) rx-parent)) - + (cond ((stringp form) + (rx-group-if (cadr form) rx-parent)) + (rx--compile-to-lisp + ;; Always group non-string forms, since we can't be sure they + ;; are atomic. + (rx-group-if (cdr form) t)) + (t (rx-check form)))) + +(defun rx-literal (form) + "Parse and produce code from FORM, which is `(literal STRING-EXP)'." + (cond ((stringp form) + ;; This is allowed, but makes little sense, you could just + ;; use STRING directly. + (rx-group-if (regexp-quote (cadr form)) rx-parent)) + (rx--compile-to-lisp + (rx-group-if `((regexp-quote ,(cadr form))) rx-parent)) + (t (rx-check form)))) (defun rx-form (form &optional parent) "Parse and produce code for regular expression FORM. @@ -886,12 +919,38 @@ (defun rx-form (form &optional parent) (t (error "rx syntax error at `%s'" form))))) +(defun rx--subforms (subforms &optional parent separator) + "Produce code for regular expressions SUBFORMS. +SUBFORMS is a list of regular expression sexps. +PARENT controls grouping, as in `rx-form'. +Insert SEPARATOR between the code from each of SUBFORMS." + (if (null (cdr subforms)) + ;; Zero or one forms, no need for grouping. + (and subforms (rx-form (car subforms))) + (let ((listify (lambda (x) + (if (listp x) (copy-sequence x) + (list x))))) + (setq subforms (mapcar (lambda (x) (rx-form x parent)) subforms)) + (cond ((or (not rx--compile-to-lisp) + (cl-every #'stringp subforms)) + (mapconcat #'identity subforms separator)) + (separator + (nconc (funcall listify (car subforms)) + (mapcan (lambda (x) + (cons separator (funcall listify x))) + (cdr subforms)))) + (t (mapcan listify subforms)))))) + ;;;###autoload (defun rx-to-string (form &optional no-group) "Parse and produce code for regular expression FORM. FORM is a regular expression in sexp form. -NO-GROUP non-nil means don't put shy groups around the result." +NO-GROUP non-nil means don't put shy groups around the result. + +In contrast to the `rx' macro, subforms `literal' and `regexp' +will not accept non-string arguments, i.e., (literal STRING) +becomes just a more verbose version of STRING." (rx-group-if (rx-form form) (null no-group))) @@ -901,8 +960,12 @@ (defmacro rx (&rest regexps) REGEXPS is a non-empty sequence of forms of the sort listed below. Note that `rx' is a Lisp macro; when used in a Lisp program being -compiled, the translation is performed by the compiler. -See `rx-to-string' for how to do such a translation at run-time. +compiled, the translation is performed by the compiler. The +`literal' and `regexp' forms accept subforms that will evaluate +to strings, in addition to constant strings. If REGEXPS include +such forms, then the result is an expression which returns a +regexp string, rather than a regexp string directly. See +`rx-to-string' for performing translation completely at run-time. The following are valid subforms of regular expressions in sexp notation. @@ -910,6 +973,10 @@ (defmacro rx (&rest regexps) STRING matches string STRING literally. +`(literal STRING)' + matches STRING literally, where STRING is any lisp + expression that evaluates to a string. + CHAR matches character CHAR literally. @@ -1208,12 +1275,16 @@ (defmacro rx (&rest regexps) `(regexp REGEXP)' include REGEXP in string notation in the result." - (cond ((null regexps) - (error "No regexp")) - ((cdr regexps) - (rx-to-string `(and ,@regexps) t)) - (t - (rx-to-string (car regexps) t)))) + (let* ((rx--compile-to-lisp t) + (re (cond ((null regexps) + (error "No regexp")) + ((cdr regexps) + (rx-to-string `(and ,@regexps) t)) + (t + (rx-to-string (car regexps) t))))) + (if (stringp re) + re + `(concat ,@re)))) (pcase-defmacro rx (&rest regexps) @@ -1275,14 +1346,6 @@ (pcase-defmacro rx (&rest regexps) for var in vars collect `(app (match-string ,i) ,var))))) -;; ;; sregex.el replacement - -;; ;;;###autoload (provide 'sregex) -;; ;;;###autoload (autoload 'sregex "rx") -;; (defalias 'sregex 'rx-to-string) -;; ;;;###autoload (autoload 'sregexq "rx" nil nil 'macro) -;; (defalias 'sregexq 'rx) - (provide 'rx) ;;; rx.el ends here diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el index 6f392d616d..bab71b522b 100644 --- a/test/lisp/emacs-lisp/rx-tests.el +++ b/test/lisp/emacs-lisp/rx-tests.el @@ -115,5 +115,46 @@ (ert-deftest rx-seq () ;; Test zero-argument `seq'. (should (equal (rx (seq)) ""))) +(defmacro rx-tests--match (regexp string &optional match) + (macroexp-let2 nil strexp string + `(ert-info ((format "Matching %S to %S" ',regexp ,strexp)) + (should (string-match ,regexp ,strexp)) + ,@(when match + `((should (equal (match-string 0 ,strexp) ,match))))))) + +(ert-deftest rx-nonstring-expr () + (let ((bee "b") + (vowel "[aeiou]")) + (rx-tests--match (rx "a" (literal bee) "c") "abc") + (rx-tests--match (rx "a" (regexp bee) "c") "abc") + (rx-tests--match (rx "a" (or (regexp bee) "xy") "c") "abc") + (rx-tests--match (rx "a" (or "xy" (regexp bee)) "c") "abc") + (should-not (string-match (rx (or (regexp bee) "xy")) "")) + (rx-tests--match (rx "a" (= 3 (regexp bee)) "c") "abbbc") + (rx-tests--match (rx "x" (= 3 (regexp vowel)) "z") "xeoez") + (should-not (string-match (rx "x" (= 3 (regexp vowel)) "z") "xe[]z")) + (rx-tests--match (rx "x" (= 3 (literal vowel)) "z") + "x[aeiou][aeiou][aeiou]z") + (rx-tests--match (rx "x" (repeat 1 (regexp vowel)) "z") "xaz") + (rx-tests--match (rx "x" (repeat 1 2 (regexp vowel)) "z") "xaz") + (rx-tests--match (rx "x" (repeat 1 2 (regexp vowel)) "z") "xauz") + (rx-tests--match (rx "x" (>= 1 (regexp vowel)) "z") "xaiiz") + (rx-tests--match (rx "x" (** 1 2 (regexp vowel)) "z") "xaiz") + (rx-tests--match (rx "x" (group (regexp vowel)) "z") "xaz") + (rx-tests--match (rx "x" (group-n 1 (regexp vowel)) "z") "xaz") + (rx-tests--match (rx "x" (? (regexp vowel)) "z") "xz"))) + +(ert-deftest rx-nonstring-expr-non-greedy () + "`rx's greediness can't affect runtime regexp parts." + (let ((ad-min "[ad]*?") + (ad-max "[ad]*") + (ad "[ad]")) + (rx-tests--match (rx "c" (regexp ad-min) "a") "cdaaada" "cda") + (rx-tests--match (rx "c" (regexp ad-max) "a") "cdaaada" "cdaaada") + (rx-tests--match (rx "c" (minimal-match (regexp ad-max)) "a") "cdaaada" "cdaaada") + (rx-tests--match (rx "c" (maximal-match (regexp ad-min)) "a") "cdaaada" "cda") + (rx-tests--match (rx "c" (minimal-match (0+ (regexp ad))) "a") "cdaaada" "cda") + (rx-tests--match (rx "c" (maximal-match (0+ (regexp ad))) "a") "cdaaada" "cdaaada"))) + (provide 'rx-tests) ;; rx-tests.el ends here. -- 2.11.0 --=-=-=--