unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Philipp <p.stephani2@gmail.com>
Cc: 48477@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
Subject: bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns
Date: Tue, 18 May 2021 12:44:30 +0200	[thread overview]
Message-ID: <CE446C90-C87B-463E-9847-16A494B4CE7F@acm.org> (raw)
In-Reply-To: <wvr4pmxp1tm5.fsf@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1038 bytes --]

Serves me right for trying to be clever! Very sorry about that.

Matches would always succeed because the outcome was erroneously transformed into a match against a plain pcase variable which never fails. For example, the pattern

 (rx (let x "a"))

would expand to
    
 (and (pred stringp)
      (app (lambda (s) (and (string-match (rx (group-n 1 "a")) s)
                            (match-string 1 s)))
           x))

which cannot fail (as long as the input is a string).  Patterns with two or more named submatches are not affected because of the structural match used, and zero submatches were treated specially anyway.

Please try the attached patch. It encodes non-matches as the number 0 (any non-nil non-string value would have done; 0 is cheap to create and test). The above pattern now expands to

 (and (pred stringp)
      (app (lambda (s) (if (string-match (rx (group-n 1 "a")) s)
                           (match-string 1 s)
                         0))
           (and x (pred (not numberp)))))


[-- Attachment #2: 0001-Fix-pcase-rx-patterns-with-a-single-named-submatch-b.patch --]
[-- Type: application/octet-stream, Size: 3885 bytes --]

From be9db2b94d31a0afe3f93302558b3a78605244c7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Tue, 18 May 2021 12:03:11 +0200
Subject: [PATCH] Fix pcase 'rx' patterns with a single named submatch
 (bug#48477)

pcase 'rx' patterns with a single named submatch, like

  (rx (let x "a"))

would always succeed because of an over-optimistic transformation.
Patterns with 0 or more than 1 named submatches were not affected.

Reported by Philipp Stephani.

* lisp/emacs-lisp/rx.el (rx--pcase-macroexpander):
Special case for a single named submatch.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add tests.
---
 lisp/emacs-lisp/rx.el            | 21 ++++++++++++++++-----
 test/lisp/emacs-lisp/rx-tests.el | 14 ++++++++++++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index 1e3eb9c12b..43bd84d999 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -1445,12 +1445,23 @@ rx
          (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps))))
          (nvars (length rx--pcase-vars)))
     `(and (pred stringp)
-          ,(if (zerop nvars)
-               ;; No variables bound: a single predicate suffices.
-               `(pred (string-match ,regexp))
+          ,(pcase nvars
+            (0
+             ;; No variables bound: a single predicate suffices.
+             `(pred (string-match ,regexp)))
+            (1
+             ;; Create a match value that on a successful regexp match
+             ;; is the submatch value, 0 on failure.  We can't use nil
+             ;; for failure because it is a valid submatch value.
+             `(app (lambda (s)
+                     (if (string-match ,regexp s)
+                         (match-string 1 s)
+                       0))
+                   (and ,(car rx--pcase-vars) (pred (not numberp)))))
+            (_
              ;; Pack the submatches into a dotted list which is then
              ;; immediately destructured into individual variables again.
-             ;; This is of course slightly inefficient when NVARS > 1.
+             ;; This is of course slightly inefficient.
              ;; A dotted list is used to reduce the number of conses
              ;; to create and take apart.
              `(app (lambda (s)
@@ -1463,7 +1474,7 @@ rx
                           (rx--reduce-right
                            #'cons
                            (mapcar (lambda (name) (list '\, name))
-                                   (reverse rx--pcase-vars)))))))))
+                                   (reverse rx--pcase-vars))))))))))
 
 ;; Obsolete internal symbol, used in old versions of the `flycheck' package.
 (define-obsolete-function-alias 'rx-submatch-n 'rx-to-string "27.1")
diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el
index 2dd1bca22d..4828df0de9 100644
--- a/test/lisp/emacs-lisp/rx-tests.el
+++ b/test/lisp/emacs-lisp/rx-tests.el
@@ -166,6 +166,20 @@ rx-pcase
                         (backref 1))
                     (list u v)))
                  '("1" "3")))
+  (should (equal (pcase "bz"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   (_ 'no))
+                 'no))
+  (should (equal (pcase "az"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   ((rx "b" (let x nonl)) (list 2 x))
+                   (_ 'no))
+                 '(1 "z")))
+  (should (equal (pcase "bz"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   ((rx "b" (let x nonl)) (list 2 x))
+                   (_ 'no))
+                 '(2 "z")))
   (let ((k "blue"))
     (should (equal (pcase "<blue>"
                      ((rx "<" (literal k) ">") 'ok))
-- 
2.21.1 (Apple Git-122.3)


  reply	other threads:[~2021-05-18 10:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-17 11:34 bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns Philipp Stephani
2021-05-18 10:44 ` Mattias Engdegård [this message]
2021-05-18 11:09   ` Philipp Stephani
2021-05-18 11:12     ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CE446C90-C87B-463E-9847-16A494B4CE7F@acm.org \
    --to=mattiase@acm.org \
    --cc=48477@debbugs.gnu.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).