unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns
@ 2021-05-17 11:34 Philipp Stephani
  2021-05-18 10:44 ` Mattias Engdegård
  0 siblings, 1 reply; 4+ messages in thread
From: Philipp Stephani @ 2021-05-17 11:34 UTC (permalink / raw)
  To: 48477


Consider the following pcase form:

(require 'rx)
(pcase string
  ((rx bos (let prefix ?@) (* (not (any ?: ?/))) eos)
   (list 1 prefix))
  ((rx bos (let prefix (* (not (any ?:))) "/...:" eos))
   (list 2 prefix)))

The two branches should be disjoint; e.g. "@foo//...:" should match only
the second, not the first.  Emacs 27.2 agrees and generates the
following code:

(cond
 ((string-match "\\`\\(?1:@\\)[^/:]*\\'" string)
  (let*
      ((#1=#:x457
        (match-string 1 string)))
    (let
        ((prefix #1#))
      (list 1 prefix))))
 ((string-match "\\`\\(?1:[^:]*/\\.\\.\\.:\\'\\)" string)
  (let*
      ((#2=#:x458
        (match-string 1 string)))
    (let
        ((prefix #2#))
      (list 2 prefix))))
 (t nil))

However, Emacs master prints the following warning:

    Warning: pcase pattern (rx bos (let prefix (* (not (any 58))) "/...:" eos)) shadowed by previous pcase pattern

and generates this code:

(if
    (stringp string)
    (let*
	((#1=#:x42
	  (funcall
	   #'(lambda
	       (s)
	       (and
		(string-match "\\`\\(?1:@\\)[^/:]*\\'" s)
		(match-string 1 s)))
	   string)))
      (let
	  ((prefix #1#))
	(list 1 prefix))))

which looks clearly wrong (and also needlessly complex).
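
For anyone who wants to check a particular build, here is a minimal reproduction sketch.  The function name my-classify is made up for illustration; the patterns are copied from the form above.  On an affected build, defining the function should already trigger the "shadowed" warning, and the first branch should win for every string argument; the results in the comments are what a build that handles the patterns correctly returns.

```elisp
;;; -*- lexical-binding: t -*-
(require 'rx)   ; provides the `rx' pcase pattern

;; Hypothetical wrapper around the pcase form from the report.
(defun my-classify (string)
  (pcase string
    ((rx bos (let prefix ?@) (* (not (any ?: ?/))) eos)
     (list 1 prefix))
    ((rx bos (let prefix (* (not (any ?:))) "/...:" eos))
     (list 2 prefix))))

(my-classify "@foo")        ; => (1 "@")
(my-classify "@foo//...:")  ; => (2 "@foo//...:")
(my-classify "foo:bar")     ; => nil
```

Evaluating (macroexpand-all '(pcase ...)) on the original form is another way to inspect the generated code directly.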


In GNU Emacs 28.0.50 (build 104, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2021-05-17
Repository revision: 42950e9e4647c28f56c72cc27ef96edbafcbe5cd
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12010000
System Description: Debian GNU/Linux rodete

Configured using:
 'configure --enable-gcc-warnings=warn-only
 --enable-gtk-deprecation-warnings --without-pop --with-mailutils
 --enable-checking=all --enable-check-lisp-object-type --with-modules
 'CFLAGS=-O0 -ggdb3''

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBSYSTEMD MODULES NOTIFY INOTIFY PDUMPER PNG SECCOMP SOUND
THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LC_TIME: en_DK.utf8
  value of $LANG: en_US.utf8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc dired dired-loaddefs rfc822
mml mml-sec epa epg epg-config gnus-util rmail rmail-loaddefs time-date
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils phst skeleton derived edmacro kmacro pcase ffap thingatpt url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source cl-seq eieio eieio-core
cl-macs eieio-loaddefs password-cache json map url-vars mailcap rx
gnutls puny dbus xml subr-x seq byte-opt gv bytecomp byte-compile cconv
compile text-property-search comint ansi-color ring cl-loaddefs cl-lib
iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 69432 7725)
 (symbols 48 8422 1)
 (strings 32 24391 1618)
 (string-bytes 1 789459)
 (vectors 16 15075)
 (vector-slots 8 195624 5994)
 (floats 8 26 32)
 (intervals 56 223 0)
 (buffers 992 11))


* bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns
  2021-05-17 11:34 bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns Philipp Stephani
@ 2021-05-18 10:44 ` Mattias Engdegård
  2021-05-18 11:09   ` Philipp Stephani
  0 siblings, 1 reply; 4+ messages in thread
From: Mattias Engdegård @ 2021-05-18 10:44 UTC (permalink / raw)
  To: Philipp; +Cc: 48477, Stefan Monnier

Serves me right for trying to be clever! Very sorry about that.

Matches would always succeed because the outcome of the regexp match was erroneously turned into a match against a plain pcase variable, and binding a variable never fails. For example, the pattern

 (rx (let x "a"))

would expand to
    
 (and (pred stringp)
      (app (lambda (s) (and (string-match (rx (group-n 1 "a")) s)
                            (match-string 1 s)))
           x))

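which cannot fail as long as the input is a string: when string-match fails, the lambda returns nil, and nil is simply bound to x.  Patterns with two or more named submatches are not affected because they destructure the result with a structural match, which rejects nil, and patterns with zero submatches were treated specially anyway.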
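
The always-succeeding behaviour can be seen without rx at all.  The sketch below is a hand-written copy of the expansion above, with the regexp spelled out as a string; the function name my-old-expansion is made up for illustration.

```elisp
;;; -*- lexical-binding: t -*-

;; Hand-written equivalent of the pre-fix expansion of (rx (let x "a")).
;; "\\(?1:a\\)" is the string form of (rx (group-n 1 "a")).
(defun my-old-expansion (input)   ; hypothetical name
  (pcase input
    ((and (pred stringp)
          (app (lambda (s)
                 (and (string-match "\\(?1:a\\)" s)
                      (match-string 1 s)))
               x))
     (list 'matched x))
    (_ 'no-match)))

(my-old-expansion "a")   ; => (matched "a")
(my-old-expansion "b")   ; => (matched nil), although the regexp did not match
(my-old-expansion 42)    ; => no-match, only the stringp test can fail
```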

Please try the attached patch. It encodes non-matches as the number 0 (any non-nil non-string value would have done; 0 is cheap to create and test). The above pattern now expands to

 (and (pred stringp)
      (app (lambda (s) (if (string-match (rx (group-n 1 "a")) s)
                           (match-string 1 s)
                         0))
           (and x (pred (not numberp)))))
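
To illustrate why the failure marker has to be something other than nil, here is a small sketch of the same idea written by hand.  The names my-submatch-or-zero and my-new-expansion are made up, and the regexp with an optional group is chosen only to show that a successful match can leave submatch 1 unset.  It assumes an Emacs where pcase accepts (pred (not FUN)), i.e. Emacs 28 or later.

```elisp
;;; -*- lexical-binding: t -*-

;; Return submatch 1 on a successful match, or the number 0 on failure.
;; nil cannot mark failure: it is what `match-string' returns when the
;; regexp matched but group 1 did not take part in the match.
(defun my-submatch-or-zero (regexp s)   ; hypothetical helper
  (if (string-match regexp s)
      (match-string 1 s)
    0))

;; Same shape as the post-fix expansion, using the helper above.
(defun my-new-expansion (input)         ; hypothetical name
  (pcase input
    ((and (pred stringp)
          (app (lambda (s) (my-submatch-or-zero "\\(?1:a\\)?b" s))
               (and x (pred (not numberp)))))
     (list 'matched x))
    (_ 'no-match)))

(my-new-expansion "ab")  ; => (matched "a")
(my-new-expansion "b")   ; => (matched nil), regexp matched, group 1 unset
(my-new-expansion "c")   ; => no-match
```

Any non-nil, non-string value would work as the marker; a number keeps the check down to a single numberp test.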


[-- Attachment #2: 0001-Fix-pcase-rx-patterns-with-a-single-named-submatch-b.patch --]
[-- Type: application/octet-stream, Size: 3885 bytes --]

From be9db2b94d31a0afe3f93302558b3a78605244c7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Tue, 18 May 2021 12:03:11 +0200
Subject: [PATCH] Fix pcase 'rx' patterns with a single named submatch
 (bug#48477)

pcase 'rx' patterns with a single named submatch, like

  (rx (let x "a"))

would always succeed because of an over-optimistic transformation.
Patterns with 0 or more than 1 named submatches were not affected.

Reported by Philipp Stephani.

* lisp/emacs-lisp/rx.el (rx--pcase-macroexpander):
Special case for a single named submatch.
* test/lisp/emacs-lisp/rx-tests.el (rx-pcase): Add tests.
---
 lisp/emacs-lisp/rx.el            | 21 ++++++++++++++++-----
 test/lisp/emacs-lisp/rx-tests.el | 14 ++++++++++++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index 1e3eb9c12b..43bd84d999 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -1445,12 +1445,23 @@ rx
          (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps))))
          (nvars (length rx--pcase-vars)))
     `(and (pred stringp)
-          ,(if (zerop nvars)
-               ;; No variables bound: a single predicate suffices.
-               `(pred (string-match ,regexp))
+          ,(pcase nvars
+            (0
+             ;; No variables bound: a single predicate suffices.
+             `(pred (string-match ,regexp)))
+            (1
+             ;; Create a match value that on a successful regexp match
+             ;; is the submatch value, 0 on failure.  We can't use nil
+             ;; for failure because it is a valid submatch value.
+             `(app (lambda (s)
+                     (if (string-match ,regexp s)
+                         (match-string 1 s)
+                       0))
+                   (and ,(car rx--pcase-vars) (pred (not numberp)))))
+            (_
              ;; Pack the submatches into a dotted list which is then
              ;; immediately destructured into individual variables again.
-             ;; This is of course slightly inefficient when NVARS > 1.
+             ;; This is of course slightly inefficient.
              ;; A dotted list is used to reduce the number of conses
              ;; to create and take apart.
              `(app (lambda (s)
@@ -1463,7 +1474,7 @@ rx
                           (rx--reduce-right
                            #'cons
                            (mapcar (lambda (name) (list '\, name))
-                                   (reverse rx--pcase-vars)))))))))
+                                   (reverse rx--pcase-vars))))))))))
 
 ;; Obsolete internal symbol, used in old versions of the `flycheck' package.
 (define-obsolete-function-alias 'rx-submatch-n 'rx-to-string "27.1")
diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el
index 2dd1bca22d..4828df0de9 100644
--- a/test/lisp/emacs-lisp/rx-tests.el
+++ b/test/lisp/emacs-lisp/rx-tests.el
@@ -166,6 +166,20 @@ rx-pcase
                         (backref 1))
                     (list u v)))
                  '("1" "3")))
+  (should (equal (pcase "bz"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   (_ 'no))
+                 'no))
+  (should (equal (pcase "az"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   ((rx "b" (let x nonl)) (list 2 x))
+                   (_ 'no))
+                 '(1 "z")))
+  (should (equal (pcase "bz"
+                   ((rx "a" (let x nonl)) (list 1 x))
+                   ((rx "b" (let x nonl)) (list 2 x))
+                   (_ 'no))
+                 '(2 "z")))
   (let ((k "blue"))
     (should (equal (pcase "<blue>"
                      ((rx "<" (literal k) ">") 'ok))
-- 
2.21.1 (Apple Git-122.3)



* bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns
  2021-05-18 10:44 ` Mattias Engdegård
@ 2021-05-18 11:09   ` Philipp Stephani
  2021-05-18 11:12     ` Mattias Engdegård
  0 siblings, 1 reply; 4+ messages in thread
From: Philipp Stephani @ 2021-05-18 11:09 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 48477, Stefan Monnier

On Tue, 18 May 2021 at 12:44, Mattias Engdegård <mattiase@acm.org> wrote:

> Please try the attached patch.

Thanks, that fixes my use case.






* bug#48477: 28.0.50; Seemingly incorrect codegen with multiple string-matching pcase patterns
  2021-05-18 11:09   ` Philipp Stephani
@ 2021-05-18 11:12     ` Mattias Engdegård
  0 siblings, 0 replies; 4+ messages in thread
From: Mattias Engdegård @ 2021-05-18 11:12 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: Stefan Monnier, 48477-done

On 18 May 2021 at 13:09, Philipp Stephani <p.stephani2@gmail.com> wrote:

> Thanks, that fixes my use case.

Thank you for testing! Pushed and closed.





