unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
@ 2018-01-22 22:55 Philipp
  2019-10-09 21:29 ` Lars Ingebrigtsen
  2019-10-10 14:34 ` Mattias Engdegård
  0 siblings, 2 replies; 8+ messages in thread
From: Philipp @ 2018-01-22 22:55 UTC (permalink / raw)
  To: 30219


Define the following ERT test:

(ert-deftest foo ()
  (should (equal "a\xFF" "a\u00FF")))

Then run M-x ert.  The test triggers an assertion.  This is because ERT
incorrectly assumes that two arrays are equal if their lengths and
elements are equal, but that's not the case when comparing unibyte and
multibyte strings.


In GNU Emacs 27.0.50 (build 10, x86_64-apple-darwin17.3.0, NS appkit-1561.20 Version 10.13.2 (Build 17C205))
 of 2018-01-22 built on p
Repository revision: 3558d96b60393893a346f4382b813ca0738f9d9b
Windowing system distributor 'Apple', version 10.3.1561
System Description:  Mac OS X 10.13.2

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --without-threads --with-modules --without-pop
 --with-mailutils --enable-gcc-warnings=yes --enable-checking
 --enable-check-lisp-object-type 'CFLAGS=-ggdb3 -O0''

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES JSON

Important settings:
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair time-date
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/ns-win ns-win ucs-normalize mule-util term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote kqueue cocoa ns
multi-tty make-network-process emacs)

Memory information:
((conses 16 204871 8960)
 (symbols 48 20154 1)
 (miscs 40 57 145)
 (strings 32 28911 1996)
 (string-bytes 1 771735)
 (vectors 16 35241)
 (vector-slots 8 721940 14772)
 (floats 8 52 64)
 (intervals 56 208 0)
 (buffers 992 11))





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2018-01-22 22:55 bug#30219: 27.0.50; (should (equal ...)) bug for string equality Philipp
@ 2019-10-09 21:29 ` Lars Ingebrigtsen
  2019-10-10 14:34 ` Mattias Engdegård
  1 sibling, 0 replies; 8+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-09 21:29 UTC (permalink / raw)
  To: Philipp; +Cc: 30219

Philipp <p.stephani2@gmail.com> writes:

> Define the following ERT test:
>
> (ert-deftest foo ()
>   (should (equal "a\xFF" "a\u00FF")))
>
> Then run M-x ert.  The test triggers an assertion.  This is because ERT
> incorrectly assumes that two arrays are equal if their lengths and
> elements are equal, but that's not the case when comparing unibyte and
> multibyte strings.

I tried putting

(ert-deftest foo ()
  (should (equal "a\xFF" "a\u00FF")))

into one of the test files and I got

1 unexpected results:
   FAILED  foo

So I'm not able to reproduce this bug.  But the code in question hasn't
changed since this was reported, so I'm not sure why I can't reproduce
it.

Does anybody else see this?

      ((pred arrayp)
       (if (/= (length a) (length b))
           `(arrays-of-different-length ,(length a) ,(length b)
                                        ,a ,b
                                        ,@(unless (char-table-p a)
                                            `(first-mismatch-at
                                              ,(cl-mismatch a b :test 'equal))))
         (cl-loop for i from 0
                  for ai across a
                  for bi across b
                  for xi = (ert--explain-equal-rec ai bi)
                  do (when xi (cl-return `(array-elt ,i ,xi)))
                  finally (cl-assert (equal a b) t))))



-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2018-01-22 22:55 bug#30219: 27.0.50; (should (equal ...)) bug for string equality Philipp
  2019-10-09 21:29 ` Lars Ingebrigtsen
@ 2019-10-10 14:34 ` Mattias Engdegård
  2019-10-10 19:29   ` Mattias Engdegård
  1 sibling, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2019-10-10 14:34 UTC (permalink / raw)
  To: 30219; +Cc: Lars Ingebrigtsen, Philipp

The bug is directly reproducible, and not very surprising given the subtleties in mixed uni/multibyte string comparisons. ERT treats strings as arrays to be compared element-wise via `aref', which isn't how `equal' works.

There are several solutions, none perfect. I'll be back with a patch later today or tomorrow.






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2019-10-10 14:34 ` Mattias Engdegård
@ 2019-10-10 19:29   ` Mattias Engdegård
  2019-10-11  5:43     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2019-10-10 19:29 UTC (permalink / raw)
  To: 30219; +Cc: Lars Ingebrigtsen, Philipp

[-- Attachment #1: Type: text/plain, Size: 125 bytes --]

> There are several solutions, none perfect. I'll be back with a patch later today or tomorrow.

Please try this patch.


[-- Attachment #2: 0001-Correctly-explain-test-failures-with-mixed-uni-multi.patch --]
[-- Type: application/octet-stream, Size: 3827 bytes --]

From dc8570aee0fd01ae55c2214160fb609626fbcf4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 10 Oct 2019 21:20:20 +0200
Subject: [PATCH] Correctly explain test failures with mixed uni/multibyte
 strings

* lisp/emacs-lisp/ert.el (ert--explain-equal-rec)
(ert--explain-equal-including-properties):
* test/lisp/emacs-lisp/ert-tests.el (ert-test-explain-equal):
When explaining a difference between a unibyte and a multibyte string,
first convert both to multibyte.  Otherwise, we might end up comparing
unibyte char C to multibyte char C, 127<C<256, and not detect the
difference (bug#30219).
---
 lisp/emacs-lisp/ert.el            |  9 +++++++++
 test/lisp/emacs-lisp/ert-tests.el | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/lisp/emacs-lisp/ert.el b/lisp/emacs-lisp/ert.el
index da241e6304..880a40dafd 100644
--- a/lisp/emacs-lisp/ert.el
+++ b/lisp/emacs-lisp/ert.el
@@ -512,6 +512,11 @@ ert--explain-equal-rec
                      (cl-assert (equal a b) t)
                      nil))))))))
       ((pred arrayp)
+       ;; For mixed unibyte/multibyte string comparisons, make both multibyte.
+       (when (and (stringp a)
+                  (xor (multibyte-string-p a) (multibyte-string-p b)))
+         (setq a (string-to-multibyte a))
+         (setq b (string-to-multibyte b)))
        (if (/= (length a) (length b))
            `(arrays-of-different-length ,(length a) ,(length b)
                                         ,a ,b
@@ -606,6 +611,10 @@ ert--explain-equal-including-properties
     (cl-assert (stringp a) t)
     (cl-assert (stringp b) t)
     (cl-assert (eql (length a) (length b)) t)
+    ;; For mixed unibyte/multibyte comparisons, make both multibyte.
+    (when (xor (multibyte-string-p a) (multibyte-string-p b))
+      (setq a (string-to-multibyte a))
+      (setq b (string-to-multibyte b)))
     (cl-loop for i from 0 to (length a)
              for props-a = (text-properties-at i a)
              for props-b = (text-properties-at i b)
diff --git a/test/lisp/emacs-lisp/ert-tests.el b/test/lisp/emacs-lisp/ert-tests.el
index 36db1eeb42..3a9e81595b 100644
--- a/test/lisp/emacs-lisp/ert-tests.el
+++ b/test/lisp/emacs-lisp/ert-tests.el
@@ -627,6 +627,29 @@ ert-test-explain-equal
     (should (equal (ert--explain-equal 'a sym)
                    `(different-symbols-with-the-same-name a ,sym)))))
 
+(ert-deftest ert-test-explain-equal-strings ()
+  (should (equal (ert--explain-equal "abc" "axc")
+                 '(array-elt 1 (different-atoms
+                                (?b "#x62" "?b")
+                                (?x "#x78" "?x")))))
+  (should (equal (ert--explain-equal "abc" "abxc")
+                 '(arrays-of-different-length
+                   3 4 "abc" "abxc" first-mismatch-at 2)))
+  (should (equal (ert--explain-equal "xyA" "xyÅ")
+                 '(array-elt 2 (different-atoms
+                                (?A "#x41" "?A")
+                                (?Å "#xc5" "?Å")))))
+  (should (equal (ert--explain-equal "m\xff" "m\u00ff")
+                 `(array-elt
+                   1 (different-atoms
+                      (#x3fffff "#x3fffff" ,(string-to-multibyte "?\xff"))
+                      (#xff "#xff" "?ÿ")))))
+  (should (equal (ert--explain-equal (string-to-multibyte "m\xff") "m\u00ff")
+                 `(array-elt
+                   1 (different-atoms
+                      (#x3fffff "#x3fffff" ,(string-to-multibyte "?\xff"))
+                      (#xff "#xff" "?ÿ"))))))
+
 (ert-deftest ert-test-explain-equal-improper-list ()
   (should (equal (ert--explain-equal '(a . b) '(a . c))
                  '(cdr (different-atoms b c)))))
-- 
2.21.0 (Apple Git-122)


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2019-10-10 19:29   ` Mattias Engdegård
@ 2019-10-11  5:43     ` Lars Ingebrigtsen
  2019-10-11  9:21       ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-11  5:43 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 30219, Philipp

Mattias Engdegård <mattiase@acm.org> writes:

>        ((pred arrayp)
> +       ;; For mixed unibyte/multibyte string comparisons, make both multibyte.
> +       (when (and (stringp a)
> +                  (xor (multibyte-string-p a) (multibyte-string-p b)))
> +         (setq a (string-to-multibyte a))
> +         (setq b (string-to-multibyte b)))

I guess that makes sense.  We (possibly) lose text properties here, but
we don't care about those anyway in this context, I think?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2019-10-11  5:43     ` Lars Ingebrigtsen
@ 2019-10-11  9:21       ` Mattias Engdegård
  2019-10-13 18:10         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2019-10-11  9:21 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 30219, Philipp

[-- Attachment #1: Type: text/plain, Size: 245 bytes --]

11 okt. 2019 kl. 07.43 skrev Lars Ingebrigtsen <larsi@gnus.org>:
> 
> I guess that makes sense.  We (possibly) lose text properties here, but
> we don't care about those anyway in this context, I think?

Oops, that wasn't intended. Second try.


[-- Attachment #2: 0001-Correctly-explain-test-failures-with-mixed-uni-multi.patch --]
[-- Type: application/octet-stream, Size: 3238 bytes --]

From 48514a003a591ca6ef9d5b2cc9678727b3d93b47 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 10 Oct 2019 21:20:20 +0200
Subject: [PATCH] Correctly explain test failures with mixed uni/multibyte
 strings

* lisp/emacs-lisp/ert.el (ert--explain-equal-rec):
* test/lisp/emacs-lisp/ert-tests.el (ert-test-explain-equal):
When explaining a difference between a unibyte and a multibyte string,
first convert both to multibyte.  Otherwise, we might end up comparing
unibyte char C to multibyte char C, 127<C<256, and not detect the
difference (bug#30219).
---
 lisp/emacs-lisp/ert.el            |  5 +++++
 test/lisp/emacs-lisp/ert-tests.el | 23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/lisp/emacs-lisp/ert.el b/lisp/emacs-lisp/ert.el
index da241e6304..16a6d753f3 100644
--- a/lisp/emacs-lisp/ert.el
+++ b/lisp/emacs-lisp/ert.el
@@ -512,6 +512,11 @@ ert--explain-equal-rec
                      (cl-assert (equal a b) t)
                      nil))))))))
       ((pred arrayp)
+       ;; For mixed unibyte/multibyte string comparisons, make both multibyte.
+       (when (and (stringp a)
+                  (xor (multibyte-string-p a) (multibyte-string-p b)))
+         (setq a (string-to-multibyte a))
+         (setq b (string-to-multibyte b)))
        (if (/= (length a) (length b))
            `(arrays-of-different-length ,(length a) ,(length b)
                                         ,a ,b
diff --git a/test/lisp/emacs-lisp/ert-tests.el b/test/lisp/emacs-lisp/ert-tests.el
index 36db1eeb42..3a9e81595b 100644
--- a/test/lisp/emacs-lisp/ert-tests.el
+++ b/test/lisp/emacs-lisp/ert-tests.el
@@ -627,6 +627,29 @@ ert-test-explain-equal
     (should (equal (ert--explain-equal 'a sym)
                    `(different-symbols-with-the-same-name a ,sym)))))
 
+(ert-deftest ert-test-explain-equal-strings ()
+  (should (equal (ert--explain-equal "abc" "axc")
+                 '(array-elt 1 (different-atoms
+                                (?b "#x62" "?b")
+                                (?x "#x78" "?x")))))
+  (should (equal (ert--explain-equal "abc" "abxc")
+                 '(arrays-of-different-length
+                   3 4 "abc" "abxc" first-mismatch-at 2)))
+  (should (equal (ert--explain-equal "xyA" "xyÅ")
+                 '(array-elt 2 (different-atoms
+                                (?A "#x41" "?A")
+                                (?Å "#xc5" "?Å")))))
+  (should (equal (ert--explain-equal "m\xff" "m\u00ff")
+                 `(array-elt
+                   1 (different-atoms
+                      (#x3fffff "#x3fffff" ,(string-to-multibyte "?\xff"))
+                      (#xff "#xff" "?ÿ")))))
+  (should (equal (ert--explain-equal (string-to-multibyte "m\xff") "m\u00ff")
+                 `(array-elt
+                   1 (different-atoms
+                      (#x3fffff "#x3fffff" ,(string-to-multibyte "?\xff"))
+                      (#xff "#xff" "?ÿ"))))))
+
 (ert-deftest ert-test-explain-equal-improper-list ()
   (should (equal (ert--explain-equal '(a . b) '(a . c))
                  '(cdr (different-atoms b c)))))
-- 
2.21.0 (Apple Git-122)


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2019-10-11  9:21       ` Mattias Engdegård
@ 2019-10-13 18:10         ` Lars Ingebrigtsen
  2019-10-13 18:30           ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Lars Ingebrigtsen @ 2019-10-13 18:10 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 30219, Philipp

Mattias Engdegård <mattiase@acm.org> writes:

>        ((pred arrayp)
> +       ;; For mixed unibyte/multibyte string comparisons, make both multibyte.
> +       (when (and (stringp a)
> +                  (xor (multibyte-string-p a) (multibyte-string-p b)))
> +         (setq a (string-to-multibyte a))
> +         (setq b (string-to-multibyte b)))

Looks good to me, I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#30219: 27.0.50; (should (equal ...)) bug for string equality
  2019-10-13 18:10         ` Lars Ingebrigtsen
@ 2019-10-13 18:30           ` Mattias Engdegård
  0 siblings, 0 replies; 8+ messages in thread
From: Mattias Engdegård @ 2019-10-13 18:30 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 30219-done, Philipp

13 okt. 2019 kl. 20.10 skrev Lars Ingebrigtsen <larsi@gnus.org>:
> 
> Looks good to me, I think.

Thank you, pushed to master and bug closed.






^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-10-13 18:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-22 22:55 bug#30219: 27.0.50; (should (equal ...)) bug for string equality Philipp
2019-10-09 21:29 ` Lars Ingebrigtsen
2019-10-10 14:34 ` Mattias Engdegård
2019-10-10 19:29   ` Mattias Engdegård
2019-10-11  5:43     ` Lars Ingebrigtsen
2019-10-11  9:21       ` Mattias Engdegård
2019-10-13 18:10         ` Lars Ingebrigtsen
2019-10-13 18:30           ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).