unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#55897: [PATCH] bindat (str, strz): Convert to unibyte when packing
@ 2022-06-11  4:38 Richard Hansen
  2022-06-11  8:11 ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Hansen @ 2022-06-11  4:38 UTC (permalink / raw)
  To: 55897; +Cc: monnier


[-- Attachment #1.1.1: Type: text/plain, Size: 696 bytes --]

X-Debbugs-CC: monnier@iro.umontreal.ca

Two patches attached:

Patch 1:

     bindat (str, strz): Reject multibyte input strings

     * lisp/emacs-lisp/bindat.el (str) (strz): Signal an error if the user
     attempts to pack a multibyte string.
     * test/lisp/emacs-lisp/bindat-tests.el (str) (strz): Add tests.

Patch 2:

     bindat (str, strz): Convert to unibyte when packing

     * lisp/emacs-lisp/bindat.el (str) (strz): Allow callers to pack a
     multibyte string if it only contains ASCII and `eight-bit' characters.
     * doc/lispref/processes.texi (Bindat Types): Update documentation.
     * test/lisp/emacs-lisp/bindat-tests.el (str) (strz): Update tests.
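
The rule that patch 2 relies on can be sketched as follows. This is only
an illustration, assuming nothing beyond the built-in `string-to-unibyte';
the helper name `my-bindat-packable-p' is hypothetical and is not part of
either patch.

```elisp
;; Hypothetical helper for illustration; not part of bindat.el.
(defun my-bindat-packable-p (s)
  "Return non-nil if `string-to-unibyte' accepts the string S."
  (condition-case nil
      (progn (string-to-unibyte s) t)
    ;; `string-to-unibyte' signals an error for characters that are
    ;; neither ASCII nor `eight-bit'.
    (error nil)))

(my-bindat-packable-p (string-to-multibyte "x"))     ; ASCII only
(my-bindat-packable-p (string-to-multibyte "\xff"))  ; `eight-bit' character
(my-bindat-packable-p "\N{U+ff}")                    ; U+00FF, not convertible
```

The first two calls correspond to the inputs the updated tests expect to
pack successfully, and the third to an input they expect to be rejected.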

[-- Attachment #1.1.2: 0001-bindat-str-strz-Reject-multibyte-input-strings.patch --]
[-- Type: text/x-patch, Size: 2522 bytes --]

From 80cf0f3c1652196fc689bf72ca3b751fb3c52a01 Mon Sep 17 00:00:00 2001
From: Richard Hansen <rhansen@rhansen.org>
Date: Sun, 5 Jun 2022 23:44:42 -0400
Subject: [PATCH 1/2] bindat (str, strz): Reject multibyte input strings

* lisp/emacs-lisp/bindat.el (str) (strz): Signal an error if the user
attempts to pack a multibyte string.
* test/lisp/emacs-lisp/bindat-tests.el (str) (strz): Add tests.
---
 lisp/emacs-lisp/bindat.el            |  4 ++++
 test/lisp/emacs-lisp/bindat-tests.el | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el
index 5f3c772983..9ac24fa008 100644
--- a/lisp/emacs-lisp/bindat.el
+++ b/lisp/emacs-lisp/bindat.el
@@ -435,11 +435,15 @@ bindat--pack-u64r
   (bindat--pack-u32r (ash v -32)))
 
 (defun bindat--pack-str (len v)
+  (if (multibyte-string-p v)
+      (signal 'wrong-type-argument `(multibyte-string-p ,v)))
   (dotimes (i (min len (length v)))
     (aset bindat-raw (+ bindat-idx i) (aref v i)))
   (setq bindat-idx (+ bindat-idx len)))
 
 (defun bindat--pack-strz (v)
+  (if (multibyte-string-p v)
+      (signal 'wrong-type-argument `(multibyte-string-p ,v)))
   (let ((len (length v)))
     (dotimes (i len)
       (aset bindat-raw (+ bindat-idx i) (aref v i)))
diff --git a/test/lisp/emacs-lisp/bindat-tests.el b/test/lisp/emacs-lisp/bindat-tests.el
index 4817072752..da688d1e82 100644
--- a/test/lisp/emacs-lisp/bindat-tests.el
+++ b/test/lisp/emacs-lisp/bindat-tests.el
@@ -189,6 +189,20 @@ bindat-test--str-strz-prealloc
       (apply #'bindat-pack (append (car tc) (list prealloc)))
       (should (equal prealloc (cdr tc))))))
 
+(ert-deftest bindat-test--str-strz-multibyte ()
+  (dolist (spec (list (bindat-type str 2)
+                      (bindat-type strz 2)
+                      (bindat-type strz)))
+    (should-error (bindat-pack spec (string-to-multibyte "x")))
+    (should-error (bindat-pack spec (string-to-multibyte "\xff")))
+    (should-error (bindat-pack spec "💩"))
+    (should-error (bindat-pack spec "\N{U+ff}")))
+  (dolist (spec (list '((x str 2)) '((x strz 2))))
+    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "x")))))
+    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "\xff")))))
+    (should-error (bindat-pack spec '((x . "💩"))))
+    (should-error (bindat-pack spec '((x . "\N{U+ff}"))))))
+
 (let ((spec (bindat-type strz 2)))
   (ert-deftest bindat-test--strz-fixedlen-len ()
     (should (equal (bindat-length spec "") 2))
-- 
2.36.1


[-- Attachment #1.1.3: 0002-bindat-str-strz-Convert-to-unibyte-when-packing.patch --]
[-- Type: text/x-patch, Size: 4715 bytes --]

From 6a4de050d3d9407ca0b3de48e4fb4a6a2b3c2eb1 Mon Sep 17 00:00:00 2001
From: Richard Hansen <rhansen@rhansen.org>
Date: Sun, 5 Jun 2022 23:54:11 -0400
Subject: [PATCH 2/2] bindat (str, strz): Convert to unibyte when packing

* lisp/emacs-lisp/bindat.el (str) (strz): Allow callers to pack a
multibyte string if it only contains ASCII and `eight-bit' characters.
* doc/lispref/processes.texi (Bindat Types): Update documentation.
* test/lisp/emacs-lisp/bindat-tests.el (str) (strz): Update tests.
---
 doc/lispref/processes.texi           | 14 ++++++++++----
 lisp/emacs-lisp/bindat.el            | 14 ++++++--------
 test/lisp/emacs-lisp/bindat-tests.el | 10 ++++++----
 3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 55fb93ec5a..fbf285c1cc 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3484,8 +3484,11 @@ Bindat Types
 to the packed output.  If the input string is shorter than @var{len},
 the remaining bytes will be null (zero) unless a pre-allocated string
 was provided to @code{bindat-pack}, in which case the remaining bytes
-are left unmodified.  When unpacking, any null bytes in the packed
-input string will appear in the unpacked output.
+are left unmodified.  If the input string is multibyte with only ASCII
+and @code{eight-bit} characters, it is converted to unibyte before it
+is packed; other multibyte strings signal an error.  When unpacking,
+any null bytes in the packed input string will appear in the unpacked
+output.
 
 @item strz &optional @var{len}
 If @var{len} is not provided: Variable-length null-terminated unibyte
@@ -3495,8 +3498,11 @@ Bindat Types
 @code{bindat-pack}, in which case that byte is left unmodified.  The
 length of the packed output is the length of the input string plus one
 (for the null terminator).  The input string must not contain any null
-bytes.  When unpacking, the resulting string contains all bytes up to
-(but excluding) the null byte.
+bytes.  If the input string is multibyte with only ASCII and
+@code{eight-bit} characters, it is converted to unibyte before it is
+packed; other multibyte strings signal an error.  When unpacking, the
+resulting string contains all bytes up to (but excluding) the null
+byte.
 
 @quotation Caution
 If a pre-allocated string is provided to @code{bindat-pack}, the
diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el
index 9ac24fa008..04ad09abc1 100644
--- a/lisp/emacs-lisp/bindat.el
+++ b/lisp/emacs-lisp/bindat.el
@@ -435,16 +435,14 @@ bindat--pack-u64r
   (bindat--pack-u32r (ash v -32)))
 
 (defun bindat--pack-str (len v)
-  (if (multibyte-string-p v)
-      (signal 'wrong-type-argument `(multibyte-string-p ,v)))
-  (dotimes (i (min len (length v)))
-    (aset bindat-raw (+ bindat-idx i) (aref v i)))
-  (setq bindat-idx (+ bindat-idx len)))
+  (let ((v (string-to-unibyte v)))
+    (dotimes (i (min len (length v)))
+      (aset bindat-raw (+ bindat-idx i) (aref v i)))
+    (setq bindat-idx (+ bindat-idx len))))
 
 (defun bindat--pack-strz (v)
-  (if (multibyte-string-p v)
-      (signal 'wrong-type-argument `(multibyte-string-p ,v)))
-  (let ((len (length v)))
+  (let* ((v (string-to-unibyte v))
+         (len (length v)))
     (dotimes (i len)
       (aset bindat-raw (+ bindat-idx i) (aref v i)))
     (setq bindat-idx (+ bindat-idx len 1))))
diff --git a/test/lisp/emacs-lisp/bindat-tests.el b/test/lisp/emacs-lisp/bindat-tests.el
index da688d1e82..d33f1c01a2 100644
--- a/test/lisp/emacs-lisp/bindat-tests.el
+++ b/test/lisp/emacs-lisp/bindat-tests.el
@@ -193,13 +193,15 @@ bindat-test--str-strz-multibyte
   (dolist (spec (list (bindat-type str 2)
                       (bindat-type strz 2)
                       (bindat-type strz)))
-    (should-error (bindat-pack spec (string-to-multibyte "x")))
-    (should-error (bindat-pack spec (string-to-multibyte "\xff")))
+    (should (equal (bindat-pack spec (string-to-multibyte "x")) "x\0"))
+    (should (equal (bindat-pack spec (string-to-multibyte "\xff")) "\xff\0"))
     (should-error (bindat-pack spec "💩"))
     (should-error (bindat-pack spec "\N{U+ff}")))
   (dolist (spec (list '((x str 2)) '((x strz 2))))
-    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "x")))))
-    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "\xff")))))
+    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "x"))))
+                   "x\0"))
+    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "\xff"))))
+                   "\xff\0"))
     (should-error (bindat-pack spec '((x . "💩"))))
     (should-error (bindat-pack spec '((x . "\N{U+ff}"))))))
 
-- 
2.36.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* bug#55897: [PATCH] bindat (str, strz): Convert to unibyte when packing
  2022-06-11  4:38 bug#55897: [PATCH] bindat (str, strz): Convert to unibyte when packing Richard Hansen
@ 2022-06-11  8:11 ` Eli Zaretskii
  2022-06-12  5:23   ` Richard Hansen
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2022-06-11  8:11 UTC (permalink / raw)
  To: Richard Hansen; +Cc: 55897, monnier

> Cc: monnier@iro.umontreal.ca
> Date: Sat, 11 Jun 2022 00:38:00 -0400
> From: Richard Hansen <rhansen@rhansen.org>
> 
>  (defun bindat--pack-str (len v)
> +  (if (multibyte-string-p v)
> +      (signal 'wrong-type-argument `(multibyte-string-p ,v)))

Isn't this too strict?  First, a string can be multibyte and
pure-ASCII:

  (let ((str (decode-coding-string "abcde" 'utf-8)))
    (multibyte-string-p str))
    => t

Shouldn't it be possible to use such strings here?
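
A small sketch of the same point, assuming `string-to-multibyte' as the
way to obtain a multibyte pure-ASCII string (a substitute for the
`decode-coding-string' call above, chosen only because its result does
not depend on coding-system behavior): such a string converts to unibyte
without loss.

```elisp
(let* ((s (string-to-multibyte "abcde")) ; multibyte, but ASCII only
       (u (string-to-unibyte s)))        ; same characters, unibyte
  (list (multibyte-string-p s)           ; is the input multibyte?
        (multibyte-string-p u)           ; is the converted string multibyte?
        (string= s u)))                  ; do the contents still match?
;; => (t nil t)
```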

Furthermore, I think you said you wanted to extend bindat so it could
use multibyte strings that contain ASCII and eight-bit characters?  If
so, this sounds like shooting ourselves in the foot.

>  (defun bindat--pack-str (len v)
> -  (if (multibyte-string-p v)
> -      (signal 'wrong-type-argument `(multibyte-string-p ,v)))
> -  (dotimes (i (min len (length v)))
> -    (aset bindat-raw (+ bindat-idx i) (aref v i)))
> -  (setq bindat-idx (+ bindat-idx len)))
> +  (let ((v (string-to-unibyte v)))
> +    (dotimes (i (min len (length v)))
> +      (aset bindat-raw (+ bindat-idx i) (aref v i)))
> +    (setq bindat-idx (+ bindat-idx len))))

And here you remove that error again?  Why does it make sense to
introduce an error, only to remove it in the very next commit?
Please instead make a single change which incorporates both.

> --- a/test/lisp/emacs-lisp/bindat-tests.el
> +++ b/test/lisp/emacs-lisp/bindat-tests.el
> @@ -193,13 +193,15 @@ bindat-test--str-strz-multibyte
>    (dolist (spec (list (bindat-type str 2)
>                        (bindat-type strz 2)
>                        (bindat-type strz)))
> -    (should-error (bindat-pack spec (string-to-multibyte "x")))
> -    (should-error (bindat-pack spec (string-to-multibyte "\xff")))
> +    (should (equal (bindat-pack spec (string-to-multibyte "x")) "x\0"))
> +    (should (equal (bindat-pack spec (string-to-multibyte "\xff")) "\xff\0"))
>      (should-error (bindat-pack spec "💩"))
>      (should-error (bindat-pack spec "\N{U+ff}")))
>    (dolist (spec (list '((x str 2)) '((x strz 2))))
> -    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "x")))))
> -    (should-error (bindat-pack spec `((x . ,(string-to-multibyte "\xff")))))
> +    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "x"))))
> +                   "x\0"))
> +    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "\xff"))))
> +                   "\xff\0"))
>      (should-error (bindat-pack spec '((x . "💩"))))
>      (should-error (bindat-pack spec '((x . "\N{U+ff}"))))))

Likewise here.

Thanks.

P.S. Please also mention the bug number in the log message of the next
version of the patch, since the number is now known.






* bug#55897: [PATCH] bindat (str, strz): Convert to unibyte when packing
  2022-06-11  8:11 ` Eli Zaretskii
@ 2022-06-12  5:23   ` Richard Hansen
  2022-06-12  7:00     ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Hansen @ 2022-06-12  5:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 55897, monnier


[-- Attachment #1.1.1: Type: text/plain, Size: 228 bytes --]

> Please instead make a single change which incorporates both.

Done; see attached.

> P.S. Please also mention the bug number in the log message of the next
> version of the patch, since the number is now known.

Done.
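
For reference, a short usage sketch of the combined behavior, mirroring
the inputs used in the attached tests.  It assumes an Emacs that provides
`bindat-type' and has this patch applied; without the patch the second
call may not signal an error.

```elisp
(require 'bindat)

;; The input is multibyte but contains only ASCII, so it is converted
;; to unibyte and packed; the attached tests expect "x\0" here.
(bindat-pack (bindat-type str 2) (string-to-multibyte "x"))

;; U+00FF is neither ASCII nor `eight-bit', so with the patch applied
;; `bindat-pack' is expected to signal an error.
(condition-case err
    (bindat-pack (bindat-type str 2) "\N{U+ff}")
  (error (message "rejected: %S" err)))
```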

[-- Attachment #1.1.2: v2-0001-bindat-str-strz-Reject-non-ASCII-non-eight-bit-ch.patch --]
[-- Type: text/x-patch, Size: 4594 bytes --]

From 344bec1fa01ff4d12c352c220237fe8c262e91a4 Mon Sep 17 00:00:00 2001
From: Richard Hansen <rhansen@rhansen.org>
Date: Sun, 12 Jun 2022 01:19:43 -0400
Subject: [PATCH v2] bindat (str, strz): Reject non-ASCII, non-`eight-bit'
 characters

* lisp/emacs-lisp/bindat.el (str) (strz): Signal an error if the user
attempts to pack a multibyte string containing characters other than
ASCII and `eight-bit' characters (bug#55897).
* doc/lispref/processes.texi (Bindat Types): Update documentation.
* test/lisp/emacs-lisp/bindat-tests.el (str) (strz): Add tests.
---
 doc/lispref/processes.texi           | 14 ++++++++++----
 lisp/emacs-lisp/bindat.el            | 10 ++++++----
 test/lisp/emacs-lisp/bindat-tests.el | 16 ++++++++++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index aa4d0e3ee4..8c8f8fd6b2 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3486,8 +3486,11 @@ Bindat Types
 to the packed output.  If the input string is shorter than @var{len},
 the remaining bytes will be null (zero) unless a pre-allocated string
 was provided to @code{bindat-pack}, in which case the remaining bytes
-are left unmodified.  When unpacking, any null bytes in the packed
-input string will appear in the unpacked output.
+are left unmodified.  If the input string is multibyte with only ASCII
+and @code{eight-bit} characters, it is converted to unibyte before it
+is packed; other multibyte strings signal an error.  When unpacking,
+any null bytes in the packed input string will appear in the unpacked
+output.
 
 @item strz &optional @var{len}
 If @var{len} is not provided: Variable-length null-terminated unibyte
@@ -3497,8 +3500,11 @@ Bindat Types
 @code{bindat-pack}, in which case that byte is left unmodified.  The
 length of the packed output is the length of the input string plus one
 (for the null terminator).  The input string must not contain any null
-bytes.  When unpacking, the resulting string contains all bytes up to
-(but excluding) the null byte.
+bytes.  If the input string is multibyte with only ASCII and
+@code{eight-bit} characters, it is converted to unibyte before it is
+packed; other multibyte strings signal an error.  When unpacking, the
+resulting string contains all bytes up to (but excluding) the null
+byte.
 
 @quotation Caution
 If a pre-allocated string is provided to @code{bindat-pack}, the
diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el
index 84d5ea1e3b..2d6589b52d 100644
--- a/lisp/emacs-lisp/bindat.el
+++ b/lisp/emacs-lisp/bindat.el
@@ -435,12 +435,14 @@ bindat--pack-u64r
   (bindat--pack-u32r (ash v -32)))
 
 (defun bindat--pack-str (len v)
-  (dotimes (i (min len (length v)))
-    (aset bindat-raw (+ bindat-idx i) (aref v i)))
-  (setq bindat-idx (+ bindat-idx len)))
+  (let ((v (string-to-unibyte v)))
+    (dotimes (i (min len (length v)))
+      (aset bindat-raw (+ bindat-idx i) (aref v i)))
+    (setq bindat-idx (+ bindat-idx len))))
 
 (defun bindat--pack-strz (v)
-  (let ((len (length v)))
+  (let* ((v (string-to-unibyte v))
+         (len (length v)))
     (dotimes (i len)
       (aset bindat-raw (+ bindat-idx i) (aref v i)))
     (setq bindat-idx (+ bindat-idx len 1))))
diff --git a/test/lisp/emacs-lisp/bindat-tests.el b/test/lisp/emacs-lisp/bindat-tests.el
index 1ce402977f..8bb3baa485 100644
--- a/test/lisp/emacs-lisp/bindat-tests.el
+++ b/test/lisp/emacs-lisp/bindat-tests.el
@@ -188,6 +188,22 @@ bindat-test--str-strz-prealloc
       (apply #'bindat-pack (append (car tc) (list prealloc)))
       (should (equal prealloc (cdr tc))))))
 
+(ert-deftest bindat-test--str-strz-multibyte ()
+  (dolist (spec (list (bindat-type str 2)
+                      (bindat-type strz 2)
+                      (bindat-type strz)))
+    (should (equal (bindat-pack spec (string-to-multibyte "x")) "x\0"))
+    (should (equal (bindat-pack spec (string-to-multibyte "\xff")) "\xff\0"))
+    (should-error (bindat-pack spec "💩"))
+    (should-error (bindat-pack spec "\N{U+ff}")))
+  (dolist (spec (list '((x str 2)) '((x strz 2))))
+    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "x"))))
+                   "x\0"))
+    (should (equal (bindat-pack spec `((x . ,(string-to-multibyte "\xff"))))
+                   "\xff\0"))
+    (should-error (bindat-pack spec '((x . "💩"))))
+    (should-error (bindat-pack spec '((x . "\N{U+ff}"))))))
+
 (let ((spec (bindat-type strz 2)))
   (ert-deftest bindat-test--strz-fixedlen-len ()
     (should (equal (bindat-length spec "") 2))
-- 
2.36.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* bug#55897: [PATCH] bindat (str, strz): Convert to unibyte when packing
  2022-06-12  5:23   ` Richard Hansen
@ 2022-06-12  7:00     ` Eli Zaretskii
  0 siblings, 0 replies; 4+ messages in thread
From: Eli Zaretskii @ 2022-06-12  7:00 UTC (permalink / raw)
  To: Richard Hansen; +Cc: 55897-done, monnier

> Date: Sun, 12 Jun 2022 01:23:17 -0400
> Cc: 55897@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Richard Hansen <rhansen@rhansen.org>
> 
> > Please instead make a single change which incorporates both.
> 
> Done; see attached.
> 
> > P.S. Please also mention the bug number in the log message of the next
> > version of the patch, since the number is now known.
> 
> Done.

Thanks, installed.




