unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#52670: legacy base64 encoding of latin-1
@ 2021-12-19 21:47 mattiase
  2021-12-20 17:26 ` bug#52670: [PATCH] " mattiase
  0 siblings, 1 reply; 4+ messages in thread
From: mattiase @ 2021-12-19 21:47 UTC (permalink / raw)
  To: 52670

For what appear to be historical reasons, the base64 encoding functions (base64-encode-string etc.) treat characters in the range U+0080..U+00FF as if they were raw bytes in the 128..255 range. This means that

  (base64-encode-string "ÿ")

and

  (base64-encode-string "\xff")

return the same result although the strings are completely different. Attempts to encode other multibyte characters fail (correctly). For example,

  (base64-encode-string "Ÿ")

signals an error, as expected.

I propose we tighten up the behavior by eliminating the legacy handling of characters in the U+0080..U+00FF range. Letting the bug stay in place enables incorrect, brittle and error-prone usage: the functions are clearly intended to be fed encoded text only and should signal an error when not, as stated in the manual.
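
As a rough illustration of the difference (a standalone sketch, not the code in src/fns.c), the functions below model the per-character decision the encoder makes for multibyte text, before and after the proposed tightening. The names legacy_byte_for_char and strict_byte_for_char are hypothetical, and the constants assume Emacs's internal convention of representing raw bytes 0x80..0xFF as the characters 0x3FFF80..0x3FFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: in multibyte text, raw bytes 0x80..0xFF are stored as
   the characters 0x3FFF80..0x3FFFFF (byte + 0x3FFF00).  */
enum
{
  RAW_BYTE_CHAR_MIN = 0x3FFF80,
  RAW_BYTE_CHAR_OFFSET = 0x3FFF00
};

/* True if C is one of the characters that stand for a raw byte.  */
bool
raw_byte_char_p (int32_t c)
{
  return c >= RAW_BYTE_CHAR_MIN;
}

/* Legacy rule: return the byte to encode, or -1 to signal an error.
   Characters U+0080..U+00FF slip through as if they were bytes.  */
int
legacy_byte_for_char (int32_t c)
{
  if (raw_byte_char_p (c))
    return c - RAW_BYTE_CHAR_OFFSET;
  if (c >= 256)
    return -1;
  return c;
}

/* Proposed rule: only ASCII and raw bytes are accepted; every other
   character yields -1.  */
int
strict_byte_for_char (int32_t c)
{
  if (raw_byte_char_p (c))
    return c - RAW_BYTE_CHAR_OFFSET;
  if (c >= 128)
    return -1;
  return c;
}
```

Under this model, U+00FF (ÿ) maps to byte 255 with the legacy rule but is rejected with the strict one, while the raw-byte character 0x3FFFFF maps to byte 255 with both, and U+0178 (Ÿ) is rejected with both.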







* bug#52670: [PATCH] legacy base64 encoding of latin-1
  2021-12-19 21:47 bug#52670: legacy base64 encoding of latin-1 mattiase
@ 2021-12-20 17:26 ` mattiase
  2021-12-20 19:10   ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: mattiase @ 2021-12-20 17:26 UTC (permalink / raw)
  To: 52670

[-- Attachment #1: Type: text/plain, Size: 150 bytes --]

It really looks like the erroneous behaviour was an unintended effect of commit 680d4b87f3d88a8b79f883cf3635036747588250. Anyway, here is a patch.


[-- Attachment #2: 0001-Fix-sloppy-base64-acceptance-of-some-multibyte-chara.patch --]
[-- Type: application/octet-stream, Size: 5100 bytes --]

From c979c063a96711b8dea018e7486b05a5a192bfd2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Mon, 20 Dec 2021 18:17:23 +0100
Subject: [PATCH] Fix sloppy base64 acceptance of some multibyte characters

The base64 encoding functions incorrectly accepted some multibyte
characters; stop doing that (bug#52670).

* src/fns.c (base64_encode_1): Reject all multibyte characters.
* test/src/fns-tests.el (fns-tests-base64-encode-string)
(fns-test-base64url-encode-region)
(fns-test-base64url-encode-string): Add tests.
* doc/lispref/text.texi (Base 64): Rephrase outdated manual text.
---
 doc/lispref/text.texi |  5 ++---
 src/fns.c             |  6 +++---
 test/src/fns-tests.el | 16 +++++++++++++---
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 5ab5e5715f..9771d8a7ed 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -4793,9 +4793,8 @@ Base 64
 This function converts the region from @var{beg} to @var{end} into base
 64 code.  It returns the length of the encoded text.  An error is
 signaled if a character in the region is multibyte, i.e., in a
-multibyte buffer the region must contain only characters from the
-charsets @code{ascii}, @code{eight-bit-control} and
-@code{eight-bit-graphic}.
+multibyte buffer the region must contain only ASCII characters or raw
+bytes.
 
 Normally, this function inserts newline characters into the encoded
 text, to avoid overlong lines.  However, if the optional argument
diff --git a/src/fns.c b/src/fns.c
index 76c76c92ba..23721334f7 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -3653,7 +3653,7 @@ base64_encode_1 (const char *from, char *to, ptrdiff_t length,
 	  c = string_char_and_length ((unsigned char *) from + i, &bytes);
 	  if (CHAR_BYTE8_P (c))
 	    c = CHAR_TO_BYTE8 (c);
-	  else if (c >= 256)
+	  else if (c >= 128)
 	    return -1;
 	  i += bytes;
 	}
@@ -3696,7 +3696,7 @@ base64_encode_1 (const char *from, char *to, ptrdiff_t length,
 	  c = string_char_and_length ((unsigned char *) from + i, &bytes);
 	  if (CHAR_BYTE8_P (c))
 	    c = CHAR_TO_BYTE8 (c);
-	  else if (c >= 256)
+	  else if (c >= 128)
 	    return -1;
 	  i += bytes;
 	}
@@ -3721,7 +3721,7 @@ base64_encode_1 (const char *from, char *to, ptrdiff_t length,
 	  c = string_char_and_length ((unsigned char *) from + i, &bytes);
 	  if (CHAR_BYTE8_P (c))
 	    c = CHAR_TO_BYTE8 (c);
-	  else if (c >= 256)
+	  else if (c >= 128)
 	    return -1;
 	  i += bytes;
 	}
diff --git a/test/src/fns-tests.el b/test/src/fns-tests.el
index bec5c03f9e..63423f622f 100644
--- a/test/src/fns-tests.el
+++ b/test/src/fns-tests.el
@@ -318,7 +318,10 @@ fns-tests-base64-encode-string
   (should (equal (base64-encode-string "fooba") "Zm9vYmE="))
   (should (equal (base64-encode-string "foobar") "Zm9vYmFy"))
   (should (equal (base64-encode-string "\x14\xfb\x9c\x03\xd9\x7e") "FPucA9l+"))
-  (should (equal (base64-encode-string "\x14\xfb\x9c\x03\xd9\x7f") "FPucA9l/")))
+  (should (equal (base64-encode-string "\x14\xfb\x9c\x03\xd9\x7f") "FPucA9l/"))
+
+  (should-error (base64-encode-string "ƒ"))
+  (should-error (base64-encode-string "ü")))
 
 (ert-deftest fns-test-base64url-encode-region ()
   ;; url variant with padding
@@ -360,7 +363,11 @@ fns-test-base64url-encode-region
   (should (equal (fns-tests--with-region base64url-encode-region (fns-tests--string-repeat "\x14\xfb\x9c\x03\xd9\x7e" 10) t)
                  (fns-tests--string-repeat "FPucA9l-" 10)))
   (should (equal (fns-tests--with-region base64url-encode-region (fns-tests--string-repeat "\x14\xfb\x9c\x03\xd9\x7f" 10) t)
-                 (fns-tests--string-repeat "FPucA9l_" 10))))
+                 (fns-tests--string-repeat "FPucA9l_" 10)))
+
+  (should-error (fns-tests--with-region base64url-encode-region "ƒ"))
+  (should-error (fns-tests--with-region base64url-encode-region "ü")))
+
 
 (ert-deftest fns-test-base64url-encode-string ()
   ;; url variant with padding
@@ -394,7 +401,10 @@ fns-test-base64url-encode-string
   (should (equal (base64url-encode-string (fns-tests--string-repeat "fooba" 15) t) (fns-tests--string-repeat "Zm9vYmFmb29iYWZvb2Jh" 5)))
   (should (equal (base64url-encode-string (fns-tests--string-repeat "foobar" 15) t) (concat (fns-tests--string-repeat "Zm9vYmFyZm9vYmFy" 7) "Zm9vYmFy")))
   (should (equal (base64url-encode-string (fns-tests--string-repeat "\x14\xfb\x9c\x03\xd9\x7e" 10) t) (fns-tests--string-repeat "FPucA9l-" 10)))
-  (should (equal (base64url-encode-string (fns-tests--string-repeat "\x14\xfb\x9c\x03\xd9\x7f" 10) t) (fns-tests--string-repeat "FPucA9l_" 10))))
+  (should (equal (base64url-encode-string (fns-tests--string-repeat "\x14\xfb\x9c\x03\xd9\x7f" 10) t) (fns-tests--string-repeat "FPucA9l_" 10)))
+
+  (should-error (base64url-encode-string "ƒ"))
+  (should-error (base64url-encode-string "ü")))
 
 (ert-deftest fns-tests-base64-decode-string ()
   ;; standard variant RFC2045
-- 
2.32.0 (Apple Git-132)



* bug#52670: [PATCH] legacy base64 encoding of latin-1
  2021-12-20 17:26 ` bug#52670: [PATCH] " mattiase
@ 2021-12-20 19:10   ` Eli Zaretskii
  2021-12-20 19:24     ` Mattias Engdegård
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2021-12-20 19:10 UTC (permalink / raw)
  To: mattiase; +Cc: 52670

> From: mattiase@acm.org
> Date: Mon, 20 Dec 2021 18:26:15 +0100
> 
> It really looks like the erroneous behaviour was an unintended effect of commit 680d4b87f3d88a8b79f883cf3635036747588250.

I think that patch was correct at the time it was done, but it wasn't
undone/fixed when we switched to Unicode.

> Anyway, here is a patch.

Thanks, but this should at the very least be announced as an
incompatible Lisp change in NEWS.






* bug#52670: [PATCH] legacy base64 encoding of latin-1
  2021-12-20 19:10   ` Eli Zaretskii
@ 2021-12-20 19:24     ` Mattias Engdegård
  0 siblings, 0 replies; 4+ messages in thread
From: Mattias Engdegård @ 2021-12-20 19:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 52670-done

On 20 Dec 2021, at 20:10, Eli Zaretskii <eliz@gnu.org> wrote:

> Thanks, but this should at the very least be announced as an
> incompatible Lisp change in NEWS.

Right, I added a detailed notice. Thanks for taking a look!

Pushed; closing.





