unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
@ 2020-11-03  8:27 Kazuhiro Ito
  2020-11-03 15:09 ` Lars Ingebrigtsen
  2020-11-03 15:36 ` Eli Zaretskii
  0 siblings, 2 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-03  8:27 UTC (permalink / raw)
  To: 44411

When I call uudecode-decode-region-internal in multibyte buffer, it
fails to decode eight-bit characters.

The function makes string from uuencoded text by passing unsigned char
vlue (0-255) to char-to-string function, which makes multibyte-string.
After that, string is decoded as binary.  But eight-bit characters are
never made in that way.

(let ((ch #xc8))
  (decode-coding-string (char-to-string ch) 'binary))

-> "8"

Additionally, concat and char-to-string functions are called so
frequently that deocder is very slow for large data.

Please see the below patch.


diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..f9254aee75 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
 	      (setq counter (1+ counter)
 		    inputpos (1+ inputpos))
 	      (cond ((= counter 4)
-		     (setq result (cons
-				   (concat
-				    (char-to-string (ash bits -16))
-				    (char-to-string (logand (ash bits -8) 255))
-				    (char-to-string (logand bits 255)))
-				   result))
+		     (setq result (cons (logand bits 255)
+					(cons (logand (ash bits -8) 255)
+					      (cons (ash bits -16)
+						    result))))
 		     (setq bits 0 counter 0))
 		    (t (setq bits (ash bits 6)))))))
 	  (cond
@@ -166,26 +164,21 @@ uudecode-decode-region-internal
 	    ;;(error "uucode ends unexpectedly")
 	    (setq done t))
 	   ((= counter 3)
-	    (setq result (cons
-			  (concat
-			   (char-to-string (logand (ash bits -16) 255))
-			   (char-to-string (logand (ash bits -8) 255)))
-			  result)))
+	    (setq result (cons (logand (ash bits -8) 255)
+			       (cons (logand (ash bits -16) 255)
+				     result))))
 	   ((= counter 2)
-	    (setq result (cons
-			  (char-to-string (logand (ash bits -10) 255))
-			  result))))
+	    (setq result (cons (logand (ash bits -10) 255)
+			       result))))
 	  (skip-chars-forward non-data-chars end))
+	(setq result (apply #'unibyte-string (nreverse result)))
 	(if file-name
             (with-temp-file file-name
               (set-buffer-multibyte nil)
-              (insert (apply #'concat (nreverse result))))
+              (insert result))
 	  (or (markerp end) (setq end (set-marker (make-marker) end)))
 	  (goto-char start)
-	  (if enable-multibyte-characters
-	      (dolist (x (nreverse result))
-                (insert (decode-coding-string x 'binary)))
-	    (insert (apply #'concat (nreverse result))))
+          (insert result)
 	  (delete-region (point) end))))))
 
 ;;;###autoload

-- 
Kazuhiro Ito





^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-03  8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
@ 2020-11-03 15:09 ` Lars Ingebrigtsen
  2020-11-03 15:36 ` Eli Zaretskii
  1 sibling, 0 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-03 15:09 UTC (permalink / raw)
  To: Kazuhiro Ito; +Cc: 44411

Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:

> The function makes string from uuencoded text by passing unsigned char
> vlue (0-255) to char-to-string function, which makes multibyte-string.
> After that, string is decoded as binary.  But eight-bit characters are
> never made in that way.
>
> (let ((ch #xc8))
>   (decode-coding-string (char-to-string ch) 'binary))
>
> -> "8"
>
> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.
>
> Please see the below patch.

Thanks; looks good to me.  This patch is slightly too large to apply
without having an FSF copyright on file, and I don't see that in the
assignment file for you.

Would you be willing to sign such paperwork so that we can get the patch
applied?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-03  8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
  2020-11-03 15:09 ` Lars Ingebrigtsen
@ 2020-11-03 15:36 ` Eli Zaretskii
  2020-11-04  8:26   ` Kazuhiro Ito
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-03 15:36 UTC (permalink / raw)
  To: Kazuhiro Ito; +Cc: 44411

> Date: Tue, 03 Nov 2020 17:27:41 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> 
> When I call uudecode-decode-region-internal in multibyte buffer, it
> fails to decode eight-bit characters.

What do you mean by "decode eight-bit characters"?  There's no such
thing in the "real world", it is entirely an Emacs invention.  How
could such "characters" appear in uuencoded email message?

Can you describe the real-life use case where this happened?

> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.

Please show only the changes to fix what you think is a bug.  let's
leave the optimization alone for a moment, so that it doesn't muddy
the waters.

Thanks.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-03 15:36 ` Eli Zaretskii
@ 2020-11-04  8:26   ` Kazuhiro Ito
  2020-11-04 15:26     ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-04  8:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 44411

> > When I call uudecode-decode-region-internal in multibyte buffer, it
> > fails to decode eight-bit characters.
> 
> What do you mean by "decode eight-bit characters"?

I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".

> How could such "characters" appear in uuencoded email message?

I did not said anything about uuencoded email message.  What do you
mean?

> Can you describe the real-life use case where this happened?

1. Yank below uuencoded string into multibyte buffer.

begin 644 c8c8c8c8.bin
$R,C(R```@
``
end
size 4

2. C-SPC at the beginning of uuencoded text and move point to the end
of uuencoded text.

3. M-x uudecode-decode-region-internal

4. decoded result is broken.  Original data is 4bytes of 0xc8, but
inserted text is "8888".  uudecode-decode-region-external returns
expected result.

> Please show only the changes to fix what you think is a bug.

Here is.

diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..81a0a8ae0d 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -151,9 +151,9 @@ uudecode-decode-region-internal
 	      (cond ((= counter 4)
 		     (setq result (cons
 				   (concat
-				    (char-to-string (ash bits -16))
-				    (char-to-string (logand (ash bits -8) 255))
-				    (char-to-string (logand bits 255)))
+				    (unibyte-string (ash bits -16))
+				    (unibyte-string (logand (ash bits -8) 255))
+				    (unibyte-string (logand bits 255)))
 				   result))
 		     (setq bits 0 counter 0))
 		    (t (setq bits (ash bits 6)))))))
@@ -168,12 +168,12 @@ uudecode-decode-region-internal
 	   ((= counter 3)
 	    (setq result (cons
 			  (concat
-			   (char-to-string (logand (ash bits -16) 255))
-			   (char-to-string (logand (ash bits -8) 255)))
+			   (unibyte-string (logand (ash bits -16) 255))
+			   (unibyte-string (logand (ash bits -8) 255)))
 			  result)))
 	   ((= counter 2)
 	    (setq result (cons
-			  (char-to-string (logand (ash bits -10) 255))
+			  (unibyte-string (logand (ash bits -10) 255))
 			  result))))
 	  (skip-chars-forward non-data-chars end))
 	(if file-name
@@ -184,7 +184,7 @@ uudecode-decode-region-internal
 	  (goto-char start)
 	  (if enable-multibyte-characters
 	      (dolist (x (nreverse result))
-                (insert (decode-coding-string x 'binary)))
+                (insert x))
 	    (insert (apply #'concat (nreverse result))))
 	  (delete-region (point) end))))))
 

-- 
Kazuhiro Ito





^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-04  8:26   ` Kazuhiro Ito
@ 2020-11-04 15:26     ` Eli Zaretskii
  2020-11-05 10:48       ` Kazuhiro Ito
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-04 15:26 UTC (permalink / raw)
  To: Kazuhiro Ito; +Cc: 44411

> Date: Wed, 04 Nov 2020 17:26:56 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
> 
> > What do you mean by "decode eight-bit characters"?
> 
> I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".
> 
> > How could such "characters" appear in uuencoded email message?
> 
> I did not said anything about uuencoded email message.  What do you
> mean?

Sorry, it looks like I misunderstood the use case.  Ignore that
question.

> 1. Yank below uuencoded string into multibyte buffer.
> 
> begin 644 c8c8c8c8.bin
> $R,C(R```@
> ``
> end
> size 4
> 
> 2. C-SPC at the beginning of uuencoded text and move point to the end
> of uuencoded text.
> 
> 3. M-x uudecode-decode-region-internal
> 
> 4. decoded result is broken.  Original data is 4bytes of 0xc8, but
> inserted text is "8888".  uudecode-decode-region-external returns
> expected result.

OK, I see the problem now: it's the call to decode-coding-string,
which replaced string-as-unibyte of yore.

> > Please show only the changes to fix what you think is a bug.
> 
> Here is.

OK, I agree also to your other optimizations, but can we please go a
step further and avoid consing a string here?  'insert' is perfectly
capable of inserting characters, not only strings.  So in the
unibyte-buffer case, just something like

  (apply #'insert (nreverse result))

with 'result' being a list of bytes, will produce what you want.  And
in the multibyte-buffer case you just need to convert each byte to its
Unicode-compatible codepoint, by using

  (decode-char 'eight-bit CH)

probably in 'mapcar' or somesuch, and then call 'insert'.

Can you augment your patch along these lines, please?

Thanks.





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-04 15:26     ` Eli Zaretskii
@ 2020-11-05 10:48       ` Kazuhiro Ito
  2020-11-05 13:48         ` Eli Zaretskii
  2020-11-07  9:42         ` Eli Zaretskii
  0 siblings, 2 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-05 10:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 44411

> OK, I agree also to your other optimizations, but can we please go a
> step further and avoid consing a string here?  'insert' is perfectly
> capable of inserting characters, not only strings.  So in the
> unibyte-buffer case, just something like
> 
>   (apply #'insert (nreverse result))
> 
> with 'result' being a list of bytes, will produce what you want.  And
> in the multibyte-buffer case you just need to convert each byte to its
> Unicode-compatible codepoint, by using
> 
>   (decode-char 'eight-bit CH)
> 
> probably in 'mapcar' or somesuch, and then call 'insert'.
> 
> Can you augment your patch along these lines, please?

Here is a revised one.


diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..0dce9b7b72 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
 	      (setq counter (1+ counter)
 		    inputpos (1+ inputpos))
 	      (cond ((= counter 4)
-		     (setq result (cons
-				   (concat
-				    (char-to-string (ash bits -16))
-				    (char-to-string (logand (ash bits -8) 255))
-				    (char-to-string (logand bits 255)))
-				   result))
+		     (setq result (cons (logand bits 255)
+					(cons (logand (ash bits -8) 255)
+					      (cons (ash bits -16)
+						    result))))
 		     (setq bits 0 counter 0))
 		    (t (setq bits (ash bits 6)))))))
 	  (cond
@@ -166,26 +164,26 @@ uudecode-decode-region-internal
 	    ;;(error "uucode ends unexpectedly")
 	    (setq done t))
 	   ((= counter 3)
-	    (setq result (cons
-			  (concat
-			   (char-to-string (logand (ash bits -16) 255))
-			   (char-to-string (logand (ash bits -8) 255)))
-			  result)))
+	    (setq result (cons (logand (ash bits -8) 255)
+			       (cons (logand (ash bits -16) 255)
+				     result))))
 	   ((= counter 2)
-	    (setq result (cons
-			  (char-to-string (logand (ash bits -10) 255))
-			  result))))
+	    (setq result (cons (logand (ash bits -10) 255)
+			       result))))
 	  (skip-chars-forward non-data-chars end))
 	(if file-name
             (with-temp-file file-name
               (set-buffer-multibyte nil)
-              (insert (apply #'concat (nreverse result))))
+              (apply #'insert (nreverse result)))
 	  (or (markerp end) (setq end (set-marker (make-marker) end)))
 	  (goto-char start)
-	  (if enable-multibyte-characters
-	      (dolist (x (nreverse result))
-                (insert (decode-coding-string x 'binary)))
-	    (insert (apply #'concat (nreverse result))))
+          (apply #'insert
+                 (nreverse
+                  (if enable-multibyte-characters
+                      (mapcar (lambda (ch)
+                                (or (decode-char 'eight-bit ch) ch))
+                              result)
+                    result)))
 	  (delete-region (point) end))))))
 
 ;;;###autoload



-- 
Kazuhiro Ito





^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-05 10:48       ` Kazuhiro Ito
@ 2020-11-05 13:48         ` Eli Zaretskii
  2020-11-07  9:42         ` Eli Zaretskii
  1 sibling, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-05 13:48 UTC (permalink / raw)
  To: Kazuhiro Ito; +Cc: 44411

> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
> 
> Here is a revised one.

Thanks, this LGTM.  I will wait for a couple of days, and push then in
your name if no new comments are voiced.






^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-05 10:48       ` Kazuhiro Ito
  2020-11-05 13:48         ` Eli Zaretskii
@ 2020-11-07  9:42         ` Eli Zaretskii
  2020-11-09 15:47           ` Kazuhiro Ito
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-07  9:42 UTC (permalink / raw)
  To: Kazuhiro Ito; +Cc: 44411-done

> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
> 
> > Can you augment your patch along these lines, please?
> 
> Here is a revised one.

Thanks, pushed to the emacs-27 branch.

Since with this patch you have exhausted the amount of changes we can
accept from you without copyright assignment, would you like to start
the assignment process at this time?





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
  2020-11-07  9:42         ` Eli Zaretskii
@ 2020-11-09 15:47           ` Kazuhiro Ito
  0 siblings, 0 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-09 15:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 44411

> > > Can you augment your patch along these lines, please?
> > 
> > Here is a revised one.
> 
> Thanks, pushed to the emacs-27 branch.

Thank you.

> Since with this patch you have exhausted the amount of changes we can
> accept from you without copyright assignment, would you like to start
> the assignment process at this time?

I've stared the process and requested the assignment form.

-- 
Kazuhiro Ito





^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-09 15:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-03  8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
2020-11-03 15:09 ` Lars Ingebrigtsen
2020-11-03 15:36 ` Eli Zaretskii
2020-11-04  8:26   ` Kazuhiro Ito
2020-11-04 15:26     ` Eli Zaretskii
2020-11-05 10:48       ` Kazuhiro Ito
2020-11-05 13:48         ` Eli Zaretskii
2020-11-07  9:42         ` Eli Zaretskii
2020-11-09 15:47           ` Kazuhiro Ito

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).