* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
@ 2020-11-03 8:27 Kazuhiro Ito
2020-11-03 15:09 ` Lars Ingebrigtsen
2020-11-03 15:36 ` Eli Zaretskii
0 siblings, 2 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-03 8:27 UTC (permalink / raw)
To: 44411
When I call uudecode-decode-region-internal in multibyte buffer, it
fails to decode eight-bit characters.
The function makes string from uuencoded text by passing unsigned char
vlue (0-255) to char-to-string function, which makes multibyte-string.
After that, string is decoded as binary. But eight-bit characters are
never made in that way.
(let ((ch #xc8))
(decode-coding-string (char-to-string ch) 'binary))
-> "8"
Additionally, concat and char-to-string functions are called so
frequently that deocder is very slow for large data.
Please see the below patch.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..f9254aee75 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
(setq counter (1+ counter)
inputpos (1+ inputpos))
(cond ((= counter 4)
- (setq result (cons
- (concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
- result))
+ (setq result (cons (logand bits 255)
+ (cons (logand (ash bits -8) 255)
+ (cons (ash bits -16)
+ result))))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
(cond
@@ -166,26 +164,21 @@ uudecode-decode-region-internal
;;(error "uucode ends unexpectedly")
(setq done t))
((= counter 3)
- (setq result (cons
- (concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
- result)))
+ (setq result (cons (logand (ash bits -8) 255)
+ (cons (logand (ash bits -16) 255)
+ result))))
((= counter 2)
- (setq result (cons
- (char-to-string (logand (ash bits -10) 255))
- result))))
+ (setq result (cons (logand (ash bits -10) 255)
+ result))))
(skip-chars-forward non-data-chars end))
+ (setq result (apply #'unibyte-string (nreverse result)))
(if file-name
(with-temp-file file-name
(set-buffer-multibyte nil)
- (insert (apply #'concat (nreverse result))))
+ (insert result))
(or (markerp end) (setq end (set-marker (make-marker) end)))
(goto-char start)
- (if enable-multibyte-characters
- (dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
- (insert (apply #'concat (nreverse result))))
+ (insert result)
(delete-region (point) end))))))
;;;###autoload
--
Kazuhiro Ito
^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-03 8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
@ 2020-11-03 15:09 ` Lars Ingebrigtsen
2020-11-03 15:36 ` Eli Zaretskii
1 sibling, 0 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-11-03 15:09 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 44411
Kazuhiro Ito <kzhr@d1.dion.ne.jp> writes:
> The function makes string from uuencoded text by passing unsigned char
> vlue (0-255) to char-to-string function, which makes multibyte-string.
> After that, string is decoded as binary. But eight-bit characters are
> never made in that way.
>
> (let ((ch #xc8))
> (decode-coding-string (char-to-string ch) 'binary))
>
> -> "8"
>
> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.
>
> Please see the below patch.
Thanks; looks good to me. This patch is slightly too large to apply
without having an FSF copyright on file, and I don't see that in the
assignment file for you.
Would you be willing to sign such paperwork so that we can get the patch
applied?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-03 8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
2020-11-03 15:09 ` Lars Ingebrigtsen
@ 2020-11-03 15:36 ` Eli Zaretskii
2020-11-04 8:26 ` Kazuhiro Ito
1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-03 15:36 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 44411
> Date: Tue, 03 Nov 2020 17:27:41 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
>
> When I call uudecode-decode-region-internal in multibyte buffer, it
> fails to decode eight-bit characters.
What do you mean by "decode eight-bit characters"? There's no such
thing in the "real world", it is entirely an Emacs invention. How
could such "characters" appear in uuencoded email message?
Can you describe the real-life use case where this happened?
> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.
Please show only the changes to fix what you think is a bug. let's
leave the optimization alone for a moment, so that it doesn't muddy
the waters.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-03 15:36 ` Eli Zaretskii
@ 2020-11-04 8:26 ` Kazuhiro Ito
2020-11-04 15:26 ` Eli Zaretskii
0 siblings, 1 reply; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-04 8:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 44411
> > When I call uudecode-decode-region-internal in multibyte buffer, it
> > fails to decode eight-bit characters.
>
> What do you mean by "decode eight-bit characters"?
I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".
> How could such "characters" appear in uuencoded email message?
I did not said anything about uuencoded email message. What do you
mean?
> Can you describe the real-life use case where this happened?
1. Yank below uuencoded string into multibyte buffer.
begin 644 c8c8c8c8.bin
$R,C(R```@
``
end
size 4
2. C-SPC at the beginning of uuencoded text and move point to the end
of uuencoded text.
3. M-x uudecode-decode-region-internal
4. decoded result is broken. Original data is 4bytes of 0xc8, but
inserted text is "8888". uudecode-decode-region-external returns
expected result.
> Please show only the changes to fix what you think is a bug.
Here is.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..81a0a8ae0d 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -151,9 +151,9 @@ uudecode-decode-region-internal
(cond ((= counter 4)
(setq result (cons
(concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
+ (unibyte-string (ash bits -16))
+ (unibyte-string (logand (ash bits -8) 255))
+ (unibyte-string (logand bits 255)))
result))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
@@ -168,12 +168,12 @@ uudecode-decode-region-internal
((= counter 3)
(setq result (cons
(concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
+ (unibyte-string (logand (ash bits -16) 255))
+ (unibyte-string (logand (ash bits -8) 255)))
result)))
((= counter 2)
(setq result (cons
- (char-to-string (logand (ash bits -10) 255))
+ (unibyte-string (logand (ash bits -10) 255))
result))))
(skip-chars-forward non-data-chars end))
(if file-name
@@ -184,7 +184,7 @@ uudecode-decode-region-internal
(goto-char start)
(if enable-multibyte-characters
(dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
+ (insert x))
(insert (apply #'concat (nreverse result))))
(delete-region (point) end))))))
--
Kazuhiro Ito
^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-04 8:26 ` Kazuhiro Ito
@ 2020-11-04 15:26 ` Eli Zaretskii
2020-11-05 10:48 ` Kazuhiro Ito
0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-04 15:26 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 44411
> Date: Wed, 04 Nov 2020 17:26:56 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
>
> > What do you mean by "decode eight-bit characters"?
>
> I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".
>
> > How could such "characters" appear in uuencoded email message?
>
> I did not said anything about uuencoded email message. What do you
> mean?
Sorry, it looks like I misunderstood the use case. Ignore that
question.
> 1. Yank below uuencoded string into multibyte buffer.
>
> begin 644 c8c8c8c8.bin
> $R,C(R```@
> ``
> end
> size 4
>
> 2. C-SPC at the beginning of uuencoded text and move point to the end
> of uuencoded text.
>
> 3. M-x uudecode-decode-region-internal
>
> 4. decoded result is broken. Original data is 4bytes of 0xc8, but
> inserted text is "8888". uudecode-decode-region-external returns
> expected result.
OK, I see the problem now: it's the call to decode-coding-string,
which replaced string-as-unibyte of yore.
> > Please show only the changes to fix what you think is a bug.
>
> Here is.
OK, I agree also to your other optimizations, but can we please go a
step further and avoid consing a string here? 'insert' is perfectly
capable of inserting characters, not only strings. So in the
unibyte-buffer case, just something like
(apply #'insert (nreverse result))
with 'result' being a list of bytes, will produce what you want. And
in the multibyte-buffer case you just need to convert each byte to its
Unicode-compatible codepoint, by using
(decode-char 'eight-bit CH)
probably in 'mapcar' or somesuch, and then call 'insert'.
Can you augment your patch along these lines, please?
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-04 15:26 ` Eli Zaretskii
@ 2020-11-05 10:48 ` Kazuhiro Ito
2020-11-05 13:48 ` Eli Zaretskii
2020-11-07 9:42 ` Eli Zaretskii
0 siblings, 2 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-05 10:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 44411
> OK, I agree also to your other optimizations, but can we please go a
> step further and avoid consing a string here? 'insert' is perfectly
> capable of inserting characters, not only strings. So in the
> unibyte-buffer case, just something like
>
> (apply #'insert (nreverse result))
>
> with 'result' being a list of bytes, will produce what you want. And
> in the multibyte-buffer case you just need to convert each byte to its
> Unicode-compatible codepoint, by using
>
> (decode-char 'eight-bit CH)
>
> probably in 'mapcar' or somesuch, and then call 'insert'.
>
> Can you augment your patch along these lines, please?
Here is a revised one.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..0dce9b7b72 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
(setq counter (1+ counter)
inputpos (1+ inputpos))
(cond ((= counter 4)
- (setq result (cons
- (concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
- result))
+ (setq result (cons (logand bits 255)
+ (cons (logand (ash bits -8) 255)
+ (cons (ash bits -16)
+ result))))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
(cond
@@ -166,26 +164,26 @@ uudecode-decode-region-internal
;;(error "uucode ends unexpectedly")
(setq done t))
((= counter 3)
- (setq result (cons
- (concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
- result)))
+ (setq result (cons (logand (ash bits -8) 255)
+ (cons (logand (ash bits -16) 255)
+ result))))
((= counter 2)
- (setq result (cons
- (char-to-string (logand (ash bits -10) 255))
- result))))
+ (setq result (cons (logand (ash bits -10) 255)
+ result))))
(skip-chars-forward non-data-chars end))
(if file-name
(with-temp-file file-name
(set-buffer-multibyte nil)
- (insert (apply #'concat (nreverse result))))
+ (apply #'insert (nreverse result)))
(or (markerp end) (setq end (set-marker (make-marker) end)))
(goto-char start)
- (if enable-multibyte-characters
- (dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
- (insert (apply #'concat (nreverse result))))
+ (apply #'insert
+ (nreverse
+ (if enable-multibyte-characters
+ (mapcar (lambda (ch)
+ (or (decode-char 'eight-bit ch) ch))
+ result)
+ result)))
(delete-region (point) end))))))
;;;###autoload
--
Kazuhiro Ito
^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-05 10:48 ` Kazuhiro Ito
@ 2020-11-05 13:48 ` Eli Zaretskii
2020-11-07 9:42 ` Eli Zaretskii
1 sibling, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-05 13:48 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 44411
> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
>
> Here is a revised one.
Thanks, this LGTM. I will wait for a couple of days, and push then in
your name if no new comments are voiced.
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-05 10:48 ` Kazuhiro Ito
2020-11-05 13:48 ` Eli Zaretskii
@ 2020-11-07 9:42 ` Eli Zaretskii
2020-11-09 15:47 ` Kazuhiro Ito
1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-11-07 9:42 UTC (permalink / raw)
To: Kazuhiro Ito; +Cc: 44411-done
> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr@d1.dion.ne.jp>
> Cc: 44411@debbugs.gnu.org
>
> > Can you augment your patch along these lines, please?
>
> Here is a revised one.
Thanks, pushed to the emacs-27 branch.
Since with this patch you have exhausted the amount of changes we can
accept from you without copyright assignment, would you like to start
the assignment process at this time?
^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#44411: 28.0.50; uudecode-decode-region-internal is broken
2020-11-07 9:42 ` Eli Zaretskii
@ 2020-11-09 15:47 ` Kazuhiro Ito
0 siblings, 0 replies; 9+ messages in thread
From: Kazuhiro Ito @ 2020-11-09 15:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 44411
> > > Can you augment your patch along these lines, please?
> >
> > Here is a revised one.
>
> Thanks, pushed to the emacs-27 branch.
Thank you.
> Since with this patch you have exhausted the amount of changes we can
> accept from you without copyright assignment, would you like to start
> the assignment process at this time?
I've stared the process and requested the assignment form.
--
Kazuhiro Ito
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-11-09 15:47 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-03 8:27 bug#44411: 28.0.50; uudecode-decode-region-internal is broken Kazuhiro Ito
2020-11-03 15:09 ` Lars Ingebrigtsen
2020-11-03 15:36 ` Eli Zaretskii
2020-11-04 8:26 ` Kazuhiro Ito
2020-11-04 15:26 ` Eli Zaretskii
2020-11-05 10:48 ` Kazuhiro Ito
2020-11-05 13:48 ` Eli Zaretskii
2020-11-07 9:42 ` Eli Zaretskii
2020-11-09 15:47 ` Kazuhiro Ito
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).