From: Alexandre Duret-Lutz <adl@lrde.epita.fr>
To: 44307@debbugs.gnu.org
Cc: larsi@gnus.org
Subject: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart messages fail to decode
Date: Mon, 04 Jan 2021 22:54:18 +0100
Message-ID: <87h7nwbc6t.fsf@goulash.lrde.epita.fr>
In-Reply-To: <8735zj6q6h.fsf@goulash.lrde.epita.fr> (Alexandre Duret-Lutz's message of "Sat, 02 Jan 2021 21:26:30 +0100")

Alexandre Duret-Lutz <adl@lrde.epita.fr> writes:
> Clicking inside this message on the "Attachement: [2. text/plain]"
> button inserts "\344\344\344\344". I.e., that's
> the Latin-1 version of "ääää". (M-x describe-char on these says that they
> are "not encodable by coding system utf-8-unix")

Digging into the code, I believe the unexpected conversion happens in this macro:

  (defmacro mm-with-part (handle &rest forms)
    "Run FORMS in the temp buffer containing the contents of HANDLE."
    ;; The handle-buffer's content is a sequence of bytes, not a sequence of
    ;; chars, so the buffer should be unibyte. It may happen that the
    ;; handle-buffer is multibyte for some reason, in which case now is a good
    ;; time to adjust it, since we know at this point that it should
    ;; be unibyte.
    `(let* ((handle ,handle))
       (when (and (mm-handle-buffer handle)
                  (buffer-name (mm-handle-buffer handle)))
         (with-temp-buffer
           (mm-disable-multibyte)
           (insert-buffer-substring (mm-handle-buffer handle))
           (mm-decode-content-transfer-encoding
            (mm-handle-encoding handle)
            (mm-handle-media-type handle))
           ,@forms))))

In my case, (mm-handle-buffer handle) is multibyte. This
multibyteness was preserved by mm-copy-to-buffer when creating the
handle buffer, but I did not check where it originally came from, since
the comment at the top of the macro suggests that a multibyte handle
buffer is acceptable.

However, the sequence

  (mm-disable-multibyte)
  (insert-buffer-substring (mm-handle-buffer handle))

seems to be what does the harm. The documentation of
insert-buffer-substring/insert notes that when multibyte text is
inserted into a unibyte buffer, each character is converted by taking
the lowest 8 bits of its code, not by splitting it into its bytes.

Mimicking it with

  (require 'mm-util)            ; defines `mm-disable-multibyte'
  (let ((utf8string "ääää"))    ; typed as UTF-8
    (with-temp-buffer
      (mm-disable-multibyte)
      (insert utf8string)
      (print (string-bytes utf8string))
      (print (string-bytes (buffer-string)))
      (buffer-string)))

gives the following (the last line is the return value):

  8
  4
  "\344\344\344\344"
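
To spell out the two possible conversions, here is a small sketch. The
helper `my-low-8-bits' is a made-up name, not part of Emacs or Gnus; it
just applies the "lowest 8 bits" rule by hand, next to a
byte-preserving conversion (encoding to UTF-8):

```elisp
;; Hypothetical helper (not part of Emacs or Gnus): build a unibyte
;; string by keeping only the low 8 bits of each character code, which
;; is the documented rule for inserting multibyte text into a unibyte
;; buffer.
(defun my-low-8-bits (string)
  "Return a unibyte string made of the low 8 bits of each char of STRING."
  (apply #'unibyte-string
         (mapcar (lambda (c) (logand c #xff)) string)))

;; ä is U+00E4, so truncation gives one byte #xE4 (\344) per character,
;; while UTF-8 encoding splits each character into two bytes (\303\244).
(list (my-low-8-bits "ääää")                 ; 4 bytes
      (encode-coding-string "ääää" 'utf-8))  ; 8 bytes
```

The first value matches the buffer string above; the second is the
8-byte sequence the 8bit part actually carries.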

So it would seem that (mm-disable-multibyte) should be called *after* the
insertion, not before, in order to preserve all the bytes.
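
Concretely, something like the following sketch. It is a renamed copy
of mm-with-part as quoted above (`my-mm-with-part' is a made-up name so
as not to clobber the real macro), with only the order of the two calls
changed, and it assumes with-temp-buffer starts out multibyte, which is
the usual default:

```elisp
(require 'mm-decode)  ; mm-handle-* accessors; loads mm-util and mm-bodies

;; Sketch only: `mm-with-part' with `mm-disable-multibyte' moved after
;; the copy of the handle buffer.
(defmacro my-mm-with-part (handle &rest forms)
  "Run FORMS in a temp buffer containing the bytes of HANDLE."
  `(let* ((handle ,handle))
     (when (and (mm-handle-buffer handle)
                (buffer-name (mm-handle-buffer handle)))
       (with-temp-buffer
         ;; The temp buffer is still multibyte here, so copying from a
         ;; multibyte handle buffer keeps each character whole.
         (insert-buffer-substring (mm-handle-buffer handle))
         ;; Only now reinterpret the text as bytes: each ä, stored
         ;; internally as the UTF-8 sequence \303\244, becomes those two
         ;; bytes instead of being truncated to \344.
         (mm-disable-multibyte)
         (mm-decode-content-transfer-encoding
          (mm-handle-encoding handle)
          (mm-handle-media-type handle))
         ,@forms))))
```

For a unibyte handle buffer (the normal case), the copy should turn each
byte >= 128 into a raw-byte character, which set-buffer-multibyte (what
mm-disable-multibyte calls) turns back into the same byte, so that case
should be unaffected. With a handle like the one from this report,
(my-mm-with-part handle (decode-coding-string (buffer-string) 'utf-8))
should then give "ääää".
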
Does this make sense?
--
Alexandre Duret-Lutz