unofficial mirror of bug-gnu-emacs@gnu.org 
From: Alexandre Duret-Lutz <adl@lrde.epita.fr>
To: Lars Ingebrigtsen <larsi@gnus.org>
Cc: 44307@debbugs.gnu.org
Subject: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart messages fail to decode
Date: Thu, 07 Jan 2021 17:06:44 +0100
Message-ID: <87wnwo7muj.fsf@lrde.epita.fr>
In-Reply-To: <87wnwo3kd2.fsf@gnus.org> (Lars Ingebrigtsen's message of "Thu, 07 Jan 2021 15:14:01 +0100")

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I've now committed a fix to mm-with-part that may or may not fix this
> nnmaildir problem.  

Question: shouldn't mm-with-part always leave the buffer in unibyte
mode?  The comment at the beginning of the macro seems to suggest that,
but the new "if" does not call (mm-disable-multibyte) after inserting
the part.

Otherwise that would just push the issue further away, to the next
place where the contents produced by mm-with-part are inserted into a
unibyte buffer.

> Can you try this (in Emacs 28)?  You may have to do a "make bootstrap"
> or at least remove all the lisp/gnus/*.elc files for the change to
> have any effect.

After "make bootstrap", this seems to fix only the rendering of
text/html utf-8 parts (I'm using w3m, if that matters).  However,
text/plain utf-8 parts are still garbled as they were before.

If I tweak the patch as follows:

--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -1271,7 +1271,9 @@ mm-with-part
             ;; multibyte buffer here, but if it's using an 8bit
             ;; Content-Transfer-Encoding, then work around that by
             ;; just ignoring the situation.
-            (insert-buffer-substring (mm-handle-buffer handle))
+            (progn
+              (insert-buffer-substring (mm-handle-buffer handle))
+              (mm-disable-multibyte))
           ;; Do the decoding.
           (mm-disable-multibyte)
           (insert-buffer-substring (mm-handle-buffer handle))

this seems to fix text/plain utf-8 parts as well; however, the
rendering of windows-1252 parts is now broken...

See the following table, where "with patch" refers to
commit (23a887e4), and "disable-mb" to the above tweak.

|--------------+------------+---------------+------------+------------|
| charset      | type       | without patch | with patch | disable-mb |
|--------------+------------+---------------+------------+------------|
| utf-8        | text/html  | garbled       | ok         | ok         |
| windows-1252 | text/html  | ok            | ok         | garbled    |
| utf-8        | text/plain | garbled       | garbled    | ok         |
| windows-1252 | text/plain | ok            | ok         | garbled    |
|--------------+------------+---------------+------------+------------|

When looking at windows-1252-encoded mails read by nnmaildir, and
rendered using "C-u g" (where none of the above changes should matter),
it's obvious that the buffer contains utf-8 characters.

My guess is that when nnmaildir calls nnheader-insert-file-contents to
read the mail, it does so with the 'undecided coding system.  Emacs then
automatically detects windows-1252 and converts the text to utf-8 for
its internal representation.
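
To make that guess concrete, here is a minimal sketch of what decoding
does to a single windows-1252 byte.  It is not taken from the Gnus
sources: the function name demo-cp1252-decode is made up for this
example, and it only illustrates the general decoding behaviour, not
what nnmaildir actually does.

```elisp
;; Hypothetical helper, for illustration only; not part of Gnus.
(defun demo-cp1252-decode ()
  "Decode one windows-1252 byte and report how Emacs stores the result."
  (let* ((raw (unibyte-string #xe9))   ; the byte for "é" in windows-1252
         (decoded (decode-coding-string raw 'windows-1252)))
    (list (multibyte-string-p raw)      ; is the on-disk form multibyte?
          (multibyte-string-p decoded)  ; is the decoded form multibyte?
          (string-bytes decoded)        ; bytes used by the internal form
          ;; Does re-encoding as utf-8 give the two bytes C3 A9?
          (equal (encode-coding-string decoded 'utf-8)
                 (unibyte-string #xc3 #xa9)))))
```

Evaluating (demo-cp1252-decode) in *scratch* shows whether the decoded
text is still the original single byte or already a multibyte
character.  If the latter, any code that later treats the buffer as raw
windows-1252 bytes would see the wrong data.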
-- 
Alexandre Duret-Lutz




