From: Alexandre Duret-Lutz
Newsgroups: gmane.emacs.bugs
Subject: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart messages fail to decode
Date: Mon, 04 Jan 2021 22:54:18 +0100
Message-ID: <87h7nwbc6t.fsf@goulash.lrde.epita.fr>
References: <8735zj6q6h.fsf@goulash.lrde.epita.fr>
In-Reply-To: <8735zj6q6h.fsf@goulash.lrde.epita.fr> (Alexandre Duret-Lutz's message of "Sat, 02 Jan 2021 21:26:30 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
To: 44307@debbugs.gnu.org
Cc: larsi@gnus.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8

Alexandre Duret-Lutz writes:

> Clicking inside this message on the "Attachement: [2. text/plain]"
> button inserts "\344\344\344\344".  I.e., that's
> the Latin-1 version of "ääää".  (M-x describe-char on these say that they
> are "not encodable by coding system utf-8-unix")

Digging into the code, I believe that the unexpected conversion occurs
in this macro:

(defmacro mm-with-part (handle &rest forms)
  "Run FORMS in the temp buffer containing the contents of HANDLE."
  ;; The handle-buffer's content is a sequence of bytes, not a sequence of
  ;; chars, so the buffer should be unibyte.  It may happen that the
  ;; handle-buffer is multibyte for some reason, in which case now is a good
  ;; time to adjust it, since we know at this point that it should
  ;; be unibyte.
  `(let* ((handle ,handle))
     (when (and (mm-handle-buffer handle)
                (buffer-name (mm-handle-buffer handle)))
       (with-temp-buffer
         (mm-disable-multibyte)
         (insert-buffer-substring (mm-handle-buffer handle))
         (mm-decode-content-transfer-encoding
          (mm-handle-encoding handle)
          (mm-handle-media-type handle))
         ,@forms))))

In my case the (mm-handle-buffer handle) is multibyte.  This
multibyteness was preserved by mm-copy-to-buffer while creating the
handle buffer, but I did not check where it originally comes from,
since the comment in the macro suggests that having multibyte
parts is OK.

However, the

         (mm-disable-multibyte)
         (insert-buffer-substring (mm-handle-buffer handle))

seems to be doing harm.  The documentation of
insert-buffer-substring/insert notes that multibyte strings will be
converted by taking the lowest 8 bits of each multibyte character, not
by splitting those characters into their bytes.

Mimicking it with

(let ((utf8string "ääää")) ; typed as utf8
  (with-temp-buffer
    (mm-disable-multibyte)
    (insert utf8string)
    (print (string-bytes utf8string))
    (print (string-bytes (buffer-string)))
    (buffer-string)))

this prints:

8
4
"\344\344\344\344"

So it would seem that (mm-disable-multibyte) should be called *after*
the insertion and not before, in order to preserve all bytes.

Does this make sense?

-- 
Alexandre Duret-Lutz