From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Moreton
Newsgroups: gmane.emacs.bugs
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Wed, 01 May 2019 01:35:09 +0100
Message-ID: <865zqv3tc2.fsf@gmail.com>
References: <44a26585-7980-378c-9262-a567ddd3e617@cs.ucla.edu>
In-Reply-To: <44a26585-7980-378c-9262-a567ddd3e617@cs.ucla.edu>
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (windows-nt)
To: 35507@debbugs.gnu.org
Resent-CC: bug-gnu-emacs@gnu.org, bugs@gnus.org
X-GNU-PR-Message: followup 35507
X-GNU-PR-Package: emacs,gnus
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:158560

On Tue 30 Apr 2019, Paul Eggert wrote:

> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6657 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> .
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> . However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

After a bit of experimenting, this minimal patch appears to fix things.
Should this also allow the user to choose the charset if none is
specified, or just hardwire it to utf-8?

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
           (setq type (split-string (car ctl) "/"))
           (setq subtype (cadr type)
                 type (car type))
+          ;; Fix missing charset in Thunderbird
+          (unless (assq 'charset (cdr ctl))
+            (push '(charset . utf-8) (cdr ctl)))
           (setq result
                 (cond