From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Wed, 01 May 2019 20:32:22 +0300
Message-ID: <83d0l2qdw9.fsf@gnu.org>
References: <44a26585-7980-378c-9262-a567ddd3e617@cs.ucla.edu>
In-reply-to: <44a26585-7980-378c-9262-a567ddd3e617@cs.ucla.edu> (message from Paul Eggert on Tue, 30 Apr 2019 12:20:58 -0700)
Cc: 35507@debbugs.gnu.org
To: Paul Eggert
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

> From: Paul Eggert
> Date: Tue, 30 Apr 2019 12:20:58 -0700
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.

(You meant RFC 6657, I believe.)

That's not exactly my reading of the RFC language. First, it sounds
like the text there is primarily intended for the sending MUA, not for
the receiving MUA. And second, this text:

   In order to improve interoperability with deployed agents, "text/*"
   media type registrations SHOULD either

   a.  specify that the "charset" parameter is not used for the defined
       subtype, because the charset information is transported inside
       the payload (such as in "text/xml"), or

   b.  require explicit unconditional inclusion of the "charset"
       parameter, eliminating the need for a default value.

   In accordance with option (a) above, registrations for "text/*"
   media types that can transport charset information inside the
   corresponding payloads (such as "text/html" and "text/xml") SHOULD
   NOT specify the use of a "charset" parameter, nor any default value,
   in order to avoid conflicting interpretations should the "charset"
   parameter value and the value specified in the payload disagree.

   Thus, new subtypes of the "text" media type SHOULD NOT define a
   default "charset" value. If there is a strong reason to do so
   despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset
   as the default.

   Regardless of what approach is chosen, all new "text/*"
   registrations MUST clearly specify how the charset is determined;
   relying on the default defined in Section 4.1.2 of [RFC2046] is no
   longer permitted.
   However, existing "text/*" registrations that fail to specify how
   the charset is determined still default to US-ASCII.

seems to say that:

  . it is preferable, for new types of text/* media, not to have any
    default charset, unless there's a strong reason to the contrary
  . all new text/* registrations must specify how the charset is
    determined, and not rely on the default from RFC 2046

Is text/x-patch a "new media type" or not? If it is not new, then
where is it defined? I couldn't find it on the IANA site. If it _is_
"new", my reading of the RFC is that we should not define or expect
any defaults, which means this bug is squarely in Thunderbird's yard,
and we shouldn't change Gnus to arbitrarily assume UTF-8 as the
default.

> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> . However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

Does Gnus have a command to re-decode an already decoded MIME part?
If not, it should. But other than that, I don't see why we should
change Gnus in this regard, certainly not unconditionally assuming
UTF-8.
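
To make the distinction above concrete, here is a minimal sketch in
Python (standard "email" module only). It is not Gnus code and does not
describe how Gnus behaves. The sample message and the helper names
declared_charset, decode_part and redecode_part are hypothetical,
invented for this illustration. It assumes a text/x-patch part with no
charset parameter whose payload is really UTF-8, applies the US-ASCII
default from RFC 2046, and then re-decodes the same bytes with a charset
the reader picks explicitly.

```python
import base64
from email import message_from_bytes

# Hypothetical sample: a text/x-patch part with NO charset parameter,
# whose (base64-encoded) payload is really UTF-8.
PATCH_TEXT = "-naive\n+naïve\n"
RAW = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/x-patch; name="fix.patch"\n'
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    + base64.b64encode(PATCH_TEXT.encode("utf-8"))
    + b"\n"
)


def declared_charset(part):
    """Return the charset parameter of the part, or None if it has none."""
    return part.get_content_charset()


def decode_part(part, fallback="us-ascii"):
    """Decode using the declared charset, else the given fallback.

    The fallback defaults to US-ASCII, the RFC 2046 default for text/*.
    """
    raw = part.get_payload(decode=True)  # undo base64, giving raw bytes
    return raw.decode(declared_charset(part) or fallback, errors="replace")


def redecode_part(part, charset):
    """Re-decode the same raw bytes with a charset chosen by the reader."""
    return part.get_payload(decode=True).decode(charset, errors="replace")


if __name__ == "__main__":
    msg = message_from_bytes(RAW)
    print(declared_charset(msg))            # no charset was sent
    print(repr(decode_part(msg)))           # US-ASCII default
    print(repr(redecode_part(msg, "latin-1")))  # a wrong guess
    print(repr(redecode_part(msg, "utf-8")))    # the reader's explicit choice
```

The point of keeping redecode_part separate is that the default stays
what the RFC says it is, while the reader can still override the charset
for one part on request, instead of UTF-8 being assumed unconditionally.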