From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Bochannek
Newsgroups: gmane.emacs.bugs
Subject: bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
Date: Sat, 12 Sep 2020 00:04:15 -0700
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (darwin)
To: 43351@debbugs.gnu.org
X-GNU-PR-Message: report 43351
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
Xref: news.gmane.io gmane.emacs.bugs:187879

--=-=-=
Content-Type: text/plain

Hello!

This is a very small patch, but I am not confident that there aren't
other side effects, so please evaluate it carefully.

In the fix for bug#5458 (2011-06-30), a change was made to
mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
'ascii. As part of that patch, 'us-ascii was also mapped to 'ascii.

This is problematic because decode-coding-string does not recognize
'ascii as a coding system and signals an "Invalid coding system: ascii"
error.

As a result, when using gnus-article-browse-html-article (K H) to
display a text/html message that has charset=us-ascii (or presumably
also charset=ascii), the display will fail if and only if the header of
the message is not ASCII.
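
The coding-system problem can be checked outside Gnus. The following is
only a minimal sketch, not part of the patch: the helper name
demo-decode-error and the sample string are made up for illustration,
and the results may differ between Emacs versions.

```elisp
;; Hypothetical helper (not part of Gnus or of the patch): try to decode
;; a made-up ASCII sample with CODING.  Return nil on success, or the
;; text of the signaled error otherwise.
(defun demo-decode-error (coding)
  (condition-case err
      (progn (decode-coding-string "plain text" coding) nil)
    (error (error-message-string err))))

(list (coding-system-p 'us-ascii)     ; is `us-ascii' a coding system?
      (coding-system-p 'ascii)        ; is `ascii' one?
      (demo-decode-error 'us-ascii)   ; nil when decoding works
      (demo-decode-error 'ascii))     ; error text when it does not
```

Evaluating the final form in *scratch* shows side by side whether each
name is accepted as a coding system and whether decoding with it
signals an error.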

Tracing gnus-article-browse-html-parts, the call chain in my test case
looks like this:

  (setq hcharset (mm-find-mime-charset-region (point-min)(point-max)))

returns 'utf-8 because of the RFC 2047 encoded words in the From
header. The HTML part has charset=us-ascii, and therefore coding and
charset differ.

  (setq body (mm-charset-to-coding-system charset nil t))

then sets 'us-ascii to 'ascii (see above), and the attempt to transcode
the part into 'utf-8 fails at

  (encode-coding-string (decode-coding-string content body) charset)

That last piece of code seems to have gone in on 2016-02-12, when the
XEmacs compat functions were removed from mm-util.el.

This patch no longer maps 'us-ascii and instead maps 'ascii to
'us-ascii. (The ANSI alias test is untouched, but it now yields
'us-ascii as well.)

Alternatively, I could modify gnus-article-browse-html-parts to
special-case this, but I don't think mm-charset-to-coding-system should
output 'ascii if it is not a valid coding system (anymore?).

However, I don't know what else that could possibly break, which is why
I want to offer this patch with some caution.

Please let me know if there is anything I can do to help with getting
this change accepted.

Thanks!

-- 
Alex.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=mm-util.el.diff
Content-Description: Change ASCII handling in mm-charset-to-coding-system to us-ascii

diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 282465722d..3dc93e4ad4 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -137,9 +137,9 @@ mm-charset-to-coding-system
        (let ((cs (cdr (assq charset mm-charset-override-alist))))
          (and cs (mm-coding-system-p cs) cs))))
      ;; ascii
-     ((or (eq charset 'us-ascii)
+     ((or (eq charset 'ascii)
           (string-match "ansi.x3.4" (symbol-name charset)))
-      'ascii)
+      'us-ascii)
      ;; Check to see whether we can handle this charset.  (This depends
      ;; on there being some coding system matching each `mime-charset'
      ;; property defined, as there should be.)

--=-=-=--