unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
@ 2020-09-12  7:04 Alex Bochannek
  2020-09-12 14:13 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Bochannek @ 2020-09-12  7:04 UTC (permalink / raw)
  To: 43351

[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]

Hello!

This is a very small patch, but I am not confident that there aren't
other side effects, so please evaluate it carefully.

In the fix for bug#5458 (2011-06-30), a change was made to
mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
is problematic because decode-coding-string does not recognize 'ascii as
a coding system and throws an "Invalid coding system: ascii" exception.

As a result, when using gnus-article-browse-html-article (K H) to
display a text/html message that has charset=us-ascii (or presumably
also charset=ascii), the display will fail iff the header of the message
is not ASCII.

Tracing gnus-article-browse-html-parts the call chain in my test case
looks like this:

(setq hcharset (mm-find-mime-charset-region (point-min)(point-max)))
returns 'utf-8 because of the RFC 2047 encoded words in the
from-header. The HTML part has charset=us-ascii and therefore coding and
charset differ. (setq body (mm-charset-to-coding-system charset nil t))
then sets 'us-ascii to 'ascii (see above) and the attempt to transcode
the part into 'utf-8 fails at (encode-coding-string
(decode-coding-string content body) charset) That last piece of code
seems to have gone in on 2016-02-12 when removing XEmacs compat
functions from mm-util.el.

This patch no longer maps 'us-ascii and instead maps 'ascii to 'us-ascii
(The ANSI alias is untouched.) Alternatively, I could modify
gnus-article-browse-html-parts to special-case this, but I don't think
mm-charset-to-coding-system should output 'ascii if it is not a valid
coding system (anymore?) However, I don't know what else that could
possibly break, which is why I want to offer this patch with some
caution.

Please let me know if there is anything I can do to help with getting
this change accepted.

Thanks!

-- 
Alex. <abochannek@google.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Change ASCII handling in mm-charset-to-coding-system to us-ascii --]
[-- Type: text/x-patch, Size: 643 bytes --]

diff --git a/lisp/gnus/mm-util.el b/lisp/gnus/mm-util.el
index 282465722d..3dc93e4ad4 100644
--- a/lisp/gnus/mm-util.el
+++ b/lisp/gnus/mm-util.el
@@ -137,9 +137,9 @@ mm-charset-to-coding-system
 	 (let ((cs (cdr (assq charset mm-charset-override-alist))))
 	   (and cs (mm-coding-system-p cs) cs))))
    ;; ascii
-   ((or (eq charset 'us-ascii)
+   ((or (eq charset 'ascii)
 	(string-match "ansi.x3.4" (symbol-name charset)))
-    'ascii)
+    'us-ascii)
    ;; Check to see whether we can handle this charset.  (This depends
    ;; on there being some coding system matching each `mime-charset'
    ;; property defined, as there should be.)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
  2020-09-12  7:04 bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii Alex Bochannek
@ 2020-09-12 14:13 ` Lars Ingebrigtsen
  2020-09-12 14:19   ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-12 14:13 UTC (permalink / raw)
  To: Alex Bochannek; +Cc: 43351

Alex Bochannek <alex@bochannek.com> writes:

> This is a very small patch, but I am not confident that there aren't
> other side effects, so please evaluate it carefully.

The patch makes sense conceptually, but it can't be applied because
there's code (presumably) all over the place that depends on it
returning `ascii'.  For instance:

  (let ((cs (mm-charset-to-coding-system charset nil allow-override)))
    (cond ((eq cs 'ascii)
	   (setq cs (or (mm-charset-to-coding-system mail-parse-charset)
			'raw-text)))

> In the fix for bug#5458 (2011-06-30), a change was made to
> mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
> 'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
> is problematic because decode-coding-string does not recognize 'ascii as
> a coding system and throws an "Invalid coding system: ascii" exception.

Indeed; ascii isn't a valid coding system...  but poking around here, I
think the function (despite its name) isn't really returning a coding
system.  I mean, it does in most cases, but not for ascii.  :-/

(The doc string of the function should be fixed.)

> As a result, when using gnus-article-browse-html-article (K H) to
> display a text/html message that has charset=us-ascii (or presumably
> also charset=ascii), the display will fail iff the header of the message
> is not ASCII.

I think this has to be changed in gnus-article-browse-html-article
instead, unfortunately.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
  2020-09-12 14:13 ` Lars Ingebrigtsen
@ 2020-09-12 14:19   ` Eli Zaretskii
  2020-09-13  1:00     ` Alex Bochannek
  2020-09-13 12:39     ` Lars Ingebrigtsen
  0 siblings, 2 replies; 6+ messages in thread
From: Eli Zaretskii @ 2020-09-12 14:19 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 43351, alex

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sat, 12 Sep 2020 16:13:43 +0200
> Cc: 43351@debbugs.gnu.org
> 
> > In the fix for bug#5458 (2011-06-30), a change was made to
> > mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
> > 'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
> > is problematic because decode-coding-string does not recognize 'ascii as
> > a coding system and throws an "Invalid coding system: ascii" exception.
> 
> Indeed; ascii isn't a valid coding system...  but poking around here, I
> think the function (despite its name) isn't really returning a coding
> system.  I mean, it does in most cases, but not for ascii.  :-/

Maybe the simplest solution is to define a coding-system-alias named
'ascii'.





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
  2020-09-12 14:19   ` Eli Zaretskii
@ 2020-09-13  1:00     ` Alex Bochannek
  2020-09-13 12:40       ` Lars Ingebrigtsen
  2020-09-13 12:39     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 6+ messages in thread
From: Alex Bochannek @ 2020-09-13  1:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, 43351

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Date: Sat, 12 Sep 2020 16:13:43 +0200
>> Cc: 43351@debbugs.gnu.org
>> 
>> > In the fix for bug#5458 (2011-06-30), a change was made to
>> > mm-charset-to-coding-system to support "ansi.x3.4*" as an alias for
>> > 'ascii. As part of that patch 'us-ascii was also mapped to 'ascii. This
>> > is problematic because decode-coding-string does not recognize 'ascii as
>> > a coding system and throws an "Invalid coding system: ascii" exception.
>> 
>> Indeed; ascii isn't a valid coding system...  but poking around here, I
>> think the function (despite its name) isn't really returning a coding
>> system.  I mean, it does in most cases, but not for ascii.  :-/
>
> Maybe the simplest solution is to define a coding-system-alias named
> 'ascii'.

I like this approach because it seems like it would do the least amount
of potential harm. I am happy to just put a (define-coding-system-alias
'ascii 'us-ascii) in my .gnus, but should this possibly be done in
mule-conf.el?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Add 'ascii as alias for 'us-ascii coding system --]
[-- Type: text/x-patch, Size: 425 bytes --]

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index edda79ba4e..c84f0a4990 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1508,6 +1508,7 @@ 'us-ascii
   :mime-charset 'us-ascii)
 
 (define-coding-system-alias 'iso-safe 'us-ascii)
+(define-coding-system-alias 'ascii 'us-ascii)
 
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."

[-- Attachment #3: Type: text/plain, Size: 44 bytes --]


Thanks!

-- 
Alex. <abochannek@google.com>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
  2020-09-12 14:19   ` Eli Zaretskii
  2020-09-13  1:00     ` Alex Bochannek
@ 2020-09-13 12:39     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 6+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-13 12:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43351, alex

Eli Zaretskii <eliz@gnu.org> writes:

> Maybe the simplest solution is to define a coding-system-alias named
> 'ascii'.

Yeah, I wondered whether the coding system was called `us-ascii', and
the charset called `ascii', to avoid some sort of confusion somewhere?

I've now made `ascii' a coding system alias, and it doesn't seem to
break anything.  I haven't documented it in the manual, because I guess
we still prefer people to use `us-ascii' as the coding system...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii
  2020-09-13  1:00     ` Alex Bochannek
@ 2020-09-13 12:40       ` Lars Ingebrigtsen
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-13 12:40 UTC (permalink / raw)
  To: Alex Bochannek; +Cc: 43351

Alex Bochannek <alex@bochannek.com> writes:

> I like this approach because it seems like it would do the least amount
> of potential harm. I am happy to just put a (define-coding-system-alias
> 'ascii 'us-ascii) in my .gnus, but should this possibly be done in
> mule-conf.el?

[...]

>  (define-coding-system-alias 'iso-safe 'us-ascii)
> +(define-coding-system-alias 'ascii 'us-ascii)

Yup; I just applied a very similar change to Emacs 28.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-13 12:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-12  7:04 bug#43351: 27.1; [PATCH] Change ASCII handling in mm-charset-to-coding-system to us-ascii Alex Bochannek
2020-09-12 14:13 ` Lars Ingebrigtsen
2020-09-12 14:19   ` Eli Zaretskii
2020-09-13  1:00     ` Alex Bochannek
2020-09-13 12:40       ` Lars Ingebrigtsen
2020-09-13 12:39     ` Lars Ingebrigtsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).