unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
@ 2019-04-30 19:20 Paul Eggert
  2019-05-01  0:35 ` Andy Moreton
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Paul Eggert @ 2019-04-30 19:20 UTC (permalink / raw)
  To: 35507

[-- Attachment #1: Type: text/plain, Size: 2401 bytes --]

Package: emacs,gnus
Version: 27

When I send email from Thunderbird with a patch attachment, Thunderbird
puts something like the following into the email:

  --------------AA6C74B60F40E0D600CCD03A
  Content-Type: text/x-patch;
   name="0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch"
  Content-Transfer-Encoding: 8bit
  Content-Disposition: attachment;
   filename*0="0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch"

  From 325f51c84d9ad4d9776784bd324b347ffe4fe51b Mon Sep 17 00:00:00 2001
  From: Paul Eggert <eggert@cs.ucla.edu>
  Date: Tue, 30 Apr 2019 10:45:48 -0700
  Subject: [PATCH] Fix decode-time/encode-time roundtrip on macOS
  MIME-Version: 1.0
  Content-Type: text/plain; charset=UTF-8
  Content-Transfer-Encoding: 8bit

  * src/timefns.c (Fencode_time): Ignore DST flag when the zone is
  ...

The attachment has a text/* media type but it has no charset parameter.
The patch itself (output by git format-patch) says its charset is UTF-8.
Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
mishandles the non-ASCII characters in the attachment. To reproduce the
problem, read this email with Gnus; the full attachment is attached to
this email in the Thunderbird way.

Although Internet RFC 2046 section 4.1.2 says the default charset for
text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
to say that registered text/* media types should require a charset
specification (or should say it's not needed because the payload has
that info, which obviously doesn't apply here). It later says that if
there is a strong reason to have a charset default, the default should
be UTF-8.

Unfortunately Gnus apparently doesn't default to UTF-8 for such
attachments, which means that sending a text/x-patch attachment from
Thunderbird to Gnus messes up if the attachment contains non-ASCII
characters. This has been causing problems on the Emacs mailing list for
years and it bit a correspondent of mine again today; see
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.

I have filed a Thunderbird bug report for this, as Thunderbird should
specify a charset; see
<https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
should be a polite citizen and handle these attachments nicely rather
than converting the non-ASCII UTF-8 characters to mojibake.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch --]
[-- Type: text/x-patch; name="0001-Fix-decode-time-encode-time-roundtrip-on-macOS.patch", Size: 2058 bytes --]

From 325f51c84d9ad4d9776784bd324b347ffe4fe51b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 30 Apr 2019 10:45:48 -0700
Subject: [PATCH] Fix decode-time/encode-time roundtrip on macOS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/timefns.c (Fencode_time): Ignore DST flag when the zone is
numeric or is a cons, as the doc string says it’s ignored in that
case, and not ignoring it causes encode-time to not invert
decode-time on some platforms (Bug#35502).
* test/src/timefns-tests.el (encode-time-dst-numeric-zone):
New test.
---
 src/timefns.c             | 5 +++--
 test/src/timefns-tests.el | 6 ++++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/timefns.c b/src/timefns.c
index 5005c73b7f..7b5af6a5d2 100644
--- a/src/timefns.c
+++ b/src/timefns.c
@@ -1488,10 +1488,11 @@ usage: (encode-time &optional TIME FORM &rest OBSOLESCENT-ARGUMENTS)  */)
       tm.tm_mon  = check_tm_member (XCAR (a), 1); a = XCDR (a);
       tm.tm_year = check_tm_member (XCAR (a), TM_YEAR_BASE); a = XCDR (a);
       a = XCDR (a);
-      if (SYMBOLP (XCAR (a)))
-	tm.tm_isdst = !NILP (XCAR (a));
+      Lisp_Object dstflag = XCAR (a);
       a = XCDR (a);
       zone = XCAR (a);
+      if (SYMBOLP (dstflag) && !FIXNUMP (zone) && !CONSP (zone))
+	tm.tm_isdst = !NILP (dstflag);
     }
   else if (nargs < 6)
     xsignal2 (Qwrong_number_of_arguments, Qencode_time, make_fixnum (nargs));
diff --git a/test/src/timefns-tests.el b/test/src/timefns-tests.el
index 5c858ef3bd..2c90af757f 100644
--- a/test/src/timefns-tests.el
+++ b/test/src/timefns-tests.el
@@ -142,3 +142,9 @@ timefns-tests--have-leap-seconds
 		      (< 0.99 (/ x y) 1.01)
 		      (< 0.99 (/ (- (float-time a)) (float-time b))
 			 1.01))))))))
+
+(ert-deftest encode-time-dst-numeric-zone ()
+    "Check for Bug#35502."
+    (should (time-equal-p
+             (encode-time '(29 31 17 30 4 2019 2 t 7200))
+             '(23752 27217))))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-04-30 19:20 bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird Paul Eggert
@ 2019-05-01  0:35 ` Andy Moreton
  2019-05-01 15:22   ` Robert Pluim
  2019-05-01 17:33   ` Eli Zaretskii
  2019-05-01 17:32 ` Eli Zaretskii
  2019-05-02 23:24 ` Paul Eggert
  2 siblings, 2 replies; 36+ messages in thread
From: Andy Moreton @ 2019-05-01  0:35 UTC (permalink / raw)
  To: 35507

On Tue 30 Apr 2019, Paul Eggert wrote:

> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

After a bit of experimenting, this minimal patch appears to fix things.
Should this also allow the user to choose the charset if none is
specified, or just hardwire it to utf-8 ?

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
 	(setq type (split-string (car ctl) "/"))
 	(setq subtype (cadr type)
 	      type (car type))
+        ;; Fix missing charset in Thunderbird
+        (unless (assq 'charset (cdr ctl))
+          (push '(charset . utf-8) (cdr ctl)))
 	(setq
 	 result
 	 (cond






^ permalink raw reply related	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01  0:35 ` Andy Moreton
@ 2019-05-01 15:22   ` Robert Pluim
  2019-05-01 15:45     ` Andy Moreton
  2019-05-01 17:34     ` Eli Zaretskii
  2019-05-01 17:33   ` Eli Zaretskii
  1 sibling, 2 replies; 36+ messages in thread
From: Robert Pluim @ 2019-05-01 15:22 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton@gmail.com> said:
    Andy> After a bit of experimenting, this minimal patch appears to
    Andy> fix things.  Should this also allow the user to choose the
    Andy> charset if none is specified, or just hardwire it to utf-8 ?

I think utf-8 is a good fallback if the message doesnʼt specify a
charset. Itʼs not going to produce any worse effects than what we have
now.

Robert





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 15:22   ` Robert Pluim
@ 2019-05-01 15:45     ` Andy Moreton
  2019-05-01 16:42       ` Andy Moreton
  2019-05-01 17:34     ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Andy Moreton @ 2019-05-01 15:45 UTC (permalink / raw)
  To: 35507

On Wed 01 May 2019, Robert Pluim wrote:

>>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton@gmail.com> said:
>     Andy> After a bit of experimenting, this minimal patch appears to
>     Andy> fix things.  Should this also allow the user to choose the
>     Andy> charset if none is specified, or just hardwire it to utf-8 ?
>
> I think utf-8 is a good fallback if the message doesnʼt specify a
> charset. Itʼs not going to produce any worse effects than what we have
> now.

Looking at this a bit more, the " *mm*" temp buffers produced when
decoding the MIME parts all seems to have the right coding, so my
previous patch looks wrong.

The problem may be in `mm-display-inline-fontify' when it tries to
choose a charset or coding system to display the MIME part inline.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 15:45     ` Andy Moreton
@ 2019-05-01 16:42       ` Andy Moreton
  2019-05-01 17:36         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Moreton @ 2019-05-01 16:42 UTC (permalink / raw)
  To: 35507

On Wed 01 May 2019, Andy Moreton wrote:

> On Wed 01 May 2019, Robert Pluim wrote:
>
>>>>>>> On Wed, 01 May 2019 01:35:09 +0100, Andy Moreton <andrewjmoreton@gmail.com> said:
>>     Andy> After a bit of experimenting, this minimal patch appears to
>>     Andy> fix things.  Should this also allow the user to choose the
>>     Andy> charset if none is specified, or just hardwire it to utf-8 ?
>>
>> I think utf-8 is a good fallback if the message doesnʼt specify a
>> charset. Itʼs not going to produce any worse effects than what we have
>> now.
>
> Looking at this a bit more, the " *mm*" temp buffers produced when
> decoding the MIME parts all seems to have the right coding, so my
> previous patch looks wrong.
>
> The problem may be in `mm-display-inline-fontify' when it tries to
> choose a charset or coding system to display the MIME part inline.

This patch only affects display, so should be safer:

diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..173ebfab48 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -475,7 +475,7 @@ mm-display-inline-fontify
 		    (charset
 		     (mm-decode-string text charset))
 		    (t
-		     text)))
+		     (mm-decode-string text 'utf-8))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling






^ permalink raw reply related	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-04-30 19:20 bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird Paul Eggert
  2019-05-01  0:35 ` Andy Moreton
@ 2019-05-01 17:32 ` Eli Zaretskii
  2019-05-01 18:26   ` Paul Eggert
  2019-05-02 23:24 ` Paul Eggert
  2 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 17:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 35507

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 30 Apr 2019 12:20:58 -0700
> 
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.

(You meant RFC 6657, I believe.)

That's not exactly my reading of the RFC language.  First, it sounds
like the text there is primarily intended for the sending MUA, not for
the receiving MUA.  And second, this text:

     In order to improve interoperability with deployed agents, "text/*"
     media type registrations SHOULD either

     a.  specify that the "charset" parameter is not used for the defined
	 subtype, because the charset information is transported inside
	 the payload (such as in "text/xml"), or

     b.  require explicit unconditional inclusion of the "charset"
	 parameter, eliminating the need for a default value.

     In accordance with option (a) above, registrations for "text/*" media
     types that can transport charset information inside the corresponding
     payloads (such as "text/html" and "text/xml") SHOULD NOT specify the
     use of a "charset" parameter, nor any default value, in order to
     avoid conflicting interpretations should the "charset" parameter
     value and the value specified in the payload disagree.

     Thus, new subtypes of the "text" media type SHOULD NOT define a
     default "charset" value.  If there is a strong reason to do so
     despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
     the default.

     Regardless of what approach is chosen, all new "text/*" registrations
     MUST clearly specify how the charset is determined; relying on the
     default defined in Section 4.1.2 of [RFC2046] is no longer permitted.
     However, existing "text/*" registrations that fail to specify how the
     charset is determined still default to US-ASCII.

seems to say that:

  . it is preferable, for new types of text/* media, not to have any
    default charset, unless there's a strong reason to the contrary

  . all new text/* registrations must specify how the charset is
    determined, and not rely on the default from RFC 2046

Is text/x-patch a "new media type" or not?  If it is not new, then
where is it defined?  I couldn't find it on the IANA site.

If it _is_ "new", my reading of the RFC is that we should not define
or expect any defaults, which means this bug is squarely in
Thunderbird's yard, and we shouldn't change Gnus to arbitrarily assume
UTF-8 as the default.

> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.

Does Gnus have a command to re-decode an already decoded MIME part?
If not, it should.  But other than that, I don't see why we should
change Gnus in this regard, certainly not unconditionally assuming
UTF-8.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01  0:35 ` Andy Moreton
  2019-05-01 15:22   ` Robert Pluim
@ 2019-05-01 17:33   ` Eli Zaretskii
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 17:33 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Wed, 01 May 2019 01:35:09 +0100
> 
> After a bit of experimenting, this minimal patch appears to fix things.
> Should this also allow the user to choose the charset if none is
> specified, or just hardwire it to utf-8 ?
> 
> diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
> index 3f255419e7..a99d52a7e7 100644
> --- a/lisp/gnus/mm-decode.el
> +++ b/lisp/gnus/mm-decode.el
> @@ -665,6 +665,9 @@ mm-dissect-buffer
>  	(setq type (split-string (car ctl) "/"))
>  	(setq subtype (cadr type)
>  	      type (car type))
> +        ;; Fix missing charset in Thunderbird
> +        (unless (assq 'charset (cdr ctl))
> +          (push '(charset . utf-8) (cdr ctl)))

Please don't unconditionally force UTF-8 on users.  At the very least
this should be a user option, if at all.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 15:22   ` Robert Pluim
  2019-05-01 15:45     ` Andy Moreton
@ 2019-05-01 17:34     ` Eli Zaretskii
  2019-05-02  6:35       ` Robert Pluim
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 17:34 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 35507, andrewjmoreton

> From: Robert Pluim <rpluim@gmail.com>
> Date: Wed, 01 May 2019 17:22:19 +0200
> Cc: 35507@debbugs.gnu.org
> 
> I think utf-8 is a good fallback if the message doesnʼt specify a
> charset. Itʼs not going to produce any worse effects than what we have
> now.

What considerations led you to that conclusion?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 16:42       ` Andy Moreton
@ 2019-05-01 17:36         ` Eli Zaretskii
  2019-05-01 23:54           ` Andy Moreton
  2019-05-02  3:07           ` Noam Postavsky
  0 siblings, 2 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 17:36 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Wed, 01 May 2019 17:42:18 +0100
> 
> +		     (mm-decode-string text 'utf-8))))

As I said, I'm not sure we should do this, let alone unconditionally
force UTF-8 here, but if we must, why not use decode-coding-string?
Do we really need the mm-* stuff?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 17:32 ` Eli Zaretskii
@ 2019-05-01 18:26   ` Paul Eggert
  2019-05-01 19:05     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2019-05-01 18:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507

On 5/1/19 10:32 AM, Eli Zaretskii wrote:
> Is text/x-patch a "new media type" or not? 

It's not a registered media type so strictly speaking the RFCs' SHOULD
statements do not apply (and they are SHOULDs not MUSTs so they could be
disregarded for good reason). That being said, the ordinary and usual
intent is for the x- media types to follow these recommendations and my
bug report was filed under that assumption.

> my reading of the RFC is that we should not define
> or expect any defaults, which means this bug is squarely in
> Thunderbird's yard

Ah, sorry, I see that my bug report misstated a point. This particular
patch clearly identifies its own encoding because its header says
"Content-Type: text/plain; charset=UTF-8". (I think Git-generated
patches always specify an encoding unless it's ASCII.) So in this
particular case the RFC's recommendation seems to be respected by the
sender.

Gnus could look for a Content-Type: header in text bodies that do not
specify charsets; this would follow the Internet's robustness principle
better.

> I don't see why we should
> change Gnus in this regard, certainly not unconditionally assuming
> UTF-8.
Gnus is mishandling emails sent from Thunderbird right now, so it would
be a practical benefit for Gnus users if it did a better job of decoding
these admittedly-iffy messages.

These days, UTF-8 is by far the most common encoding specified for
non-ASCII text in email and its popularity is growing, so it's the best
choice for a default if Gnus will have one - certainly better than the
confusing behavior that Robert Pluim observed in his Gnus session.
Gnus's current behavior may have been a good idea in 1996 when RFC 2046
said US-ASCII was the default, but it stopped being a good idea in 2012
when RFC 6657 came out and said that UTF-8 should be the default if
there is a default.

Another possibility is that Gnus could ask the user which encoding to
use when the email headers don't specify one and when the text is not
ASCII; even that would be better than Gnus's current behavior of forcing
US-ASCII and displaying something like "\xe2\x80\x99" when it encounters
a non-ASCII character.






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 18:26   ` Paul Eggert
@ 2019-05-01 19:05     ` Eli Zaretskii
  2019-05-01 19:47       ` Andreas Schwab
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 19:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 35507

> Cc: 35507@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 1 May 2019 11:26:35 -0700
> 
> > I don't see why we should
> > change Gnus in this regard, certainly not unconditionally assuming
> > UTF-8.
> Gnus is mishandling emails sent from Thunderbird right now, so it would
> be a practical benefit for Gnus users if it did a better job of decoding
> these admittedly-iffy messages.
> 
> These days, UTF-8 is by far the most common encoding specified for
> non-ASCII text in email and its popularity is growing, so it's the best
> choice for a default if Gnus will have one - certainly better than the
> confusing behavior that Robert Pluim observed in his Gnus session.
> Gnus's current behavior may have been a good idea in 1996 when RFC 2046
> said US-ASCII was the default, but it stopped being a good idea in 2012
> when RFC 6657 came out and said that UTF-8 should be the default if
> there is a default.
> 
> Another possibility is that Gnus could ask the user which encoding to
> use when the email headers don't specify one and when the text is not
> ASCII; even that would be better than Gnus's current behavior of forcing
> US-ASCII and displaying something like "\xe2\x80\x99" when it encounters
> a non-ASCII character.

I'm okay with having a default that's customizable.  I also think Gnus
should have a feature that allows the user to request "re-decoding" of
a message part, because no matter how smart are we and our defaults,
they will sometimes fail.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 19:05     ` Eli Zaretskii
@ 2019-05-01 19:47       ` Andreas Schwab
  2019-05-01 19:57         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Andreas Schwab @ 2019-05-01 19:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, Paul Eggert

On Mai 01 2019, Eli Zaretskii <eliz@gnu.org> wrote:

> I'm okay with having a default that's customizable.  I also think Gnus
> should have a feature that allows the user to request "re-decoding" of
> a message part, because no matter how smart are we and our defaults,
> they will sometimes fail.

That already exists (K C runs the command
gnus-article-view-part-as-charset).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 19:47       ` Andreas Schwab
@ 2019-05-01 19:57         ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-01 19:57 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 35507, eggert

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  35507@debbugs.gnu.org
> Date: Wed, 01 May 2019 21:47:58 +0200
> 
> On Mai 01 2019, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > I'm okay with having a default that's customizable.  I also think Gnus
> > should have a feature that allows the user to request "re-decoding" of
> > a message part, because no matter how smart are we and our defaults,
> > they will sometimes fail.
> 
> That already exists (K C runs the command
> gnus-article-view-part-as-charset).

Great, thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 17:36         ` Eli Zaretskii
@ 2019-05-01 23:54           ` Andy Moreton
  2019-05-02  3:07           ` Noam Postavsky
  1 sibling, 0 replies; 36+ messages in thread
From: Andy Moreton @ 2019-05-01 23:54 UTC (permalink / raw)
  To: 35507

On Wed 01 May 2019, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Wed, 01 May 2019 17:42:18 +0100
>> 
>> +		     (mm-decode-string text 'utf-8))))
>
> As I said, I'm not sure we should do this, let alone unconditionally
> force UTF-8 here, but if we must, why not use decode-coding-string?
> Do we really need the mm-* stuff?

No idea - I am not at all expert in coding systems or the internals of
Gnus.

This was the simplest patch that appeared to work for producing the
right display, without changing the decode into the " *mm*" prefixed
temp buffers created by the MIME machinery for each part.

If you think `decode-coding-string' is a better patch, feel free to test
and commit that instead.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 17:36         ` Eli Zaretskii
  2019-05-01 23:54           ` Andy Moreton
@ 2019-05-02  3:07           ` Noam Postavsky
  2019-05-02  7:17             ` Andy Moreton
  1 sibling, 1 reply; 36+ messages in thread
From: Noam Postavsky @ 2019-05-02  3:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, Andy Moreton

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Wed, 01 May 2019 17:42:18 +0100
>> 
>> +		     (mm-decode-string text 'utf-8))))
>
> As I said, I'm not sure we should do this, let alone unconditionally
> force UTF-8 here, but if we must, why not use decode-coding-string?
> Do we really need the mm-* stuff?

As far as I can tell, the mm-* version is useful for handling stuff lke
"UTF-8" as the charset argument (which might be useful if we extract it
from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
'utf-8, then it's just the same as calling decode-coding-string.

For a default if we don't find a charset header, I guess `undecided'
would make more sense, right?  After all, Emacs already has the coding
detection machinery, may as well use it.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-01 17:34     ` Eli Zaretskii
@ 2019-05-02  6:35       ` Robert Pluim
  0 siblings, 0 replies; 36+ messages in thread
From: Robert Pluim @ 2019-05-02  6:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, andrewjmoreton

>>>>> On Wed, 01 May 2019 20:34:46 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com> Date: Wed, 01 May 2019
    >> 17:22:19 +0200 Cc: 35507@debbugs.gnu.org
    >> 
    >> I think utf-8 is a good fallback if the message doesnʼt specify
    >> a charset. Itʼs not going to produce any worse effects than
    >> what we have now.

    Eli> What considerations led you to that conclusion?

If the message requires a charset, gnus might produce
mojibake. Assuming utf-8 reduces the chance of that happening. Itʼs
true that in particular cases a different charset should be used, but
in that case the existing assumption of ASCII is wrong as well.

Robert





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02  3:07           ` Noam Postavsky
@ 2019-05-02  7:17             ` Andy Moreton
  2019-05-02 11:04               ` Eli Zaretskii
  2019-05-02 12:01               ` Noam Postavsky
  0 siblings, 2 replies; 36+ messages in thread
From: Andy Moreton @ 2019-05-02  7:17 UTC (permalink / raw)
  To: 35507

On Wed 01 May 2019, Noam Postavsky wrote:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Andy Moreton <andrewjmoreton@gmail.com>
>>> Date: Wed, 01 May 2019 17:42:18 +0100
>>> 
>>> +		     (mm-decode-string text 'utf-8))))
>>
>> As I said, I'm not sure we should do this, let alone unconditionally
>> force UTF-8 here, but if we must, why not use decode-coding-string?
>> Do we really need the mm-* stuff?
>
> As far as I can tell, the mm-* version is useful for handling stuff lke
> "UTF-8" as the charset argument (which might be useful if we extract it
> from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
> 'utf-8, then it's just the same as calling decode-coding-string.

OK, in that case we could indeed just call decode-coding-string.

> For a default if we don't find a charset header, I guess `undecided'
> would make more sense, right?  After all, Emacs already has the coding
> detection machinery, may as well use it.

Please re-read the original bug report: the problem is with malformed
messages that do not contain a charset field in the Content-Type header.

The one-liner patch changes the default for inline display in the
Gnus article buffer to assume UTF-8 when nothing is specified, rather
than just inserting the text without decoding it.

That should result in text that actually is UTF-8 being displayed
correctly, and no change to plain ASCII. For anything else, the user can
use the `gnus-mime-view-part-as-charset' command to override the
default.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02  7:17             ` Andy Moreton
@ 2019-05-02 11:04               ` Eli Zaretskii
  2019-05-02 15:43                 ` Andy Moreton
  2019-05-02 12:01               ` Noam Postavsky
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 11:04 UTC (permalink / raw)
  To: 35507, andrewjmoreton

On May 2, 2019 10:17:51 AM GMT+03:00, Andy Moreton <andrewjmoreton@gmail.com> wrote:
> On Wed 01 May 2019, Noam Postavsky wrote:
> 
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> >>> From: Andy Moreton <andrewjmoreton@gmail.com>
> >>> Date: Wed, 01 May 2019 17:42:18 +0100
> >>> 
> >>> +		     (mm-decode-string text 'utf-8))))
> >>
> >> As I said, I'm not sure we should do this, let alone
> unconditionally
> >> force UTF-8 here, but if we must, why not use decode-coding-string?
> >> Do we really need the mm-* stuff?
> >
> > As far as I can tell, the mm-* version is useful for handling stuff
> lke
> > "UTF-8" as the charset argument (which might be useful if we extract
> it
> > from the "Content-Type: text/plain; charset=UTF-8" header).  If
> passing
> > 'utf-8, then it's just the same as calling decode-coding-string.
> 
> OK, in that case we could indeed just call decode-coding-string.
> 
> > For a default if we don't find a charset header, I guess `undecided'
> > would make more sense, right?  After all, Emacs already has the
> coding
> > detection machinery, may as well use it.
> 
> Please re-read the original bug report: the problem is with malformed
> messages that do not contain a charset field in the Content-Type
> header.
> 
> The one-liner patch changes the default for inline display in the
> Gnus article buffer to assume UTF-8 when nothing is specified, rather
> than just inserting the text without decoding it.
> 
> That should result in text that actually is UTF-8 being displayed
> correctly, and no change to plain ASCII. For anything else, the user
> can
> use the `gnus-mime-view-part-as-charset' command to override the
> default.
> 
>     AndyM

Using 'undecided' doesn't disable decoding, it just means Emacs will try to detect the correct encoding by looking at the text (not at the charset header).  In a UTF-8 locale, we will guess UTF-8 anyway, unless we see invalid sequences.

So yes, I think Noam is right, and 'undecided' is a better alternative here.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02  7:17             ` Andy Moreton
  2019-05-02 11:04               ` Eli Zaretskii
@ 2019-05-02 12:01               ` Noam Postavsky
  2019-05-02 15:40                 ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Noam Postavsky @ 2019-05-02 12:01 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Wed 01 May 2019, Noam Postavsky wrote:
>>
>> As far as I can tell, the mm-* version is useful for handling stuff lke
>> "UTF-8" as the charset argument (which might be useful if we extract it
>> from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
>> 'utf-8, then it's just the same as calling decode-coding-string.
>
> OK, in that case we could indeed just call decode-coding-string.
>
>> For a default if we don't find a charset header, I guess `undecided'
>> would make more sense, right?  After all, Emacs already has the coding
>> detection machinery, may as well use it.
>
> Please re-read the original bug report: the problem is with malformed
> messages that do not contain a charset field in the Content-Type header.

I understood from Paul's followup in https://debbugs.gnu.org/35507#32
that the report is mainly about the case where there is a Content-Type
header with a charset field within the body of the attachment.






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 12:01               ` Noam Postavsky
@ 2019-05-02 15:40                 ` Eli Zaretskii
  2019-05-03 14:02                   ` Basil L. Contovounesios
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 15:40 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: 35507, andrewjmoreton

> From: Noam Postavsky <npostavs@gmail.com>
> Date: Thu, 02 May 2019 08:01:38 -0400
> Cc: 35507@debbugs.gnu.org
> 
> I understood from Paul's followup in https://debbugs.gnu.org/35507#32
> that the report is mainly about the case where there is a Content-Type
> header with a charset field within the body of the attachment.

Yes, that's my understanding as well.  So I guess Gnus should try
gleaning the charset from there.  The 'undecided' stuff is for when it
fails, I think.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 11:04               ` Eli Zaretskii
@ 2019-05-02 15:43                 ` Andy Moreton
  2019-05-02 15:57                   ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Moreton @ 2019-05-02 15:43 UTC (permalink / raw)
  To: 35507

On Thu 02 May 2019, Eli Zaretskii wrote:

> Using 'undecided' doesn't disable decoding, it just means Emacs will try to
> detect the correct encoding by looking at the text (not at the charset
> header). In a UTF-8 locale, we will guess UTF-8 anyway, unless we see invalid
> sequences.
>
> So yes, I think Noam is right, and 'undecided' is a better alternative here.

That is arguing for the existing code, which does not work correctly.

The problem is in `mm-display-inline-fontify'.

I am disinclined to look any further at this, as nobody else appears to
be running the existing code before commenting, or testing the proposed
patch.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 15:43                 ` Andy Moreton
@ 2019-05-02 15:57                   ` Eli Zaretskii
  2019-05-02 16:08                     ` Andy Moreton
                                       ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 15:57 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Thu, 02 May 2019 16:43:31 +0100
> 
> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
> 
> That is arguing for the existing code, which does not work correctly.

No, the existing code simply uses the undecoded string.

What I argue for is to do this:

diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..173ebfab48 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -475,7 +475,7 @@ mm-display-inline-fontify
 		    (charset
 		     (mm-decode-string text charset))
 		    (t
-		     text)))
+		     (mm-decode-string text 'undecided))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling

> I am disinclined to look any further at this, as nobody else appears to
> be running the existing code before commenting, or testing the proposed
> patch.

Please don't be offended, there's no intent to offend you here.  Your
efforts are greatly appreciated.  We are just discussing a small
change to what you were proposing, see above.

Or are you saying that using undecided as above doesn't do the job?

(Sorry, I don't use Gnus, so to be able to reproduce the problem and
test a proposed solution I need detailed instructions, I cannot easily
do it myself without investing an inordinate amount of time.)





^ permalink raw reply related	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 15:57                   ` Eli Zaretskii
@ 2019-05-02 16:08                     ` Andy Moreton
  2019-05-02 16:50                       ` Eli Zaretskii
  2019-05-02 17:13                       ` Eric Abrahamsen
  2019-05-02 16:10                     ` Basil L. Contovounesios
  2019-05-02 16:29                     ` Robert Pluim
  2 siblings, 2 replies; 36+ messages in thread
From: Andy Moreton @ 2019-05-02 16:08 UTC (permalink / raw)
  To: 35507

On Thu 02 May 2019, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Thu, 02 May 2019 16:43:31 +0100
>> 
>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>> 
>> That is arguing for the existing code, which does not work correctly.
>
> No, the existing code simply uses the undecoded string.
>
> What I argue for is to do this:
>
> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> index 1e1d264b99..173ebfab48 100644
> --- a/lisp/gnus/mm-view.el
> +++ b/lisp/gnus/mm-view.el
> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>  		    (charset
>  		     (mm-decode-string text charset))
>  		    (t
> -		     text)))
> +		     (mm-decode-string text 'undecided))))
>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>  	    (enable-local-variables nil))
>          ;; We used to set font-lock-mode-hook to nil to avoid enabling

ok, that does appear to work for the example message in the original bug
report. Please push this change and we can find out if it causes any
other problems.

>> I am disinclined to look any further at this, as nobody else appears to
>> be running the existing code before commenting, or testing the proposed
>> patch.
>
> Please don't be offended, there's no intent to offend you here.  Your
> efforts are greatly appreciated.  We are just discussing a small
> change to what you were proposing, see above.

I'm not offended, but I did want to encourage others to run the code and
test the results before adding further commentary.

> Or are you saying that using undecided as above doesn't do the job?
>
> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
> test a proposed solution I need detailed instructions, I cannot easily
> do it myself without investing an inordinate amount of time.)

The gnus-mock package on GNU ELPA may of some help for testing. However
I have not used it myself, nor investigated if it's collection of test
data contains a suitably malformed message.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 15:57                   ` Eli Zaretskii
  2019-05-02 16:08                     ` Andy Moreton
@ 2019-05-02 16:10                     ` Basil L. Contovounesios
  2019-05-02 16:50                       ` Eli Zaretskii
  2019-05-02 16:29                     ` Robert Pluim
  2 siblings, 1 reply; 36+ messages in thread
From: Basil L. Contovounesios @ 2019-05-02 16:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, Andy Moreton

[-- Attachment #1: Type: text/plain, Size: 1754 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Thu, 02 May 2019 16:43:31 +0100
>> 
>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>> 
>> That is arguing for the existing code, which does not work correctly.
>
> No, the existing code simply uses the undecoded string.
>
> What I argue for is to do this:
>
> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> index 1e1d264b99..173ebfab48 100644
> --- a/lisp/gnus/mm-view.el
> +++ b/lisp/gnus/mm-view.el
> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>  		    (charset
>  		     (mm-decode-string text charset))
>  		    (t
> -		     text)))
> +		     (mm-decode-string text 'undecided))))
>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>  	    (enable-local-variables nil))
>          ;; We used to set font-lock-mode-hook to nil to avoid enabling
>
>> I am disinclined to look any further at this, as nobody else appears to
>> be running the existing code before commenting, or testing the proposed
>> patch.
>
> Please don't be offended, there's no intent to offend you here.  Your
> efforts are greatly appreciated.  We are just discussing a small
> change to what you were proposing, see above.
>
> Or are you saying that using undecided as above doesn't do the job?
>
> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
> test a proposed solution I need detailed instructions, I cannot easily
> do it myself without investing an inordinate amount of time.)

FWIW, I use Gnus, and your suggested change to mm-display-inline-fontify
fixes the inline display of the patch in the OP for me.  BTW, the last
two cond branches can be merged following your change:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mm-view.diff --]
[-- Type: text/x-diff, Size: 658 bytes --]

diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
index 1e1d264b99..849488293a 100644
--- a/lisp/gnus/mm-view.el
+++ b/lisp/gnus/mm-view.el
@@ -472,10 +472,8 @@ mm-display-inline-fontify
 		       (buffer-string)))
 		    (coding-system
 		     (decode-coding-string text coding-system))
-		    (charset
-		     (mm-decode-string text charset))
-		    (t
-		     text)))
+                    (t
+                     (mm-decode-string text (or charset 'undecided)))))
       (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
 	    (enable-local-variables nil))
         ;; We used to set font-lock-mode-hook to nil to avoid enabling

[-- Attachment #3: Type: text/plain, Size: 20 bytes --]


Thanks,

-- 
Basil

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 15:57                   ` Eli Zaretskii
  2019-05-02 16:08                     ` Andy Moreton
  2019-05-02 16:10                     ` Basil L. Contovounesios
@ 2019-05-02 16:29                     ` Robert Pluim
  2019-05-02 16:53                       ` Eli Zaretskii
  2 siblings, 1 reply; 36+ messages in thread
From: Robert Pluim @ 2019-05-02 16:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, Andy Moreton

>>>>> On Thu, 02 May 2019 18:57:23 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Andy Moreton <andrewjmoreton@gmail.com> Date: Thu, 02 May
    >> 2019 16:43:31 +0100
    >> 
    >> > So yes, I think Noam is right, and 'undecided' is a better
    >> alternative here.
    >> 
    >> That is arguing for the existing code, which does not work
    >> correctly.

    Eli> No, the existing code simply uses the undecoded string.

    Eli> What I argue for is to do this:

    >diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
    >index 1e1d264b99..173ebfab48 100644
    >--- a/lisp/gnus/mm-view.el
    >+++ b/lisp/gnus/mm-view.el
    >@@ -475,7 +475,7 @@ mm-display-inline-fontify
    > 		    (charset
    > 		     (mm-decode-string text charset))
    > 		    (t
    >- text)))
    >+ (mm-decode-string text 'undecided))))
    >       (let ((font-lock-verbose nil) ; font-lock is a bit too verbose.
    > 	    (enable-local-variables nil))
    >         ;; We used to set font-lock-mode-hook to nil to avoid enabling

That fixes things for me, thanks (I tested against Paul's original
message).

I donʼt see any need for it to be configurable, but thatʼs up to you.

Robert





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 16:10                     ` Basil L. Contovounesios
@ 2019-05-02 16:50                       ` Eli Zaretskii
  2019-05-02 17:20                         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 16:50 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 35507, andrewjmoreton

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: Andy Moreton <andrewjmoreton@gmail.com>,  <35507@debbugs.gnu.org>
> Date: Thu, 02 May 2019 17:10:40 +0100
> 
> FWIW, I use Gnus, and your suggested change to mm-display-inline-fontify
> fixes the inline display of the patch in the OP for me.  BTW, the last
> two cond branches can be merged following your change:

Thanks.  Would you please push that, and give credit to Noam for the
suggestion?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 16:08                     ` Andy Moreton
@ 2019-05-02 16:50                       ` Eli Zaretskii
  2019-05-02 17:13                       ` Eric Abrahamsen
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 16:50 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Thu, 02 May 2019 17:08:21 +0100
> 
> > diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
> > index 1e1d264b99..173ebfab48 100644
> > --- a/lisp/gnus/mm-view.el
> > +++ b/lisp/gnus/mm-view.el
> > @@ -475,7 +475,7 @@ mm-display-inline-fontify
> >  		    (charset
> >  		     (mm-decode-string text charset))
> >  		    (t
> > -		     text)))
> > +		     (mm-decode-string text 'undecided))))
> >        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
> >  	    (enable-local-variables nil))
> >          ;; We used to set font-lock-mode-hook to nil to avoid enabling
> 
> ok, that does appear to work for the example message in the original bug
> report.

Thanks for testing.  Something according those lines will be in the
repository shortly.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 16:29                     ` Robert Pluim
@ 2019-05-02 16:53                       ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 16:53 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 35507, andrewjmoreton

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Andy Moreton <andrewjmoreton@gmail.com>,  35507@debbugs.gnu.org
> Date: Thu, 02 May 2019 18:29:52 +0200
> 
> That fixes things for me, thanks (I tested against Paul's original
> message).

Thanks for testing.

> I donʼt see any need for it to be configurable, but thatʼs up to you.

No need, IMO.  That's a nice bonus of Noam's idea: 'undecided' can
already be configured via existing facilities, like
prefer-coding-system, set-language-environment, and their ilk.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 16:08                     ` Andy Moreton
  2019-05-02 16:50                       ` Eli Zaretskii
@ 2019-05-02 17:13                       ` Eric Abrahamsen
  2019-05-02 17:45                         ` Andy Moreton
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Abrahamsen @ 2019-05-02 17:13 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Thu 02 May 2019, Eli Zaretskii wrote:
>
>>> From: Andy Moreton <andrewjmoreton@gmail.com>
>>> Date: Thu, 02 May 2019 16:43:31 +0100
>>> 
>>> > So yes, I think Noam is right, and 'undecided' is a better alternative here.
>>> 
>>> That is arguing for the existing code, which does not work correctly.
>>
>> No, the existing code simply uses the undecoded string.
>>
>> What I argue for is to do this:
>>
>> diff --git a/lisp/gnus/mm-view.el b/lisp/gnus/mm-view.el
>> index 1e1d264b99..173ebfab48 100644
>> --- a/lisp/gnus/mm-view.el
>> +++ b/lisp/gnus/mm-view.el
>> @@ -475,7 +475,7 @@ mm-display-inline-fontify
>>  		    (charset
>>  		     (mm-decode-string text charset))
>>  		    (t
>> -		     text)))
>> +		     (mm-decode-string text 'undecided))))
>>        (let ((font-lock-verbose nil)     ; font-lock is a bit too verbose.
>>  	    (enable-local-variables nil))
>>          ;; We used to set font-lock-mode-hook to nil to avoid enabling
>
> ok, that does appear to work for the example message in the original bug
> report. Please push this change and we can find out if it causes any
> other problems.
>
>>> I am disinclined to look any further at this, as nobody else appears to
>>> be running the existing code before commenting, or testing the proposed
>>> patch.
>>
>> Please don't be offended, there's no intent to offend you here.  Your
>> efforts are greatly appreciated.  We are just discussing a small
>> change to what you were proposing, see above.
>
> I'm not offended, but I did want to encourage others to run the code and
> test the results before adding further commentary.
>
>> Or are you saying that using undecided as above doesn't do the job?
>>
>> (Sorry, I don't use Gnus, so to be able to reproduce the problem and
>> test a proposed solution I need detailed instructions, I cannot easily
>> do it myself without investing an inordinate amount of time.)
>
> The gnus-mock package on GNU ELPA may of some help for testing. However
> I have not used it myself, nor investigated if it's collection of test
> data contains a suitably malformed message.

It doesn't currently, but this is a perfect use-case for the package.
Shall I just add the up-thread message into the test data? Or can we
come up with a more-broken version of the message?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 16:50                       ` Eli Zaretskii
@ 2019-05-02 17:20                         ` Eli Zaretskii
  2019-05-03 13:55                           ` Basil L. Contovounesios
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-02 17:20 UTC (permalink / raw)
  To: contovob; +Cc: 35507, andrewjmoreton

> Date: Thu, 02 May 2019 19:50:11 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 35507@debbugs.gnu.org, andrewjmoreton@gmail.com
> 
> give credit to Noam

And to Andy, of course.  Sorry, thought it was obvious, but better
safe than sorry.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 17:13                       ` Eric Abrahamsen
@ 2019-05-02 17:45                         ` Andy Moreton
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Moreton @ 2019-05-02 17:45 UTC (permalink / raw)
  To: 35507

On Thu 02 May 2019, Eric Abrahamsen wrote:

> Andy Moreton <andrewjmoreton@gmail.com> writes:
>> The gnus-mock package on GNU ELPA may of some help for testing. However
>> I have not used it myself, nor investigated if it's collection of test
>> data contains a suitably malformed message.
>
> It doesn't currently, but this is a perfect use-case for the package.
> Shall I just add the up-thread message into the test data? Or can we
> come up with a more-broken version of the message?

Something similar, but any test data should be anonymised so that it
does not contain personal details or real email addresses.

    AndyM






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-04-30 19:20 bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird Paul Eggert
  2019-05-01  0:35 ` Andy Moreton
  2019-05-01 17:32 ` Eli Zaretskii
@ 2019-05-02 23:24 ` Paul Eggert
  2 siblings, 0 replies; 36+ messages in thread
From: Paul Eggert @ 2019-05-02 23:24 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 35507

> any test data should be anonymised so that it
> does not contain personal details or real email addresses.
It's OK with me if you use my original bug report as test data, as I
think the only email addresses it contains are public ones like mine
(already nearly 3000 copies of that in the Emacs source code!) or
bug-gnu-emacs.

Thanks to all for fixing this.






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 17:20                         ` Eli Zaretskii
@ 2019-05-03 13:55                           ` Basil L. Contovounesios
  0 siblings, 0 replies; 36+ messages in thread
From: Basil L. Contovounesios @ 2019-05-03 13:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507-done, andrewjmoreton

tags 35507 fixed
close 35507
quit

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Thu, 02 May 2019 19:50:11 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: 35507@debbugs.gnu.org, andrewjmoreton@gmail.com
>> 
>> give credit to Noam
>
> And to Andy, of course.  Sorry, thought it was obvious, but better
> safe than sorry.

Done (hopefully without needing to be sorry):

[24a1d5a0b5]: Fix Gnus inline attachment decoding (bug#35507)
  2019-05-03 14:52:01 +0100
  https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=24a1d5a0b5c0debd8256d71242bfa6f8448bf5af

I am thus closing this report.

Thanks,

-- 
Basil





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-02 15:40                 ` Eli Zaretskii
@ 2019-05-03 14:02                   ` Basil L. Contovounesios
  2019-05-03 15:14                     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Basil L. Contovounesios @ 2019-05-03 14:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, andrewjmoreton, Noam Postavsky

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Noam Postavsky <npostavs@gmail.com>
>> Date: Thu, 02 May 2019 08:01:38 -0400
>> Cc: 35507@debbugs.gnu.org
>> 
>> I understood from Paul's followup in https://debbugs.gnu.org/35507#32
>> that the report is mainly about the case where there is a Content-Type
>> header with a charset field within the body of the attachment.
>
> Yes, that's my understanding as well.  So I guess Gnus should try
> gleaning the charset from there.  The 'undecided' stuff is for when it
> fails, I think.

Question following an initial reading of (info "(elisp) Coding System
Basics"): would it be better in this case to use prefer-utf-8 instead of
undecided?  If not, why not?

Thanks,

-- 
Basil





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-03 14:02                   ` Basil L. Contovounesios
@ 2019-05-03 15:14                     ` Eli Zaretskii
  2019-05-03 15:20                       ` Basil L. Contovounesios
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2019-05-03 15:14 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: 35507, andrewjmoreton, npostavs

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: Noam Postavsky <npostavs@gmail.com>,  <35507@debbugs.gnu.org>,  <andrewjmoreton@gmail.com>
> Date: Fri, 03 May 2019 15:02:01 +0100
> 
> Question following an initial reading of (info "(elisp) Coding System
> Basics"): would it be better in this case to use prefer-utf-8 instead of
> undecided?  If not, why not?

Because we have no reason to prefer UTF-8 in this case.  No one tells
us that x-patch will be predominantly encoded in UTF-8.

The RFC doesn't say that UTF-8 is the default, either, and
text/x-patch is not defined anywhere with that default.  Which means
there's no default, and in that case 'undecided' is better, because it
heeds to the preferences of the user.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
  2019-05-03 15:14                     ` Eli Zaretskii
@ 2019-05-03 15:20                       ` Basil L. Contovounesios
  0 siblings, 0 replies; 36+ messages in thread
From: Basil L. Contovounesios @ 2019-05-03 15:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 35507, andrewjmoreton, npostavs

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob@tcd.ie>
>> Cc: Noam Postavsky <npostavs@gmail.com>,  <35507@debbugs.gnu.org>,  <andrewjmoreton@gmail.com>
>> Date: Fri, 03 May 2019 15:02:01 +0100
>> 
>> Question following an initial reading of (info "(elisp) Coding System
>> Basics"): would it be better in this case to use prefer-utf-8 instead of
>> undecided?  If not, why not?
>
> Because we have no reason to prefer UTF-8 in this case.  No one tells
> us that x-patch will be predominantly encoded in UTF-8.
>
> The RFC doesn't say that UTF-8 is the default, either, and
> text/x-patch is not defined anywhere with that default.  Which means
> there's no default, and in that case 'undecided' is better, because it
> heeds to the preferences of the user.

Right, thanks for explaining.

-- 
Basil





^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-05-03 15:20 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-30 19:20 bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird Paul Eggert
2019-05-01  0:35 ` Andy Moreton
2019-05-01 15:22   ` Robert Pluim
2019-05-01 15:45     ` Andy Moreton
2019-05-01 16:42       ` Andy Moreton
2019-05-01 17:36         ` Eli Zaretskii
2019-05-01 23:54           ` Andy Moreton
2019-05-02  3:07           ` Noam Postavsky
2019-05-02  7:17             ` Andy Moreton
2019-05-02 11:04               ` Eli Zaretskii
2019-05-02 15:43                 ` Andy Moreton
2019-05-02 15:57                   ` Eli Zaretskii
2019-05-02 16:08                     ` Andy Moreton
2019-05-02 16:50                       ` Eli Zaretskii
2019-05-02 17:13                       ` Eric Abrahamsen
2019-05-02 17:45                         ` Andy Moreton
2019-05-02 16:10                     ` Basil L. Contovounesios
2019-05-02 16:50                       ` Eli Zaretskii
2019-05-02 17:20                         ` Eli Zaretskii
2019-05-03 13:55                           ` Basil L. Contovounesios
2019-05-02 16:29                     ` Robert Pluim
2019-05-02 16:53                       ` Eli Zaretskii
2019-05-02 12:01               ` Noam Postavsky
2019-05-02 15:40                 ` Eli Zaretskii
2019-05-03 14:02                   ` Basil L. Contovounesios
2019-05-03 15:14                     ` Eli Zaretskii
2019-05-03 15:20                       ` Basil L. Contovounesios
2019-05-01 17:34     ` Eli Zaretskii
2019-05-02  6:35       ` Robert Pluim
2019-05-01 17:33   ` Eli Zaretskii
2019-05-01 17:32 ` Eli Zaretskii
2019-05-01 18:26   ` Paul Eggert
2019-05-01 19:05     ` Eli Zaretskii
2019-05-01 19:47       ` Andreas Schwab
2019-05-01 19:57         ` Eli Zaretskii
2019-05-02 23:24 ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).