unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: 積丹尼 Dan Jacobson @ 2021-01-17  5:35 UTC (permalink / raw)
  To: 45925

Try this simple experiment:
$ echo Subject: 一二三|procmail
$ echo Subject: 一二三|iconv -t big5|procmail
$ emacs -f gnus

In the *Article* buffer, both look like
Subject: 一二三
In the *Summary* buffer, so does the big5 version.
Alas, the utf-8 version looks like
c\x80\xd3....

(Yes, these are invalid raw headers. But Gnus is supposed to be
accommodating, and it is... except, oddly, in the majority (UTF-8) case.)

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: zh_TW.UTF-8
  value of $LC_MESSAGES: C
  value of $LANG: zh_TW.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

(Might be related to bug#45724.)

(My dot files are at https://www.jidanni.org/comp/configuration/)

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: Lars Ingebrigtsen @ 2021-01-19  5:30 UTC (permalink / raw)
  To: 積丹尼 Dan Jacobson; +Cc: 45925

積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

> Try this simple experiment:
> $ echo Subject: 一二三|procmail
> $ echo Subject: 一二三|iconv -t big5|procmail

I don't have procmail installed, so I'm not sure what these do -- are
you sending a mail (to yourself?) here?  Do you have a recipe to
reproduce this problem without the use of procmail?

> $ emacs -f gnus
>
> In the *Article* buffer, both look like
> Subject: 一二三
> In the *Summary* buffer so does the big5 version.
> Alas, the utf-8 version looks like
> c\x80\xd3....
>
> (Yes, these are illegal raw headers. But Gnus is supposed to be
> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

[...]

> (Might be related to bug#45724.)

Is this still with nnml?  If so, could you find the resulting lines in
the .overview files in the nnml directory and post them here?  (Perhaps
after gzipping them to avoid Emacs helpfully re-encoding the lines.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: 積丹尼 Dan Jacobson @ 2021-01-20  5:58 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 45925

>>>>> "LI" == Lars Ingebrigtsen <larsi@gnus.org> writes:
LI> 積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

>> Try this simple experiment:
>> $ echo Subject: 一二三|procmail
>> $ echo Subject: 一二三|iconv -t big5|procmail

LI> I don't have procmail installed, so I'm not sure what these do -- are
LI> you sending a mail (to yourself?) here?  Do you have a recipe to
LI> reproduce this problem without the use of procmail?

$ echo Subject: 一二三 > ~/Maildir/new/Z
$ file ~/Maildir/new/Z
~/Maildir/new/Z: UTF-8 Unicode text


>> $ emacs -f gnus
>> 
>> In the *Article* buffer, both look like
>> Subject: 一二三
>> In the *Summary* buffer so does the big5 version.
>> Alas, the utf-8 version looks like
>> c\x80\xd3....
>> 
>> (Yes, these are illegal raw headers. But Gnus is supposed to be
>> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

LI> [...]

>> (Might be related to bug#45724.)

LI> Is this still with nnml?  If so, could you find the resulting lines in
LI> the .overview files in the nnml directory and post them here?  (Perhaps
LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)

Yes, nnml.

The headers get appended raw to .overview.

Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.

$ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
$ echo Subject: 一二三 > ~/Maildir/new/UT
$ emacs -f gnus
$ tail -n 2 Mail/mail/misc/.overview|qprint -e
37397   =A4@=A4G=A4T    (nobody)                <87a6t4gnpx.5.fsf@totally-fudged-out-mess=
age-id>         0       0       Xref: jidanni5 mail.misc:37397=09
37398   =E4=B8=80=E4=BA=8C=E4=B8=89     (nobody)                <878s8ognpx.5.fsf@totally-=
fudged-out-message-id>          0       0       Xref: jidanni5 mail.misc:37398=09
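To make the mixed-encoding problem concrete, here is a small Python sketch. It is not Gnus code, only an illustration. It rebuilds the two Subject values from the qprint dump above, appends them raw to one byte string the way .overview receives them, and shows that a single coding system for the whole file cannot decode both lines, while per-line decoding can. The field layout is simplified and the variable names are illustrative.

```python
# The same Subject text as raw bytes in the two encodings from the report.
# UTF-8 matches =E4=B8=80=E4=BA=8C=E4=B8=89 in the dump; the Big5 bytes are
# copied from =A4@=A4G=A4T.
UTF8_SUBJECT = "一二三".encode("utf-8")
BIG5_SUBJECT = b"\xa4@\xa4G\xa4T"

# Simplified stand-in for the two .overview lines, appended raw one after the
# other (only the article number and Subject fields are kept).
overview = b"37397\t" + BIG5_SUBJECT + b"\n" + b"37398\t" + UTF8_SUBJECT + b"\n"
lines = overview.split(b"\n")[:-1]

# 1. One coding system for the whole file: strict UTF-8 rejects the Big5 bytes.
try:
    overview.decode("utf-8")
    whole_file_ok = True
except UnicodeDecodeError:
    whole_file_ok = False

# 2. Lenient UTF-8 decoding "works", but each Big5 lead byte becomes U+FFFD.
lenient = overview.decode("utf-8", errors="replace").split("\n")[:-1]

# 3. Decoding each line with the coding system it was written in recovers both.
per_line = [lines[0].decode("big5"), lines[1].decode("utf-8")]

print(whole_file_ok)  # False
print(lenient)
print(per_line)
```

Per-line decoding only works here because the right coding system for each line is known in advance. A real reader of the file has to guess it, and that guessing is the hard part.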

Anyway: *Summary* oddly handles only raw big5, not raw UTF-8,
whereas *Article* handles both.

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: Lars Ingebrigtsen @ 2021-01-20 16:31 UTC (permalink / raw)
  To: 積丹尼 Dan Jacobson; +Cc: 45925

積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

> LI> I don't have procmail installed, so I'm not sure what these do -- are
> LI> you sending a mail (to yourself?) here?  Do you have a recipe to
> LI> reproduce this problem without the use of procmail?
>
> $ echo Subject: 一二三 > ~/Maildir/new/Z
> $ file ~/Maildir/new/Z
> ~/Maildir/new/Z: UTF-8 Unicode text

I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

> LI> Is this still with nnml?  If so, could you find the resulting lines in
> LI> the .overview files in the nnml directory and post them here?  (Perhaps
> LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)
>
> Yes, nnml.
>
> The headers get appended raw to .overview.
>
> Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.
>
> $ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
> $ echo Subject: 一二三 > ~/Maildir/new/UT
> $ emacs -f gnus
> $ tail -n 2 Mail/mail/misc/.overview|qprint -e
> 37397 =A4@=A4G=A4T (nobody) <87a6t4gnpx.5.fsf@totally-fudged-out-mess=
> age-id>         0       0       Xref: jidanni5 mail.misc:37397=09
> 37398 =E4=B8=80=E4=BA=8C=E4=B8=89 (nobody) <878s8ognpx.5.fsf@totally-=
> fudged-out-message-id> 0 0 Xref: jidanni5 mail.misc:37398=09

There was just ASCII in the part you posted.  Could you gzip it, as I
asked you to?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: 積丹尼 Dan Jacobson @ 2021-01-21 19:55 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 45925

[-- Attachment #1: Type: text/plain, Size: 492 bytes --]

LI> I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

No. I just made a file in the maildir that Gnus fetches new mail from
when I hit "g" (g runs the command gnus-group-get-new-news).

Anyway, all you need to do to reproduce this bug is to have somebody
send you a mail with raw UTF-8 in the Subject header.

LI> There was just ASCII in the part you posted.  Could you gzip it, as I
LI> asked you to?

$ perl -nwle 'print if /\P{ASCII}/' Mail/mail/misc/.overview > /tmp/h
$ gzip /tmp/h
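Lars asked for gzip so that nothing re-encodes the bytes in transit. Where Perl is unavailable, the two commands above can be approximated in Python by working on raw bytes from start to finish. This is a sketch: the function names are made up for the illustration, and the path is the one from the report.

```python
import gzip
from pathlib import Path


def non_ascii_lines(data: bytes) -> list:
    """Lines of DATA (each ending in a newline) that contain a byte >= 0x80.

    Like the perl one-liner on an undecoded file, this inspects raw bytes,
    so no line is decoded or re-encoded on the way through.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline
    return [line + b"\n" for line in lines if not line.isascii()]


def extract_non_ascii(overview: Path, out: Path) -> int:
    """Write the non-ASCII lines of OVERVIEW to the gzip file OUT; return count."""
    selected = non_ascii_lines(overview.read_bytes())
    with gzip.open(out, "wb") as gz:
        gz.writelines(selected)
    return len(selected)
```

Run from the home directory, `extract_non_ascii(Path("Mail/mail/misc/.overview"), Path("/tmp/h.gz"))` should produce the same kind of file as the perl and gzip pair.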

[-- Attachment #2: h.gz --]
[-- Type: application/gzip, Size: 863 bytes --]

[-- Attachment #3: Type: text/plain, Size: 436 bytes --]

Here you will see a mix of raw UTF-8 and raw big5, all in the same file.
In the summary buffer the raw big5 displays fine, but the raw UTF-8 looks
garbled. In the article buffer, all of them look fine.

Here are all my config files:
https://www.jidanni.org/comp/configuration/.emacs
https://www.jidanni.org/comp/configuration/.gnus.el
https://www.jidanni.org/comp/configuration/.emacs-custom.el
https://www.jidanni.org/comp/configuration/.emacs-w3m

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: Eli Zaretskii @ 2021-01-21 20:22 UTC (permalink / raw)
  To: 積丹尼 Dan Jacobson; +Cc: larsi, 45925

> From: 積丹尼 Dan Jacobson
>  <jidanni@jidanni.org>
> Date: Fri, 22 Jan 2021 03:55:44 +0800
> Cc: 45925@debbugs.gnu.org
> 
> Here you will see a mix of raw UTF-8, raw big5, all in the same file.
> The raw big5 works fine, but the raw UTF-8 looks garbled, in the summary
> buffer. In the article buffer, all look fine.

Why do you expect mixed-encoding stuff to work in Emacs?  Emacs only
supports a single encoding of any chunk of text it gets, be it a file
or an email message.

Files such as this one are simply not supported.

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: 積丹尼 Dan Jacobson @ 2021-01-21 20:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, 45925

>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
EZ> Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
EZ> supports a single encoding of any chunk of text it gets, be it a file
EZ> or an email message.

EZ> Files such as this one are simply not supported.

So, Gnus should not just randomly slap raw lines into the same file.
That is the root of all problems!

* bug#45925: 27.1; *Summary* buffer vs. raw utf-8 headers
From: Lars Ingebrigtsen @ 2021-01-22 18:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45925, 積丹尼 Dan Jacobson

Eli Zaretskii <eliz@gnu.org> writes:

> Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
> supports a single encoding of any chunk of text it gets, be it a file
> or an email message.
>
> Files such as this one are simply not supported.

Sure they are.  It's not a text file; it's an octet stream.

But as Dan points out, Gnus doesn't handle these invalid mails
optimally, and RFC 2047-encoding the headers before writing the
.overview file will help a bit here, so I've now done that in Emacs
28.
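The idea of RFC 2047-encoding raw header bytes before they go into .overview can be illustrated, loosely, with Python's standard `email.header` module. This is not the Emacs 28 change, which is Emacs Lisp inside Gnus. The helper names, the UTF-8-then-Big5 guessing order, and the Latin-1 fallback are all assumptions made for this sketch. What it shows is that once raw bytes become encoded-words, every .overview line is plain ASCII and the file no longer mixes encodings.

```python
from email.header import Header, decode_header, make_header

# Hypothetical guessing order for raw 8-bit header bytes: UTF-8 first (it
# rarely decodes non-UTF-8 data by accident), then Big5 as a local fallback.
CANDIDATE_CHARSETS = ("utf-8", "big5")


def encode_raw_header(raw: bytes) -> str:
    """Return RAW as pure-ASCII text suitable for a line-based .overview file.

    Plain ASCII passes through unchanged; anything else becomes RFC 2047
    encoded-words in the first candidate charset that decodes it.
    """
    if raw.isascii():
        return raw.decode("ascii")
    for charset in CANDIDATE_CHARSETS:
        try:
            text = raw.decode(charset)
        except UnicodeDecodeError:
            continue
        # maxlinelen=0 disables folding, so no newline can split the record.
        return Header(text, charset).encode(maxlinelen=0)
    # No candidate fits: Latin-1 maps every byte, so the original bytes stay
    # recoverable from the encoded-word.
    return Header(raw.decode("latin-1"), "iso-8859-1").encode(maxlinelen=0)


def decode_for_display(encoded: str) -> str:
    """Decode the encoded-words again, roughly what a reader of the line does."""
    return str(make_header(decode_header(encoded)))
```

The charset guess does not go away. It moves to write time, so a wrong guess is still possible for bytes that decode validly in more than one charset. With folding disabled, a long subject may also yield an encoded-word longer than the 75 characters RFC 2047 allows. That is tolerable for a private index file, but a stricter version would split it into several words.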

(Gnus will still display some of these headers "wrong" in the summary
buffer and "right" in the article buffer, because Gnus has to guess at
what the charset is, and for reasons of efficiency it guesses more
thoroughly in the article buffer than in the summary buffer.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
