Try this simple experiment:

$ echo Subject: 一二三|procmail
$ echo Subject: 一二三|iconv -t big5|procmail
$ emacs -f gnus

In the *Article* buffer, both look like
Subject: 一二三
In the *Summary* buffer so does the big5 version.
Alas, the utf-8 version looks like
c\x80\xd3....

(Yes, these are illegal raw headers. But Gnus is supposed to be
accommodating. And it does... but oddly not for the majority (UTF-8) case.)

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: zh_TW.UTF-8
  value of $LC_MESSAGES: C
  value of $LANG: zh_TW.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

(Might be related to bug#45724.)
(https://www.jidanni.org/comp/configuration/ has my dot files.)
積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

> Try this simple experiment:
> $ echo Subject: 一二三|procmail
> $ echo Subject: 一二三|iconv -t big5|procmail

I don't have procmail installed, so I'm not sure what these do -- are
you sending a mail (to yourself?) here?  Do you have a recipe to
reproduce this problem without the use of procmail?

> $ emacs -f gnus
>
> In the *Article* buffer, both look like
> Subject: 一二三
> In the *Summary* buffer so does the big5 version.
> Alas, the utf-8 version looks like
> c\x80\xd3....
>
> (Yes, these are illegal raw headers. But Gnus is supposed to be
> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

[...]

> (Might be related to bug#45724.)

Is this still with nnml?  If so, could you find the resulting lines in
the .overview files in the nnml directory and post them here?  (Perhaps
after gzipping them to avoid Emacs helpfully re-encoding the lines.)

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
>>>>> "LI" == Lars Ingebrigtsen <larsi@gnus.org> writes:

LI> 積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:
>> Try this simple experiment:
>> $ echo Subject: 一二三|procmail
>> $ echo Subject: 一二三|iconv -t big5|procmail

LI> I don't have procmail installed, so I'm not sure what these do -- are
LI> you sending a mail (to yourself?) here?  Do you have a recipe to
LI> reproduce this problem without the use of procmail?

$ echo Subject: 一二三 > ~/Maildir/new/Z
$ file ~/Maildir/new/Z
~/Maildir/new/Z: UTF-8 Unicode text

>> $ emacs -f gnus
>>
>> In the *Article* buffer, both look like
>> Subject: 一二三
>> In the *Summary* buffer so does the big5 version.
>> Alas, the utf-8 version looks like
>> c\x80\xd3....
>>
>> (Yes, these are illegal raw headers. But Gnus is supposed to be
>> accommodating. And it does... but oddly not for the majority (UTF-8) case.)

LI> [...]

>> (Might be related to bug#45724.)

LI> Is this still with nnml?  If so, could you find the resulting lines in
LI> the .overview files in the nnml directory and post them here?  (Perhaps
LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)

Yes, nnml.

The headers get appended raw to .overview.

Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.

$ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
$ echo Subject: 一二三 > ~/Maildir/new/UT
$ emacs -f gnus
$ tail -n 2 Mail/mail/misc/.overview|qprint -e
37397 =A4@=A4G=A4T (nobody) <87a6t4gnpx.5.fsf@totally-fudged-out-mess=
age-id> 0 0 Xref: jidanni5 mail.misc:37397=09
37398 =E4=B8=80=E4=BA=8C=E4=B8=89 (nobody) <878s8ognpx.5.fsf@totally-=
fudged-out-message-id> 0 0 Xref: jidanni5 mail.misc:37398=09

Anyway:
*Summary* oddly can only deal with raw big5, not raw UTF-8.
However *Article* can deal with both.
積丹尼 Dan Jacobson <jidanni@jidanni.org> writes:

> LI> I don't have procmail installed, so I'm not sure what these do -- are
> LI> you sending a mail (to yourself?) here?  Do you have a recipe to
> LI> reproduce this problem without the use of procmail?
>
> $ echo Subject: 一二三 > ~/Maildir/new/Z
> $ file ~/Maildir/new/Z
> ~/Maildir/new/Z: UTF-8 Unicode text

I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

> LI> Is this still with nnml?  If so, could you find the resulting lines in
> LI> the .overview files in the nnml directory and post them here?  (Perhaps
> LI> after gzipping them to avoid Emacs helpfully re-encoding the lines.)
>
> Yes, nnml.
>
> The headers get appended raw to .overview.
>
> Thus .overview contains a mix of ASCII, big5, and UTF-8, all in the same file.
>
> $ echo Subject: 一二三|iconv -t big5 > ~/Maildir/new/B5
> $ echo Subject: 一二三 > ~/Maildir/new/UT
> $ emacs -f gnus
> $ tail -n 2 Mail/mail/misc/.overview|qprint -e
> 37397 =A4@=A4G=A4T (nobody) <87a6t4gnpx.5.fsf@totally-fudged-out-mess=
> age-id> 0 0 Xref: jidanni5 mail.misc:37397=09
> 37398 =E4=B8=80=E4=BA=8C=E4=B8=89 (nobody) <878s8ognpx.5.fsf@totally-=
> fudged-out-message-id> 0 0 Xref: jidanni5 mail.misc:37398=09

There was just ASCII in the part you posted.  Could you gzip it, as I
asked you to?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
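The qprint -e dump quoted above is plain ASCII, but it still spells out the raw octets of each Subject field. The following is a minimal Python sketch, not part of the original exchange: the two quoted-printable strings are copied from the dump, and the variable names are illustrative. It undoes the quoted-printable step and decodes each field with the charset it was written in.

```python
import quopri

# Subject fields copied from the `qprint -e` dump of .overview above.
qp_big5 = b"=A4@=A4G=A4T"
qp_utf8 = b"=E4=B8=80=E4=BA=8C=E4=B8=89"

# Undo the quoted-printable encoding to recover the raw octets
# that were appended to .overview.
raw_big5 = quopri.decodestring(qp_big5)
raw_utf8 = quopri.decodestring(qp_utf8)

# Show the byte strings; both contain bytes outside the ASCII range.
print(raw_big5)
print(raw_utf8)

# Each field is only readable with the charset it was written in.
subject_from_big5 = raw_big5.decode("big5")
subject_from_utf8 = raw_utf8.decode("utf-8")

# Check that both fields decode to the same three characters.
print(subject_from_big5 == subject_from_utf8 == "一二三")
```

The two lines therefore hold the same subject in two different byte encodings, which is the mix Dan describes.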
LI> I thought this was about nnml?  Is ~/Maildir/new/Z your nnml directory?

No. I just made a file where gnus gets its mail from when I hit "g".
g runs the command gnus-group-get-new-news

Anyway, all you need to do to reproduce this bug, is to have somebody
send you a mail with raw UTF-8 in the Subject header.

LI> There was just ASCII in the part you posted.  Could you gzip it, as I
LI> asked you to?

$ perl -nwle 'print if /\P{ASCII}/' Mail/mail/misc/.overview > /tmp/h
$ gzip /tmp/h

[-- Attachment #2: h.gz --]
[-- Type: application/gzip, Size: 863 bytes --]

Here you will see a mix of raw UTF-8, raw big5, all in the same file.
The raw big5 works fine, but the raw UTF-8 looks garbled, in the summary
buffer. In the article buffer, all look fine.

Here are all my config files:
https://www.jidanni.org/comp/configuration/.emacs
https://www.jidanni.org/comp/configuration/.gnus.el
https://www.jidanni.org/comp/configuration/.emacs-custom.el
https://www.jidanni.org/comp/configuration/.emacs-w3m
> From: 積丹尼 Dan Jacobson
> <jidanni@jidanni.org>
> Date: Fri, 22 Jan 2021 03:55:44 +0800
> Cc: 45925@debbugs.gnu.org
>
> Here you will see a mix of raw UTF-8, raw big5, all in the same file.
> The raw big5 works fine, but the raw UTF-8 looks garbled, in the summary
> buffer. In the article buffer, all look fine.
Why do you expect a mixed-encoding stuff to work in Emacs? Emacs only
supports a single encoding of any chunk of text it gets, be it a file
or an email message.
Files such as this one are simply not supported.
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
EZ> Why do you expect a mixed-encoding stuff to work in Emacs? Emacs only
EZ> supports a single encoding of any chunk of text it gets, be it a file
EZ> or an email message.
EZ> Files such as this one are simply not supported.
So, Gnus should not just randomly slap raw lines into the same file.
That is the root of all problems!
Eli Zaretskii <eliz@gnu.org> writes:

> Why do you expect a mixed-encoding stuff to work in Emacs?  Emacs only
> supports a single encoding of any chunk of text it gets, be it a file
> or an email message.
>
> Files such as this one are simply not supported.

Sure they are.  It's not a text file; it's an octet stream.

But as Dan points out, Gnus doesn't handle these invalid mails
optimally, and doing some RFC2047-encoding to the headers before writing
the .overview file will help a bit here, so I've now done that in Emacs 28.

(Gnus will still display some of these headers "wrong" in the summary
buffer, and display them "right" in the article buffer, because Gnus has
to guess at what the charset is, and it does further guessing in the
article buffer than in the summary buffer, for reasons of efficiency.)

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no