* RMAIL doesn't always recognize rmail-mime-charset-pattern
@ 2002-05-18 13:18 David Kuehling
  2002-05-18 16:26 ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: David Kuehling @ 2002-05-18 13:18 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

RMAIL didn't decode a mail, which contained a 

Content-Type: text/plain; charset=euc-jp

header.  Instead it decoded it as iso-latin-1-unix.  I wanted to
reproduce the bug, so I saved the mail into INBOX format.  But when I
opened the inbox format mail, the coding system was detected correctly.

I don't understand how this can happen.  When receiving and opening
mails, the decoding is done by the same routine,
rmail-convert-to-babyl-format.

This is difficult to debug and rmail.el is extremely ugly.  If only
someone with more knowledge of rmail could have a look at it...

I attach the INBOX format mail as buggy.mail.gz, the wrongly converted
BABYL format mail as buggy.xmail.gz.  If you want to try to reproduce
the mail retrieval bug, you can subscribe to the mailing list that
sends those mails:

https://ieeecs.ece.utexas.edu/mailman/listinfo/gakushuu_kotd

(I'm retrieving the mails using a pop3 po:<...> mailbox)

David Kühling
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40


[-- Attachment #2: INBOX-format mail --]
[-- Type: application/octet-stream, Size: 6219 bytes --]

[-- Attachment #3: BABYL-format mail --]
[-- Type: application/octet-stream, Size: 6600 bytes --]

[-- Attachment #4: Type: text/plain, Size: 35 bytes --]


PS: I'm not subscribed, please CC


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-18 13:18 RMAIL doesn't always recognize rmail-mime-charset-pattern David Kuehling
@ 2002-05-18 16:26 ` Eli Zaretskii
  2002-05-18 17:52   ` David Kuehling
  2002-05-19 13:09   ` David Kuehling
  0 siblings, 2 replies; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-18 16:26 UTC (permalink / raw)
  Cc: bug-gnu-emacs

> From: David Kuehling <dvdkhlng@gmx.de>
> Date: 18 May 2002 15:18:47 +0200
> 
> RMAIL didn't decode a mail, which contained a 
> 
> Content-Type: text/plain; charset=euc-jp
> 
> header.  Instead it decoded it as iso-latin-1-unix.  I wanted to
> reproduce the bug, so I saved the mail into INBOX format.  But when I
> opened the inbox format mail, the coding system was detected correctly.
> 
> I don't understand how this can happen.  When receiving and opening
> mails, the decoding is done by the same routine,
> rmail-convert-to-babyl-format.

I cannot reproduce this, either.  I took your buggy.mail, removed the
"X-Coding-system:" line, then typed "C-u g buggy.mail RET" inside an
RMAIL buffer.  The message was correctly decoded as EUC-JP.

Did you try to remove the "X-Coding-system:" header, and see if that
allows you to reproduce the problem?  If not, can you please try that?

> (I'm retrieving the mails using a pop3 po:<...> mailbox)

Do you still have the original mailbox retrieved by pop3?  If so, can
you spot some differences between the original file and buggy.mail
you sent with your report?


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-18 16:26 ` Eli Zaretskii
@ 2002-05-18 17:52   ` David Kuehling
  2002-05-19 13:09   ` David Kuehling
  1 sibling, 0 replies; 14+ messages in thread
From: David Kuehling @ 2002-05-18 17:52 UTC (permalink / raw)


>>>>> "Eli" == Eli Zaretskii <eliz@is.elta.co.il> writes:

> I cannot reproduce this, either.  I took your buggy.mail, removed the
> "X-Coding-system:" line, then typed "C-u g buggy.mail RET" inside an
> RMAIL buffer.  The message was correctly decoded as EUC-JP.

I just tried the same.  My Emacs also decoded the message correctly.
That's strange.

> Do you still have the original mailbox retrieved by pop3?  If so, can
> you spot some differences between the original file and buggy.mail you
> sent with your report?

I don't have the original mailbox.  But I'll receive the next mail of
that kind tomorrow.  Hopefully I'll then remember to make a copy of my
mailserver's inbox before using Emacs' mail retrieval...

David Kühling
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-18 16:26 ` Eli Zaretskii
  2002-05-18 17:52   ` David Kuehling
@ 2002-05-19 13:09   ` David Kuehling
  2002-05-19 13:27     ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: David Kuehling @ 2002-05-19 13:09 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

>>>>> "Eli" == Eli Zaretskii <eliz@is.elta.co.il> writes:

> Do you still have the original mailbox retrieved by pop3?

Today I made a copy of my inbox file before using Emacs' pop3 mail
retrieval.  The bug is now reproducible when receiving that mailbox
via pop3 (2nd mail in the mailbox file).  But it won't occur when I use
C-u g savedmailbox <Ret>.  That's cool!

And I just discovered another mysterious bug.  The last mailing-list mail
in the mailbox (from debian-japanese) contains a 

Content-Type: TEXT/PLAIN; charset=iso-8859-1

header, although its contents are obviously iso-2022-jp (7bit) encoded.
When receiving the mailbox via pop3, the Japanese characters in the mail
show up correctly, and an

X-Coding-System: iso-2022-jp-unix

header is generated.  How can that happen?  The same mail, however, read
via C-u g ... will be decoded as iso-8859-1 (as it should be).
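
One way such a mislabelled mail can still display correctly is that
iso-2022-jp is a 7-bit encoding, so a decoder that inspects the bytes
rather than the charset= label can recognize it by its escape
sequences.  A minimal Python sketch of that idea follows.  It is not
how Emacs detects coding systems; the helper name looks_like_iso2022jp
and the sample text are assumptions made for illustration only.

```python
ESC = b"\x1b"


def looks_like_iso2022jp(data: bytes) -> bool:
    """Heuristic only: pure 7-bit data containing an escape sequence
    that switches to JIS X 0208 (ESC $ B or ESC $ @)."""
    seven_bit = all(b < 0x80 for b in data)
    return seven_bit and (ESC + b"$B" in data or ESC + b"$@" in data)


# Hypothetical sample body, not taken from the attached mailbox.
text = "日本語"
wire = text.encode("iso2022_jp")

# Decoding by the (wrong) label never fails, because every byte is
# valid Latin-1; it just yields escape-sequence garbage.
as_labelled = wire.decode("iso-8859-1")

# Decoding by what the bytes actually are recovers the text.
as_detected = wire.decode("iso2022_jp")
```

Here as_labelled differs from the original text while as_detected
matches it, which is the same split seen between the two ways of
reading the mailbox.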

I attach the full mailbox: savedmailbox.gz.

David Kühling
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40


[-- Attachment #2: savedmailbox.gz --]
[-- Type: application/octet-stream, Size: 16121 bytes --]


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-19 13:09   ` David Kuehling
@ 2002-05-19 13:27     ` Eli Zaretskii
  2002-05-19 15:24       ` David Kuehling
  2002-05-19 15:28       ` Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-19 13:27 UTC (permalink / raw)
  Cc: bug-gnu-emacs


On 19 May 2002, David Kuehling wrote:

> Today I made a copy of my inbox file, before using Emacs' pop3 mail
> retrieval.  The bug is now reproducible when receiving that mailbox
> via pop3 (2nd mail in the mailbox file).  But it won't occur when I use
> C-u g savedmailbox <Ret>.  That's cool!

This probably means that the culprit is something related to process I/O, 
not file I/O.  Do you read your pop3-fetched mail by receiving output of 
some process, or does that process create a file which you then direct 
RMAIL to read?

> And I just discovered another mysterious bug.  The last mailing-list mail
> in the mailbox (from debian-japanese) contains a 
> 
> Content-Type: TEXT/PLAIN; charset=iso-8859-1
> 
> header, although its contents are obviously iso-2022-jp (7bit) encoded.
> When receiving the mailbox via pop3, the Japanese characters in the mail
> show up correctly, and an
> 
> X-Coding-System: iso-2022-jp-unix
> 
> header is generated.  How can that happen?  The same mail, however, read
> via C-u g ... will be decoded as iso-8859-1 (as it should be).

Probably due to the same factor: somehow, when RMAIL reads your mail ``as 
usual'', it applies some other default.

I will dwell on this and see what I can come up with.


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-19 13:27     ` Eli Zaretskii
@ 2002-05-19 15:24       ` David Kuehling
  2002-05-19 17:31         ` Eli Zaretskii
  2002-05-19 15:28       ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: David Kuehling @ 2002-05-19 15:24 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1349 bytes --]

>>>>> "Eli" == Eli Zaretskii <eliz@is.elta.co.il> writes:

> This probably means that the culprit is something related to process
> I/O, not file I/O.  Do you read your pop3-fetched mail by receiving
> output of some process, or does that process create a file which you
> then direct RMAIL to read?

I use the standard pop3 mail facility of emacs; using a "po:username"
mailbox in `rmail-primary-inbox-list' (and the environment variable
MAILHOST specifying the mail server's name).

I had a closer look at how Emacs pop3 mail retrieval works.  It uses the
movemail program for copying the remote mailbox into a local temporary
file and then gets mail from that file as it would do with a local
inbox.  So I used the movemail program for manually moving the mail.
The resulting file, mailbox-popretrieved (attached), shows a few small
differences from the original mailbox.  There are some additional lines
and flags added to the mails.

That mail, read using C-u g mailbox-popretrieved <Ret>, now triggers the
bug, i.e. EUC-JP is not correctly decoded.  Is this due to the minor
differences?  Hopefully you can reproduce the bug (else it might already
be fixed with your version... I'm running 21.1.1).

David Kühling
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40


[-- Attachment #2: mailbox-popretrieved.gz --]
[-- Type: application/octet-stream, Size: 15852 bytes --]


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-19 13:27     ` Eli Zaretskii
  2002-05-19 15:24       ` David Kuehling
@ 2002-05-19 15:28       ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-19 15:28 UTC (permalink / raw)
  Cc: bug-gnu-emacs

> From: Eli Zaretskii <eliz@is.elta.co.il>
> Date: Sun, 19 May 2002 16:27:11 +0300 (IDT)
> 
> Probably due to the same factor: somehow, when RMAIL reads your mail ``as 
> usual'', it applies some other default.

Umm, it is probably important to know what exactly you type to read
your mail through pop3 (not "C-u g", the other, normal, procedure).
Could you please describe that?


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-19 15:24       ` David Kuehling
@ 2002-05-19 17:31         ` Eli Zaretskii
  2002-05-20 14:47           ` Richard Stallman
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-19 17:31 UTC (permalink / raw)
  Cc: emacs-devel, handa

[I'm taking this to emacs-devel, since I think we need to discuss the issue.]

> From: David Kuehling <dvdkhlng@gmx.de>
> Date: 19 May 2002 17:24:25 +0200
> 
> I had a closer look at how Emacs pop3 mail retrieval works.  It uses the
> movemail program for copying the remote mailbox into a local temporary
> file and then gets mail from that file as it would do with a local
> inbox.  So I used the movemail program for manually moving the mail.
> The resulting file, mailbox-popretrieved (attached), shows a few small
> differences from the original mailbox.  There are some additional lines
> and flags added to the mails.
> 
> That mail, read using C-u g mailbox-popretrieved <Ret>, now triggers the
> bug, i.e. EUC-JP is not correctly decoded.  Is this due to the minor
> differences?  Hopefully you can reproduce the bug (else it might already
> be fixed with your version... I'm running 21.1.1).

Yes, I can reproduce this now, and the problem is indeed the format
used by movemail to write the mailbox file: it's the Babyl format.
(That format is not unique to pop3 protocol, though.)

It is actually very simple: when rmail-convert-to-babyl-format sees a
mailbox in Babyl format, it decodes the messages using
decode-coding-region.  Needless to say, decode-coding-region does not
know anything about the special importance of the charset= header, so
it simply summons its usual guesswork trying to detect the encoding of
each message.  The result is that, with some messages, you will get
your default coding system as the message encoding; I'm guessing that
Latin-1 is your default (my default is different, so I got a different
result).

By contrast, when rmail-convert-to-babyl-format sees an mbox format,
it honors the charset= headers, so the result is correct.
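
The difference between the two branches can be sketched in Python
(not Emacs Lisp, and not the actual rmail.el code).  The sketch
assumes Latin-1 as the default coding system, reduces the guesswork
of decode-coding-region to "fall back to the default", and uses a
simplified, hypothetical regex in place of rmail-mime-charset-pattern;
all function names below are made up for illustration.

```python
import re

# Assumption: stands in for the user's default coding system.
DEFAULT_CODEC = "latin-1"

# Hypothetical, simplified stand-in for rmail-mime-charset-pattern.
# It does not handle folded (multi-line) headers.
CHARSET_RE = re.compile(
    rb'^content-type:[^\n]*?charset="?([^\s";]+)',
    re.IGNORECASE | re.MULTILINE,
)


def split_message(raw: bytes):
    """Split a single message into (headers, body) at the first blank line."""
    headers, _, body = raw.partition(b"\n\n")
    return headers, body


def decode_ignoring_charset(raw: bytes) -> str:
    """Babyl-branch analogue: never looks at charset=, uses the default."""
    _, body = split_message(raw)
    return body.decode(DEFAULT_CODEC)


def decode_honoring_charset(raw: bytes) -> str:
    """mbox-branch analogue: uses charset= when present, else the default."""
    headers, body = split_message(raw)
    match = CHARSET_RE.search(headers)
    codec = match.group(1).decode("ascii") if match else DEFAULT_CODEC
    return body.decode(codec)


# Hypothetical sample message, not taken from the attached mailbox.
text = "日本語のテスト"
raw = b"Content-Type: text/plain; charset=euc-jp\n\n" + text.encode("euc_jp")

honored = decode_honoring_charset(raw)   # recovers the Japanese text
ignored = decode_ignoring_charset(raw)   # Latin-1 mojibake, no error raised
```

Note that the Latin-1 path raises no error, since every byte is valid
Latin-1; the mis-decoding is silent, which matches what David saw.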

Handa-san, can you tell why the Babyl branch of
rmail-convert-to-babyl-format uses decode-coding-region without ever
looking at the charset= header?  It sounds like a bug, since movemail
uses that format.


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
@ 2002-05-20 12:56 Kenichi Handa
  0 siblings, 0 replies; 14+ messages in thread
From: Kenichi Handa @ 2002-05-20 12:56 UTC (permalink / raw)
  Cc: dvdkhlng, emacs-devel

"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> By contrast, when rmail-convert-to-babyl-format sees an mbox format,
> it honors the charset= headers, so the result is correct.

> Handa-san, can you tell why the Babyl branch of
> rmail-convert-to-babyl-format uses decode-coding-region without ever
> looking at the charset= header?  It sounds like a bug, since movemail
> uses that format.

I don't remember.  :-(

Perhaps I thought that Babyl format is produced only by
rmail, and thus that part should have already been encoded by
rmail-file-coding-system or something.

---
Ken'ichi HANDA
handa@etl.go.jp


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-19 17:31         ` Eli Zaretskii
@ 2002-05-20 14:47           ` Richard Stallman
  2002-05-20 15:13             ` Paul Michael Reilly
  2002-05-20 16:02             ` Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Stallman @ 2002-05-20 14:47 UTC (permalink / raw)
  Cc: dvdkhlng, emacs-devel, handa

    Handa-san, can you tell why the Babyl branch of
    rmail-convert-to-babyl-format uses decode-coding-region without ever
    looking at the charset= header?  It sounds like a bug, since movemail
    uses that format.

Just the opposite--it is right for Rmail to read a Babyl file this way,
since they are supposed to be saved as a whole using a single coding system.
The bug is in movemail.  If it is to produce a Babyl file, it has to
handle the decoding.  That, however, is not practical.
It seems clear that movemail should not convert to Babyl format.

We are changing Rmail to get rid of Babyl format entirely; when that
change is installed, we will need movemail to output inbox format.  It
sounds like we should make this change now, so as to fix the current
bug.

Does anyone want to do it?


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-20 14:47           ` Richard Stallman
@ 2002-05-20 15:13             ` Paul Michael Reilly
  2002-05-20 16:03               ` Eli Zaretskii
  2002-05-20 16:02             ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Paul Michael Reilly @ 2002-05-20 15:13 UTC (permalink / raw)
  Cc: eliz, dvdkhlng, emacs-devel, handa

 > Just the opposite--it is right for Rmail to read a Babyl file this way,
 > since they are supposed to be saved as a whole using a single coding system.
 > The bug is in movemail.  If it is to produce a Babyl file, it has to
 > handle the decoding.  That, however, is not practical.
 > It seems clear that movemail should not convert to Babyl format.
 > 
 > We are changing Rmail to get rid of Babyl format entirely; when that
 > change is installed, we will need movemail to output inbox format.  It
 > sounds like we should make this change now, so as to fix the current
 > bug.

I don't think this is correct.  I'm using Rmail/mbox and I needed no
changes to movemail.

-pmr


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-20 14:47           ` Richard Stallman
  2002-05-20 15:13             ` Paul Michael Reilly
@ 2002-05-20 16:02             ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-20 16:02 UTC (permalink / raw)
  Cc: dvdkhlng, emacs-devel, handa


On Mon, 20 May 2002, Richard Stallman wrote:

> Just the opposite--it is right for Rmail to read a Babyl file this way,
> since they are supposed to be saved as a whole using a single coding system.

IIRC, Babyl files saved by Emacs are read by a different branch of code 
in rmail-convert-to-babyl-format.  They don't need to be decoded by 
decode-coding-region because they take the encoding from the 
X-Coding-System: header inserted by the initial conversion.

But I agree that changing movemail to save the mailbox in mbox format 
will solve the problem in a nicer manner.


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-20 15:13             ` Paul Michael Reilly
@ 2002-05-20 16:03               ` Eli Zaretskii
  2002-05-20 17:44                 ` Paul Michael Reilly
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2002-05-20 16:03 UTC (permalink / raw)
  Cc: rms, dvdkhlng, emacs-devel, handa


On Mon, 20 May 2002, Paul Michael Reilly wrote:

> I'm using Rmail/mbox and I needed no changes to movemail.

You mean, you cannot reproduce the problem reported by David with the 
file he sent?


* Re: RMAIL doesn't always recognize rmail-mime-charset-pattern
  2002-05-20 16:03               ` Eli Zaretskii
@ 2002-05-20 17:44                 ` Paul Michael Reilly
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Michael Reilly @ 2002-05-20 17:44 UTC (permalink / raw)
  Cc: rms, dvdkhlng, emacs-devel, handa

 > You mean, you cannot reproduce the problem reported by David with the 
 > file he sent?

No, I never tried to reproduce that problem.

David, please resend your original message to me and I'll look
into it.

Thanks,

-pmr
