* Re: Rmail changes for Emacs 22
2002-10-16 6:09 ` Eli Zaretskii
@ 2002-10-16 7:19 ` Kenichi Handa
2002-10-19 4:25 ` Paul Michael Reilly
2002-10-19 4:55 ` Richard Stallman
2002-10-18 22:59 ` Richard Stallman
` (2 subsequent siblings)
3 siblings, 2 replies; 35+ messages in thread
From: Kenichi Handa @ 2002-10-16 7:19 UTC (permalink / raw)
Cc: rms, emacs-devel, pmr
In article <Pine.SUN.3.91.1021016080428.20518C-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes:
>> How will the mbox file encoding be treated?
>>
>> I don't know, and that is a good question. I did not work on that
>> aspect of Rmail before, and I am not sure what to do about it now.
>> We might want to save these files normally in emacs-mule encoding,
>> or maybe we would want to decode each message individually when
>> displaying it. pmr@pajato.com is the person doing it.
> If this aspect wasn't discussed before, it's probably a good idea to
> discuss that now.
> Personally, I think emacs-mule is not a good idea in this case, since
> mbox is not Emacs-private format, so some other software should be able
> to read it. A good alternative would be to encode each message as what
> the charset= header says (and add/fix such a header if there is none, or
> if the one that's there lies).
I agree with that approach. I think we can proceed with the
modification of rmail in these steps.
(1) Divide the current code into BABYL format handler
(babyl-backend) and rmail user-interface provider
(rmail-frontend). Babyl-backend reads a BABYL file
without any code conversion in a unibyte buffer, and
provides various functions (e.g. extract message
headers, extract a specific message header, extract a
message body, get new messages, etc).
(2) Make mbox-backend that provides the same facilities as
babyl-backend.
(3) Make rmail-frontend to use babyl-backend or mbox-backend
depending on the user's mail file. Rmail-frontend displays a
message in a different buffer (rmail-view-buffer) than
the original mail file buffer. Rmail-frontend utilizes
MIME handler to decode message headers and body.
This way, we can easily add more backends, for instance,
IMAP, per-message files (like MH or GNUS), etc.
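A minimal sketch of the front-end/back-end split described in steps (1) to (3), written in Python rather than Emacs Lisp and not taken from the Rmail sources. It assumes that the standard library classes mailbox.mbox and mailbox.Babyl can stand in for the two back ends; the names BACKENDS, open_backend, list_subjects and demo are hypothetical.

```python
import mailbox
import os
import tempfile

# Hypothetical registry: format name -> stdlib mailbox class.
BACKENDS = {"mbox": mailbox.mbox, "babyl": mailbox.Babyl}


def open_backend(path, fmt):
    """Open (or create) a mail file with the back end chosen by format name."""
    return BACKENDS[fmt](path, create=True)


def list_subjects(backend):
    """Front-end code: it relies only on the interface every back end shares."""
    return [msg["Subject"] for msg in backend]


def demo():
    """Store one message per format and read it back through the front end."""
    subjects = {}
    with tempfile.TemporaryDirectory() as tmp:
        for fmt in BACKENDS:
            box = open_backend(os.path.join(tmp, "mail." + fmt), fmt)
            try:
                box.add("From: a@example.org\n"
                        "Subject: hello from %s\n\nbody\n" % fmt)
                box.flush()
                subjects[fmt] = list_subjects(box)
            finally:
                box.close()
    return subjects
```

The point of the sketch is that list_subjects never needs to know which file format it is reading, which is the property that would make further back ends (IMAP, per-message files) cheap to add.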
---
Ken'ichi HANDA
handa@m17n.org
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Rmail changes for Emacs 22
2002-10-16 7:19 ` Kenichi Handa
@ 2002-10-19 4:25 ` Paul Michael Reilly
2002-10-19 4:55 ` Richard Stallman
1 sibling, 0 replies; 35+ messages in thread
From: Paul Michael Reilly @ 2002-10-19 4:25 UTC (permalink / raw)
Cc: handa
> > Personally, I think emacs-mule is not a good idea in this case, since
> > mbox is not Emacs-private format, so some other software should be able
> > to read it. A good alternative would be to encode each message as what
> > the charset= header says (and add/fix such a header if there is none, or
> > if the one that's there lies).
>
> I agree with that approach. I think we can proceed with the
> modification of rmail in these steps.
>
> (1) Divide the current code into BABYL format handler
> (babyl-backend) and rmail user-interface provider
> (rmail-frontend). Babyl-backend reads a BABYL file
> without any code conversion in a unibyte buffer, and
> provides various functions (e.g. extract message
> headers, extract a specific message header, extract a
> message body, get new messages, etc).
>
> (2) Make mbox-backend that provides the same facilities as
> babyl-backend.
>
> (3) Make rmail-frontend to use babyl-backend or mbox-backend
> depending on the user's mail file. Rmail-frontend displays a
> message in a different buffer (rmail-view-buffer) than
> the original mail file buffer. Rmail-frontend utilizes
> MIME handler to decode message headers and body.
>
> This way, we can easily add more backends, for instance,
> IMAP, per-message files (like MH or GNUS), etc.
This is an excellent approach. I'm kicking myself for not seeing it
when I added mbox support. I will get to it as quickly as I can.
-pmr
* Re: Rmail changes for Emacs 22
2002-10-16 7:19 ` Kenichi Handa
2002-10-19 4:25 ` Paul Michael Reilly
@ 2002-10-19 4:55 ` Richard Stallman
2002-10-20 7:03 ` Eli Zaretskii
1 sibling, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-19 4:55 UTC (permalink / raw)
Cc: eliz, emacs-devel, pmr
> (1) Divide the current code into BABYL format handler
> (babyl-backend) and rmail user-interface provider
> (rmail-frontend).
This plan is not inherently flawed, but it would be a lot of work.
Our actual plan is both simpler and better: to get rid of Babyl format
entirely. Paul has already rewritten Rmail to work on mbox format.
There is no need to keep Babyl format, and using mbox format will be
more convenient all around.
> Rmail-frontend displays a
> message in a different buffer (rmail-view-buffer) than
> the original mail file buffer.
That would be inconvenient for editing the message
and several other things. I think that having
two separate buffers is something to avoid.
* Re: Rmail changes for Emacs 22
2002-10-19 4:55 ` Richard Stallman
@ 2002-10-20 7:03 ` Eli Zaretskii
0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-20 7:03 UTC (permalink / raw)
Cc: emacs-devel, pmr
On Sat, 19 Oct 2002, Richard Stallman wrote:
> There is no need to keep Babyl format
I suspect there are lots of people who keep their mail archives in Babyl
format. While I understand that conversion is possible (even today), it
might be impractical and/or user-unfriendly to require that all of them
be converted. Especially since Emacs 21 keeps Babyl files in emacs-mule
while earlier versions didn't; this might cause some breakage during
conversion to mbox and the return of the ubiquitous \201 bug.
So I tend to agree with Handa-san on this issue.
* Re: Rmail changes for Emacs 22
2002-10-16 6:09 ` Eli Zaretskii
2002-10-16 7:19 ` Kenichi Handa
@ 2002-10-18 22:59 ` Richard Stallman
2002-10-20 19:40 ` Stefan Monnier
2002-10-21 15:33 ` Dave Love
2002-10-22 6:31 ` Kai Großjohann
3 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-18 22:59 UTC (permalink / raw)
Cc: emacs-devel, pmr
> A good alternative would be to encode each message as what
> the charset= header says (and add/fix such a header if there is none, or
> if the one that's there lies).
Paul, what do you think of this idea?
* Re: Rmail changes for Emacs 22
2002-10-18 22:59 ` Richard Stallman
@ 2002-10-20 19:40 ` Stefan Monnier
2002-10-22 3:12 ` Richard Stallman
0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-20 19:40 UTC (permalink / raw)
Cc: eliz, emacs-devel, pmr
> A good alternative would be to encode each message as what
> the charset= header says (and add/fix such a header if there is none, or
> if the one that's there lies).
>
> Paul, what do you think of this idea?
mbox format was chosen because it is standard, so whatever we
do, it is important that it works with other programs.
That probably means that the format should be "whatever was received".
I.e. the mail-reader should never encode anything (only the mail-sender
should do that).
Stefan
* Re: Rmail changes for Emacs 22
2002-10-20 19:40 ` Stefan Monnier
@ 2002-10-22 3:12 ` Richard Stallman
2002-10-22 6:33 ` Kai Großjohann
2002-10-22 14:32 ` Stefan Monnier
0 siblings, 2 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-22 3:12 UTC (permalink / raw)
Cc: eliz, emacs-devel, pmr
> That probably means that the format should be "whatever was received".
> I.e. the mail-reader should never encode anything (only the mail-sender
> should do that).
It is impossible to display the message text without decoding it from
whatever coding system it is encoded in.
* Re: Rmail changes for Emacs 22
2002-10-22 3:12 ` Richard Stallman
@ 2002-10-22 6:33 ` Kai Großjohann
2002-10-22 18:48 ` Eli Zaretskii
2002-10-22 14:32 ` Stefan Monnier
1 sibling, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22 6:33 UTC (permalink / raw)
Richard Stallman <rms@gnu.org> writes:
> That probably means that the format should be "whatever was received".
> I.e. the mail-reader should never encode anything (only the mail-sender
> should do that).
>
> It is impossible to display the message text without decoding it from
> whatever coding system it is encoded in.
I think that's what Eli meant: an incoming message is already encoded
in some way, and Eli suggested to just leave it like that and to
decode on viewing. (Only a Content-Type header might have to be added
or changed so that Rmail knows which encoding is used in the message.)
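To make "leave it as received and decode on viewing" concrete, here is a small Python sketch (an illustration only, not Rmail code). It assumes the stored bytes are never modified and that the charset is taken from the Content-Type header at display time; RAW and view_text are hypothetical names.

```python
from email import message_from_bytes

# A message exactly as it might have arrived: 8-bit ISO-8859-1 body bytes.
RAW = (b"From: sender@example.org\n"
       b"Content-Type: text/plain; charset=iso-8859-1\n"
       b"Content-Transfer-Encoding: 8bit\n"
       b"\n"
       b"Gr\xfc\xdfe\n")


def view_text(raw_bytes, fallback="ascii"):
    """Decode the body for display only; raw_bytes itself is left untouched."""
    msg = message_from_bytes(raw_bytes)
    charset = msg.get_content_charset() or fallback
    body = msg.get_payload(decode=True)  # undoes base64/QP, returns bytes
    return body.decode(charset, errors="replace")
```

Calling view_text(RAW) yields the decoded text for the screen, while the bytes on disk stay whatever the sender produced.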
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
* Re: Rmail changes for Emacs 22
2002-10-22 6:33 ` Kai Großjohann
@ 2002-10-22 18:48 ` Eli Zaretskii
0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-22 18:48 UTC (permalink / raw)
Cc: emacs-devel
> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)
> Date: Tue, 22 Oct 2002 08:33:02 +0200
>
> > It is impossible to display the message text without decoding it from
> > whatever coding system it is encoded in.
>
> I think that's what Eli meant: an incoming message is already encoded
> in some way, and Eli suggested to just leave it like that and to
> decode on viewing.
Almost. There are the cases where the existing charset= header lies.
A user can then do a "M-x rmail-redecode-body RET ENCODING RET", and
get the message decoded differently. I think in these cases Emacs
should rewrite the charset= header according to the new encoding.
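A rough Python illustration of that idea (not how rmail-redecode-body is implemented): decode the body with the coding system the user names, then record that name in the charset= parameter so later views pick it up. The function name redecode and the sample message are assumptions made for the example.

```python
from email import message_from_bytes

# The header claims us-ascii, but the body is really ISO-8859-1.
LYING = (b"From: sender@example.org\n"
         b"Content-Type: text/plain; charset=us-ascii\n"
         b"\n"
         b"Gr\xfc\xdfe\n")


def redecode(raw_bytes, new_charset):
    """Return (decoded text, message with a corrected charset= parameter)."""
    msg = message_from_bytes(raw_bytes)
    text = msg.get_payload(decode=True).decode(new_charset)
    # Only the header changes; the body bytes are not re-encoded.
    msg.set_param("charset", new_charset)
    return text, msg
```

After redecode(LYING, "iso-8859-1") the message object reports the new charset, and its payload bytes are the same as before.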
* Re: Rmail changes for Emacs 22
2002-10-22 3:12 ` Richard Stallman
2002-10-22 6:33 ` Kai Großjohann
@ 2002-10-22 14:32 ` Stefan Monnier
2002-10-23 7:12 ` Richard Stallman
1 sibling, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-22 14:32 UTC (permalink / raw)
Cc: monnier+gnu/emacs, eliz, emacs-devel, pmr
> That probably means that the format should be "whatever was received".
> I.e. the mail-reader should never encode anything (only the mail-sender
> should do that).
>
> It is impossible to display the message text without decoding it from
> whatever coding system it is encoded in.
I said it should never *en*code. Obviously, it will have to decode
somewhere on the way between the mbox file and the display.
Stefan
* Re: Rmail changes for Emacs 22
2002-10-22 14:32 ` Stefan Monnier
@ 2002-10-23 7:12 ` Richard Stallman
2002-10-23 8:13 ` Kenichi Handa
2002-10-23 9:57 ` Paul Michael Reilly
0 siblings, 2 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-23 7:12 UTC (permalink / raw)
Cc: monnier+gnu/emacs, eliz, emacs-devel, pmr
> I said it should never *en*code. Obviously, it will have to decode
> somewhere on the way between the mbox file and the display.
The question at hand is when and how to do the decoding.
* Re: Rmail changes for Emacs 22
2002-10-23 7:12 ` Richard Stallman
@ 2002-10-23 8:13 ` Kenichi Handa
2002-10-25 5:36 ` Richard Stallman
2002-10-23 9:57 ` Paul Michael Reilly
1 sibling, 1 reply; 35+ messages in thread
From: Kenichi Handa @ 2002-10-23 8:13 UTC (permalink / raw)
Cc: monnier+gnu/emacs, monnier+gnu/emacs, eliz, emacs-devel, pmr
In article <E184Fgb-0007jf-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> I said it should never *en*code. Obviously, it will have to decode
> somewhere on the way between the mbox file and the display.
> The question at hand is when and how to do the decoding.
I have not yet thought of it deeply, but it seems that we
have these options.
(1) A simple way:
Decode only when we need the contents of a message (e.g. for
displaying or searching).
This may be slow on searching all messages repeatedly.
Morioka-san's rmail-mime package is implemented this way.
Actually, the current rmail code already contains necessary
code to implement it easily (see
rmail-XXX-mime-YYY-function).
(2) Another simple but memory-consuming way:
Have a parallel decoded buffer that contains all messages
decoded.
This may cause a memory shortage if the RMAIL file is large.
(3) Not simple but efficient way:
Have a parallel decoded buffer but make it grow on demand.
(4) More efficient way:
Same as (3), but make rmail-backend not keep the original
RMAIL file in a buffer. It reads the file once, scans it and
keeps the file positions of all messages, then kills the buffer.
Later, on request, rmail-backend reads portions of the RMAIL
file one by one.
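Option (4) can be sketched as follows, in Python and under stated assumptions: the file is an mbox in which every message starts with a "From " separator line, and body lines beginning with "From " have been escaped by whatever wrote the file. index_mbox, read_message, SAMPLE and demo are hypothetical names, not part of any existing back end.

```python
import os
import tempfile

SAMPLE = (b"From a@example.org Wed Oct 23 03:12:21 2002\n"
          b"Subject: one\n\nfirst body\n\n"
          b"From b@example.org Wed Oct 23 04:00:00 2002\n"
          b"Subject: two\n\nsecond body\n")


def index_mbox(path):
    """Scan the file once and return the byte offset of each message."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_message(path, offsets, n):
    """Read only message n (0-based) from disk, using the saved offsets."""
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def demo():
    """Index a temporary copy of SAMPLE and fetch its second message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inbox")
        with open(path, "wb") as f:
            f.write(SAMPLE)
        offsets = index_mbox(path)
        return offsets, read_message(path, offsets, 1)
```

Only the list of offsets stays in memory; each message is fetched with a seek and a bounded read, which is the trade-off option (4) describes.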
---
Ken'ichi HANDA
handa@m17n.org
* Re: Rmail changes for Emacs 22
2002-10-23 8:13 ` Kenichi Handa
@ 2002-10-25 5:36 ` Richard Stallman
0 siblings, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-25 5:36 UTC (permalink / raw)
Cc: monnier+gnu/emacs, monnier+gnu/emacs, eliz, emacs-devel, pmr
> (1) A simple way:
> Decode only when we need the contents of a message (e.g. for
> displaying or searching).
> This may be slow on searching all messages repeatedly.
This could be a good point about searching. But maybe computers
now are so fast that it is ok.
We should try to avoid all methods that use more than one buffer.
Another idea is to decode all the messages when the contents are
wanted the first time, and leave the decoded message in the buffer in
place of the original. The original could be found in the file if
needed (if the user gives a command to "please decode again with a
different coding system").
* Re: Rmail changes for Emacs 22
2002-10-23 7:12 ` Richard Stallman
2002-10-23 8:13 ` Kenichi Handa
@ 2002-10-23 9:57 ` Paul Michael Reilly
2002-10-23 16:58 ` Eli Zaretskii
1 sibling, 1 reply; 35+ messages in thread
From: Paul Michael Reilly @ 2002-10-23 9:57 UTC (permalink / raw)
Cc: monnier+gnu/emacs, eliz, emacs-devel
> From rms@gnu.org Wed Oct 23 03:12:21 2002
> Reply-to: rms@gnu.org
> X-BABYL-V6-ATTRIBUTES: -------
>
> I said it should never *en*code. Obviously, it will have to decode
> somewhere on the way between the mbox file and the display.
>
> The question at hand is when and how to do the decoding.
I'm not sure that it is totally obvious, but AFAICS there are TWO
distinct coding system issues. First is the message based decoding
that everyone seems to recognize is necessary to view messages.
Second is the coding system used for mail file/buffer. They are
mostly orthogonal.
The mail buffer coding system will be dynamic. It should mostly be
iso-latin1 according to mail rfcs but Users will tend to abuse the
specs so Rmail needs to be robust enough to handle that abuse. How
exactly remains to be decided. Editing of messages is discussed
below.
As messages in the mail file are viewed, the buffer coding system will
very likely change, at least in the narrowed region viewing the
message.
My gut feel is that the use of special view buffers (apart from the
mail file buffer) will be necessary in certain cases (yet to be
determined) as we integrate MIME and IMAP for first class (default)
treatment. I strongly agree with Richard that separate view buffers
are to be avoided like the plague. If memory serves, VM uses special
viewing buffers on a limited basis.
Editing of messages opens up a huge can of worms wrt coding system.
If anyone can state a sensible and effective policy for dealing with
coding system conflicts while editing messages, more power to 'em.
I'm listening. It is easy to foresee Users changing message headers
and bodies in ways that would render a message unmailable and/or
unviewable in another mail agent but are nevertheless doable within
Emacs.
FWIW, I fully support the notion of a front-end / multiple back-end
design and have already started on it. Any opinions on a good model
in the current code base? I've looked at Gnus in the past and found
it very, very complex. VC is straightforward and Richard has
mentioned compose-mail.
-pmr
* Re: Rmail changes for Emacs 22
2002-10-23 9:57 ` Paul Michael Reilly
@ 2002-10-23 16:58 ` Eli Zaretskii
2002-10-24 7:29 ` Stefan Monnier
0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-23 16:58 UTC (permalink / raw)
Cc: emacs-devel
> Date: Wed, 23 Oct 2002 05:57:07 -0400
> From: Paul Michael Reilly <pmr@pajato.com>
>
> I'm not sure that it is totally obvious, but AFAICS there are TWO
> distinct coding system issues. First is the message based decoding
> that everyone seems to recognize is necessary to view messages.
> Second is the coding system used for mail file/buffer. They are
> mostly orthogonal.
??? It is customary in Emacs that after decoding text we set the
buffer's file coding system to what was used to decode the text.
That's what RMAIL does today when it decodes and displays a message:
the (narrowed) buffer's buffer-file-coding-system is set to the
coding system used to decode the message.
So, unless I grossly misunderstand what you wanted to say, the two
issues you mentioned are not at all orthogonal, they are more like one
and the same.
> The mail buffer coding system will be dynamic. It should mostly be
> iso-latin1 according to mail rfcs but Users will tend to abuse the
> specs so Rmail needs to be robust enough to handle that abuse. How
> exactly remains to be decided.
Why not use what RMAIL does today: it looks at the charset= header,
and if that's absent, guesses using the user settings, the defaults,
and the encoding-detection routines (in that order)?
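The lookup order just described can be sketched in Python as below. This is only an illustration of the order of precedence, not of RMAIL's actual code; the parameters user_setting and default, and the trial-decoding detect helper, are assumptions standing in for Emacs's own settings and detection routines.

```python
from email import message_from_bytes


def detect(body, candidates=("ascii", "utf-8")):
    """Very crude detection: the first candidate that decodes without error."""
    for name in candidates:
        try:
            body.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 can decode any byte sequence


def choose_charset(raw_bytes, user_setting=None, default=None):
    """Header first, then user setting, then default, then detection."""
    msg = message_from_bytes(raw_bytes)
    declared = msg.get_content_charset()
    if declared:
        return declared
    if user_setting:
        return user_setting
    if default:
        return default
    return detect(msg.get_payload(decode=True))
```

A message carrying charset=koi8-r is decoded as KOI8-R regardless of the other settings; detection is consulted only when nothing else is available.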
> Editing of messages opens up a huge can of worms wrt coding system.
> If anyone can state a sensible and effective policy for dealing with
> coding system conflicts while editing messages, more power to 'em.
Assuming normal usage, I don't see why we should deviate from the
normal policy used for saving buffers to disk files. Emacs already
has machinery to deal with mixed charsets in a buffer, including
prompting the user to choose the encoding if Emacs is unable to
decide.
In general, as I've said elsewhere in this thread, I think Emacs
should encode each message in its original encoding (given by
charset=). There are some exceptions to that rule (which I also
mentioned), but I'd suggest first to agree on the rule.
> It is easy to foresee Users changing message headers
> and bodies in ways that would render a message unmailable and/or
> unviewable in another mail agent but are nevertheless doable within
> Emacs.
Emacs gives you enough rope to hang yourself. I won't worry too much
about those who do.
* Re: Rmail changes for Emacs 22
2002-10-23 16:58 ` Eli Zaretskii
@ 2002-10-24 7:29 ` Stefan Monnier
2002-10-24 17:30 ` Eli Zaretskii
0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-24 7:29 UTC (permalink / raw)
Cc: pmr, emacs-devel
> In general, as I've said elsewhere in this thread, I think Emacs
> should encode each message in its original encoding (given by
> charset=).
I agree except I'd say "keep" instead of "encode": it should preserve
the `mbox' content byte-for-byte which is sometimes difficult to do when
you do decode+encode.
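One way a decode+encode round trip can fail to reproduce the original bytes is when the declared charset is wrong. The Python sketch below (an illustration with a hypothetical helper name, round_trips) shows that case; other causes, such as redundant escape sequences in stateful encodings, are not covered here.

```python
def round_trips(raw, charset):
    """True if decoding and re-encoding with charset gives back the same bytes."""
    text = raw.decode(charset, errors="replace")
    return text.encode(charset, errors="replace") == raw


# ISO-8859-1 bytes for a name containing a sharp s.
ORIGINAL = b"Gro\xdfjohann"

# The right charset round-trips; a wrong one silently alters the bytes.
LATIN1_OK = round_trips(ORIGINAL, "iso-8859-1")
UTF8_OK = round_trips(ORIGINAL, "utf-8")
```

With the correct charset the bytes survive, but treating the same bytes as UTF-8 replaces the invalid byte on decoding, so what gets written back differs from what was received.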
BTW MIME mail can contain several parts with different encodings in each
one of them, so it's a bit misleading to talk about "the buffer's
coding system" unless each part is displayed in another buffer, which
is undesirable in the general case.
Stefan
* Re: Rmail changes for Emacs 22
2002-10-24 7:29 ` Stefan Monnier
@ 2002-10-24 17:30 ` Eli Zaretskii
0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-24 17:30 UTC (permalink / raw)
Cc: pmr, emacs-devel
> From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
> Date: Thu, 24 Oct 2002 03:29:35 -0400
>
> I agree except I'd say "keep" instead of "encode": it should preserve
> the `mbox' content byte-for-byte which is sometimes difficult to do when
> you do decode+encode.
If we want to preserve the mbox file verbatim, we will have to keep it
in memory unchanged and decode messages into another buffer.
> BTW MIME mail can contain several parts with different encodings in each
> one of them, so it's a bit misleading to talk about "the buffer's
> coding system" unless each part is displayed in another buffer, which
> is undesirable in the general case.
I was talking about RMAIL which doesn't yet support MIME. IIRC,
adding such support was the motivation for decoding into a different
buffer in the discussions I recall.
* Re: Rmail changes for Emacs 22
2002-10-16 6:09 ` Eli Zaretskii
2002-10-16 7:19 ` Kenichi Handa
2002-10-18 22:59 ` Richard Stallman
@ 2002-10-21 15:33 ` Dave Love
2002-10-21 16:37 ` Kai Großjohann
2002-10-22 6:31 ` Kai Großjohann
3 siblings, 1 reply; 35+ messages in thread
From: Dave Love @ 2002-10-21 15:33 UTC (permalink / raw)
Cc: Richard Stallman, emacs-devel, pmr
Eli Zaretskii <eliz@is.elta.co.il> writes:
> Personally, I think emacs-mule is not a good idea in this case, since
> mbox is not Emacs-private format, so some other software should be able
> to read it.
I don't see how that follows, but any file that has to represent the
full range of Emacs characters has to be stored in the internal
encoding. I don't know what the rationale is for any of this, or why
rmail uses emacs-mule now.
> A good alternative would be to encode each message as what
> the charset= header says (and add/fix such a header if there is none, or
> if the one that's there lies).
I doubt you should do anything to them, especially as you have no
assurance any headers are correct.
* Re: Rmail changes for Emacs 22
2002-10-21 15:33 ` Dave Love
@ 2002-10-21 16:37 ` Kai Großjohann
2002-10-21 20:50 ` Stefan Monnier
0 siblings, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-21 16:37 UTC (permalink / raw)
Dave Love <d.love@dl.ac.uk> writes:
> Eli Zaretskii <eliz@is.elta.co.il> writes:
>
>> Personally, I think emacs-mule is not a good idea in this case, since
>> mbox is not Emacs-private format, so some other software should be able
>> to read it.
>
> I don't see how that follows, but any file that has to represent the
> full range of Emacs characters has to be stored in the internal
> encoding. I don't know what the rationale is for any of this, or why
> rmail uses emacs-mule now.
Well, mbox files usually contain data that arrived via email. So it
would be safe to just keep the data as it arrived, unmodified.
So most messages won't contain characters that only Emacs knows
about. So there is a pretty good chance that an mbox file contains
only charsets that other programs also grok.
But what do other programs do? Convert all incoming messages to
Unicode? If they read from /var/mail, that might be difficult to
do. Or do other programs just grok multiple charsets (encodings?) in
the same file?
It would, however, be slightly difficult to keep messages encoded in
ascii and utf-16 in the same file. Hm. But if one keeps
Content-Length headers, say, then one would know that one is looking
at the From_ line. Therefore, one could tell whether those five
characters are encoded in something that looks like ascii or whether
it looks like utf-16. That might be sufficient to find the
Content-type header to be really sure what the charset/encoding is.
>> A good alternative would be to encode each message as what
>> the charset= header says (and add/fix such a header if there is none, or
>> if the one that's there lies).
>
> I doubt you should do anything to them, especially as you have no
> assurance any headers are correct.
Maybe it would be useful to offer the user a command so that they can
say "this message is encoded in Big5" and the like. Then RMAIL could
store this information in a header (in the Content-Type header?) and
subsequent views of the message would automatically use the "right"
charset/encoding.
Presumably, the user just tries a number of possible charsets and then
they can just look at the message to see whether their guess was
right. And if they are like me who can't distinguish a GB2312
encoded Chinese text from a Big5 encoded one, then choosing the wrong
charset won't be much of a loss as they won't be able to read it
anyhow :-)
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
* Re: Rmail changes for Emacs 22
2002-10-21 16:37 ` Kai Großjohann
@ 2002-10-21 20:50 ` Stefan Monnier
2002-10-22 6:28 ` Kai Großjohann
0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-21 20:50 UTC (permalink / raw)
Cc: emacs-devel
> But what do other programs do? Convert all incoming messages to
> Unicode? If they read from /var/mail, that might be difficult to
> do. Or do other programs just grok multiple charsets (encodings?) in
> the same file?
>
> It would, however, be slightly difficult to keep messages encoded in
> ascii and utf-16 in the same file. Hm. But if one keeps
> Content-Length headers, say, then one would know that one is looking
> at the From_ line. Therefore, one could tell whether those five
> characters are encoded in something that looks like ascii or whether
> it looks like utf-16. That might be sufficient to find the
> Content-type header to be really sure what the charset/encoding is.
Much simpler: because the format is basically the format used during
transfer, you benefit from the work done on MIME and can reuse the
same tricks: the header, for example, is always written in more
or less pure ASCII (at least in theory) and any non-ASCII char has
to be encoded using the =?<charset>?<encoding>?<text>?= thingy.
This way you can unambiguously read the Content-Type and its
charset argument.
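For illustration, Python's standard email.header module decodes such encoded words; the snippet below is a sketch of the mechanism, not Rmail code, and decode_header_value is a hypothetical helper name. The sample value is the kind of From header that appears elsewhere in this thread.

```python
from email.header import decode_header, make_header


def decode_header_value(value):
    """Turn =?charset?encoding?text?= words in a header into readable text."""
    return str(make_header(decode_header(value)))


# The header itself is pure ASCII; the charset travels inside the word.
ENCODED = "=?iso-8859-1?q?Gro=DFjohann?="
MIXED = "Kai =?iso-8859-1?q?Gro=DFjohann?="
```

Because the header bytes are plain ASCII, a reader can parse them before it knows anything about the body's encoding.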
Stefan
* Re: Rmail changes for Emacs 22
2002-10-21 20:50 ` Stefan Monnier
@ 2002-10-22 6:28 ` Kai Großjohann
0 siblings, 0 replies; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22 6:28 UTC (permalink / raw)
"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> Much simpler: because the format is basically the format used during
> transfer, you benefit from the work done on MIME and can reuse the
> same tricks: the header, for example, is always written in more
> or less pure ASCII (at least in theory) and any non-ASCII char has
> to be encoded using the =?<charset>?<encoding>?<text>?= thingy.
> This way you can unambiguously read the Content-Type and its
> charset argument.
Ah, of course. Good :-)
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
* Re: Rmail changes for Emacs 22
2002-10-16 6:09 ` Eli Zaretskii
` (2 preceding siblings ...)
2002-10-21 15:33 ` Dave Love
@ 2002-10-22 6:31 ` Kai Großjohann
2002-10-22 18:40 ` Eli Zaretskii
3 siblings, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22 6:31 UTC (permalink / raw)
Eli Zaretskii <eliz@is.elta.co.il> writes:
> Personally, I think emacs-mule is not a good idea in this case, since
> mbox is not Emacs-private format, so some other software should be able
> to read it. A good alternative would be to encode each message as what
> the charset= header says (and add/fix such a header if there is none, or
> if the one that's there lies).
Maybe "encode" is a bit misleading in this case, as the bytes in the
message are not changed (modulo adding/fixing the Content-Type
header).
It's more the case that the message that's displayed is _de_coded for
viewing, right?
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
* Re: Rmail changes for Emacs 22
2002-10-22 6:31 ` Kai Großjohann
@ 2002-10-22 18:40 ` Eli Zaretskii
2002-10-23 5:24 ` Kai Großjohann
0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-22 18:40 UTC (permalink / raw)
Cc: emacs-devel
> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)
> Date: Tue, 22 Oct 2002 08:31:38 +0200
>
> Eli Zaretskii <eliz@is.elta.co.il> writes:
>
> > Personally, I think emacs-mule is not a good idea in this case, since
> > mbox is not Emacs-private format, so some other software should be able
> > to read it. A good alternative would be to encode each message as what
> > the charset= header says (and add/fix such a header if there is none, or
> > if the one that's there lies).
>
> Maybe "encode" is a bit misleading in this case, as the bytes in the
> message are not changed (modulo adding/fixing the Content-Type
> header).
>
> It's more the case that the message that's displayed is _de_coded for
> viewing, right?
Currently, RMAIL decodes the messages in-place, i.e. the encoded text
as received from the MTA is replaced in the RMAIL buffer with the
decoded text. If this modus operandi is retained, you must encode the
text when you save the RMAIL buffer to a file.
(I've heard that there was an intent to decode the messages into
another buffer, but I don't know whether this is being worked on.)
* Re: Rmail changes for Emacs 22
2002-10-22 18:40 ` Eli Zaretskii
@ 2002-10-23 5:24 ` Kai Großjohann
0 siblings, 0 replies; 35+ messages in thread
From: Kai Großjohann @ 2002-10-23 5:24 UTC (permalink / raw)
"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> Currently, RMAIL decodes the messages in-place, i.e. the encoded text
> as received from the MTA is replaced in the RMAIL buffer with the
> decoded text. If this modus operandi is retained, you must encode the
> text when you save the RMAIL buffer to a file.
>
> (I've heard that there was an intent to decode the messages into
> another buffer, but I don't know whether this is being worked on.)
Oh, right. Maybe it is easier to use an extra buffer for display.
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)