Re: Rmail changes for Emacs 22

unofficial mirror of emacs-devel@gnu.org 

* Re: Rmail changes for Emacs 22
       [not found] <rzqu1jseva6.fsf@albion.dl.ac.uk>
@ 2002-10-13  4:08 ` Richard Stallman
  2002-10-15 17:40   ` Dave Love
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-13  4:08 UTC (permalink / raw)
  Cc: emacs-devel

    The current Rmail won't work properly in Emacs 22 since no-conversion
    means something different, i.e. a different internal encoding.

Does Emacs 22 mean the Unicode Emacs?

My understanding is that no-conversion still means "use the internal
multibyte rep", but since the internal rep of most characters has
changed, the actual file contents will be different.  Are you saying
the same thing?

Is there any way to read a file into the Unicode Emacs
that was written using no-conversion in the current Emacs?

    Here's a suggestion.  I did it ages ago and don't remember whether I
    actually tested it.

Which branch is this change proposed for?  The Unicode Emacs branch?
The RC branch?  The current HEAD?

Please note that a major change in Rmail is being developed:
we are going to eliminate Babyl format and use Inbox format.
I expect this to be installed by the time the Unicode branch
is ready for release.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-13  4:08 ` Rmail changes for Emacs 22 Richard Stallman
@ 2002-10-15 17:40   ` Dave Love
  2002-10-16  4:38     ` Richard Stallman
  2002-10-16  4:38     ` Richard Stallman
  0 siblings, 2 replies; 35+ messages in thread
From: Dave Love @ 2002-10-15 17:40 UTC (permalink / raw)
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Does Emacs 22 mean the Unicode Emacs?

Yes.  Is that not a policy decision?

> My understanding is that no-conversion still means "use the internal
> multibyte rep", but since the internal rep of most characters has
> changed, the actual file contents will be different.  Are you saying
> the same thing?

Yes.

> Is there any way to read a file into the Unicode Emacs
> that was written using no-conversion in the current Emacs?

Yes.  That's what that change is for.  Do you mean it doesn't work?

> Which branch is this change proposed for?  The Unicode Emacs branch?

Yes.

> The RC branch?

No.

> The current HEAD?

I'm not sure.

> Please note that a major change in Rmail is being developed:
> we are going to eliminate Babyl format and use Inbox format.

I hope `eliminate' doesn't mean you won't be able to read it any more.
If you won't be able to write it, then you can eliminate the part of
the change which updates the Babyl version, but you may need to make
similar changes to whatever you end up with.  How will the mbox file
encoding be treated?

> I expect this to be installed by the time the Unicode branch
> is ready for release.

As for Emacs 21? :-/

* Re: Rmail changes for Emacs 22
  2002-10-15 17:40   ` Dave Love
@ 2002-10-16  4:38     ` Richard Stallman
  2002-10-16  6:09       ` Eli Zaretskii
                         ` (2 more replies)
  2002-10-16  4:38     ` Richard Stallman
  1 sibling, 3 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-16  4:38 UTC (permalink / raw)
  Cc: emacs-devel

    > Is there any way to read a file into the Unicode Emacs
    > that was written using no-conversion in the current Emacs?

    Yes.  That's what that change is for.  Do you mean it doesn't work?

I'm glad to hear people are working on reading in old no-conversion
files.  I don't know whether it works; I have not tried it.  I am just
finding out the situation.

    > Please note that a major change in Rmail is being developed:
    > we are going to eliminate Babyl format and use Inbox format.

    I hope `eliminate' doesn't mean you won't be able to read it any more.

We will have a conversion program.

    If you won't be able to write it, then you can eliminate the part of
    the change which updates the Babyl version, but you may need to make
    similar changes to whatever you end up with.  How will the mbox file
    encoding be treated?

I don't know, and that is a good question.  I did not work on that
aspect of Rmail before, and I am not sure what to do about it now.
We might want to save these files normally in emacs-mule encoding,
or maybe we would want to decode each message individually when
displaying it.  pmr@pajato.com is the person doing it.

* Re: Rmail changes for Emacs 22
  2002-10-15 17:40   ` Dave Love
  2002-10-16  4:38     ` Richard Stallman
@ 2002-10-16  4:38     ` Richard Stallman
  2002-10-21 15:31       ` Dave Love
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-16  4:38 UTC (permalink / raw)
  Cc: emacs-devel

    > Does Emacs 22 mean the Unicode Emacs?

    Yes.  Is that not a policy decision?

It has not been decided yet.

* Re: Rmail changes for Emacs 22
  2002-10-16  4:38     ` Richard Stallman
@ 2002-10-16  6:09       ` Eli Zaretskii
  2002-10-16  7:19         ` Kenichi Handa
                           ` (3 more replies)
  2002-10-16 12:21       ` Paul Michael Reilly
  2002-10-21 15:34       ` Dave Love
  2 siblings, 4 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-16  6:09 UTC (permalink / raw)
  Cc: emacs-devel, pmr

On Wed, 16 Oct 2002, Richard Stallman wrote:

>     How will the mbox file encoding be treated?
> 
> I don't know, and that is a good question.  I did not work on that
> aspect of Rmail before, and I am not sure what to do about it now.
> We might want to save these files normally in emacs-mule encoding,
> or maybe we would want to decode each message individually when
> displaying it.  pmr@pajato.com is the person doing it.

If this aspect wasn't discussed before, it's probably a good idea to 
discuss that now.

Personally, I think emacs-mule is not a good idea in this case, since 
mbox is not an Emacs-private format, so some other software should be able
to read it.  A good alternative would be to encode each message as what 
the charset= header says (and add/fix such a header if there is none, or 
if the one that's there lies).

* Re: Rmail changes for Emacs 22
  2002-10-16  6:09       ` Eli Zaretskii
@ 2002-10-16  7:19         ` Kenichi Handa
  2002-10-19  4:25           ` Paul Michael Reilly
  2002-10-19  4:55           ` Richard Stallman
  2002-10-18 22:59         ` Richard Stallman
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 35+ messages in thread
From: Kenichi Handa @ 2002-10-16  7:19 UTC (permalink / raw)
  Cc: rms, emacs-devel, pmr

In article <Pine.SUN.3.91.1021016080428.20518C-100000@is>, Eli Zaretskii <eliz@is.elta.co.il> writes:
>>      How will the mbox file encoding be treated?
>>  
>>  I don't know, and that is a good question.  I did not work on that
>>  aspect of Rmail before, and I am not sure what to do about it now.
>>  We might want to save these files normally in emacs-mule encoding,
>>  or maybe we would want to decode each message individually when
>>  displaying it.  pmr@pajato.com is the person doing it.

> If this aspect wasn't discussed before, it's probably a good idea to 
> discuss that now.

> Personally, I think emacs-mule is not a good idea in this case, since 
> mbox is not an Emacs-private format, so some other software should be able
> to read it.  A good alternative would be to encode each message as what 
> the charset= header says (and add/fix such a header if there is none, or 
> if the one that's there lies).

I agree with that approach.  I think we can proceed with the
modification of rmail in these steps.

(1) Divide the current code into BABYL format handler
    (babyl-backend) and rmail user-interface provider
    (rmail-frontend).  Babyl-backend reads a BABYL file
    without any code conversion in a unibyte buffer, and
    provides various functions (e.g. extract message
    headers, extract a specific message header, extract a
    message body, get new messages, etc).

(2) Make mbox-backend that provides the same facilities as
    babyl-backend.

(3) Make rmail-frontend to use babyl-backend or mbox-backend
    depending on the user's mail file.  Rmail-frontend displays a
    message in a different buffer (rmail-view-buffer) than
    the original mail file buffer.  Rmail-frontend utilizes
    MIME handler to decode message headers and body.

This way, we can easily add more backends, for instance,
IMAP, per-message files (like MH or GNUS), etc.
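
A minimal sketch of the split described in steps (1) to (3), written in
Python purely for illustration (the real code would be Emacs Lisp).  The
names MailBackend, MboxBackend and view_message are hypothetical, and the
mbox splitting is deliberately naive.  The point is only that a backend
hands out raw, undecoded bytes and the frontend decodes them for display.

```python
from abc import ABC, abstractmethod
from email import message_from_bytes


class MailBackend(ABC):
    """Hypothetical storage interface: returns raw, undecoded message bytes."""

    @abstractmethod
    def message_count(self) -> int:
        ...

    @abstractmethod
    def raw_message(self, index: int) -> bytes:
        ...


class MboxBackend(MailBackend):
    """Naive mbox reader: a new message starts at each line beginning 'From '."""

    def __init__(self, data: bytes) -> None:
        self._messages = []
        current = None
        for line in data.splitlines(keepends=True):
            if line.startswith(b"From "):
                if current is not None:
                    self._messages.append(b"".join(current))
                current = []  # the From_ separator line itself is dropped
            elif current is not None:
                current.append(line)
        if current is not None:
            self._messages.append(b"".join(current))

    def message_count(self) -> int:
        return len(self._messages)

    def raw_message(self, index: int) -> bytes:
        return self._messages[index]


def view_message(backend: MailBackend, index: int) -> str:
    """Frontend side: decode one message body for display only."""
    msg = message_from_bytes(backend.raw_message(index))
    charset = msg.get_content_charset() or "ascii"
    body = msg.get_payload(decode=True)  # bytes, transfer encoding undone
    return body.decode(charset, errors="replace")
```

A Babyl or IMAP backend would implement the same two methods, so
view_message would not need to change.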

---
Ken'ichi HANDA
handa@m17n.org

* Re: Rmail changes for Emacs 22
  2002-10-16  4:38     ` Richard Stallman
  2002-10-16  6:09       ` Eli Zaretskii
@ 2002-10-16 12:21       ` Paul Michael Reilly
  2002-10-19  4:56         ` Richard Stallman
  2002-10-21 15:34       ` Dave Love
  2 siblings, 1 reply; 35+ messages in thread
From: Paul Michael Reilly @ 2002-10-16 12:21 UTC (permalink / raw)
  Cc: d.love, emacs-devel

 >     > Is there any way to read a file into the Unicode Emacs
 >     > that was written using no-conversion in the current Emacs?
 > 
 >     Yes.  That's what that change is for.  Do you mean it doesn't work?
 > 
 > I'm glad to hear people are working on reading in old no-conversion
 > files.  I don't know whether it works; I have not tried it.  I am just
 > finding out the situation.
 > 
 >     > Please note that a major change in Rmail is being developed:
 >     > we are going to eliminate Babyl format and use Inbox format.
 > 
 >     I hope `eliminate' doesn't mean you won't be able to read it any more.
 > 
 > We will have a conversion program.
 > 
 >     If you won't be able to write it, then you can eliminate the part of
 >     the change which updates the Babyl version, but you may need to make
 >     similar changes to whatever you end up with.  How will the mbox file
 >     encoding be treated?
 > 
 > I don't know, and that is a good question.  I did not work on that
 > aspect of Rmail before, and I am not sure what to do about it now.
 > We might want to save these files normally in emacs-mule encoding,
 > or maybe we would want to decode each message individually when
 > displaying it.  pmr@pajato.com is the person doing it.

I'm inclined to decode each message when displaying it.

-pmr

* Re: Rmail changes for Emacs 22
  2002-10-16  6:09       ` Eli Zaretskii
  2002-10-16  7:19         ` Kenichi Handa
@ 2002-10-18 22:59         ` Richard Stallman
  2002-10-20 19:40           ` Stefan Monnier
  2002-10-21 15:33         ` Dave Love
  2002-10-22  6:31         ` Kai Großjohann
  3 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-18 22:59 UTC (permalink / raw)
  Cc: emacs-devel, pmr

      A good alternative would be to encode each message as what 
    the charset= header says (and add/fix such a header if there is none, or 
    if the one that's there lies).

Paul, what do you think of this idea?

* Re: Rmail changes for Emacs 22
  2002-10-16  7:19         ` Kenichi Handa
@ 2002-10-19  4:25           ` Paul Michael Reilly
  2002-10-19  4:55           ` Richard Stallman
  1 sibling, 0 replies; 35+ messages in thread
From: Paul Michael Reilly @ 2002-10-19  4:25 UTC (permalink / raw)
  Cc: handa

 > > Personally, I think emacs-mule is not a good idea in this case, since 
 > > mbox is not an Emacs-private format, so some other software should be able
 > > to read it.  A good alternative would be to encode each message as what 
 > > the charset= header says (and add/fix such a header if there is none, or 
 > > if the one that's there lies).
 > 
 > I agree with that approach.  I think we can proceed with the
 > modification of rmail in these steps.
 > 
 > (1) Divide the current code into BABYL format handler
 >     (babyl-backend) and rmail user-interface provider
 >     (rmail-frontend).  Babyl-backend reads a BABYL file
 >     without any code conversion in a unibyte buffer, and
 >     provides various functions (e.g. extract message
 >     headers, extract a specific message header, extract a
 >     message body, get new messages, etc).
 > 
 > (2) Make mbox-backend that provides the same facilities as
 >     babyl-backend.
 > 
 > (3) Make rmail-frontend to use babyl-backend or mbox-backend
 >     depending on the user's mail file.  Rmail-frontend displays a
 >     message in a different buffer (rmail-view-buffer) than
 >     the original mail file buffer.  Rmail-frontend utilizes
 >     MIME handler to decode message headers and body.
 > 
 > This way, we can easily add more backends, for instance,
 > IMAP, per-message files (like MH or GNUS), etc.

This is an excellent approach.  I'm kicking myself for not seeing it
when I added mbox support.  I will get to it as quickly as I can.

-pmr

* Re: Rmail changes for Emacs 22
  2002-10-16  7:19         ` Kenichi Handa
  2002-10-19  4:25           ` Paul Michael Reilly
@ 2002-10-19  4:55           ` Richard Stallman
  2002-10-20  7:03             ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-19  4:55 UTC (permalink / raw)
  Cc: eliz, emacs-devel, pmr

    (1) Divide the current code into BABYL format handler
	(babyl-backend) and rmail user-interface provider
	(rmail-frontend).

This plan is not inherently flawed, but it would be a lot of work.

Our actual plan is both simpler and better: to get rid of Babyl format
entirely.  Paul has already rewritten Rmail to work on mbox format.
There is no need to keep Babyl format, and using mbox format will be
more convenient all around.

      Rmail-frontend displays a
	message in a different buffer (rmail-view-buffer) than
	the original mail file buffer.

That would be inconvenient for editing the message
and several other things.  I think that having
two separate buffers is something to avoid.

* Re: Rmail changes for Emacs 22
  2002-10-16 12:21       ` Paul Michael Reilly
@ 2002-10-19  4:56         ` Richard Stallman
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-19  4:56 UTC (permalink / raw)
  Cc: d.love, emacs-devel

    I'm inclined to decode each message when displaying it.

That is not as simple as it sounds.  When exactly would Rmail decode
a message, and when would it reencode a message?  Would the current
message always be in decoded form and the others in encoded form?

* Re: Rmail changes for Emacs 22
  2002-10-19  4:55           ` Richard Stallman
@ 2002-10-20  7:03             ` Eli Zaretskii
  0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-20  7:03 UTC (permalink / raw)
  Cc: emacs-devel, pmr

On Sat, 19 Oct 2002, Richard Stallman wrote:

> There is no need to keep Babyl format

I suspect there are lots of people who keep their mail archives in Babyl 
format.  While I understand that conversion is possible (even today), it 
might be impractical and/or user-unfriendly to require that all of them 
be converted.  Especially since Emacs 21 keeps Babyl files in emacs-mule 
while earlier versions didn't; this might cause some breakage during 
conversion to mbox and the return of the ubiquitous \201 bug.

So I tend to agree with Handa-san on this issue.

* Re: Rmail changes for Emacs 22
  2002-10-18 22:59         ` Richard Stallman
@ 2002-10-20 19:40           ` Stefan Monnier
  2002-10-22  3:12             ` Richard Stallman
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-20 19:40 UTC (permalink / raw)
  Cc: eliz, emacs-devel, pmr

>       A good alternative would be to encode each message as what 
>     the charset= header says (and add/fix such a header if there is none, or 
>     if the one that's there lies).
> 
> Paul, what do you think of this idea?

mbox format was chosen because it is standard, so whatever we
do, it is important that it works with other programs.
That probably means that the format should be "whatever was received".
I.e. the mail-reader should never encode anything (only the mail-sender
should do that).

	Stefan

* Re: Rmail changes for Emacs 22
  2002-10-16  4:38     ` Richard Stallman
@ 2002-10-21 15:31       ` Dave Love
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Love @ 2002-10-21 15:31 UTC (permalink / raw)
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     > Does Emacs 22 mean the Unicode Emacs?
> 
>     Yes.  Is that not a policy decision?
> 
> It has not been decided yet.

I'm pretty sure that was said, or I wouldn't have taken it for
granted.  Perhaps you could make that decision now.  Code exists on
that basis, e.g. to distinguish byte-compiled files in old and new
internal encodings, and it would be a significant problem if there
wasn't a new major version number reflecting the change in
representation.

By the way, I realized that rmail should actually use emacs-mule (not
no-conversion or raw-text, whichever it was).  See NEWS:

  Therefore, Lisp programs that read files which contain the internal
  MULE encoding should use `emacs-mule-unix'.  `no-conversion' is only
  appropriate for reading truly binary files.

* Re: Rmail changes for Emacs 22
  2002-10-16  6:09       ` Eli Zaretskii
  2002-10-16  7:19         ` Kenichi Handa
  2002-10-18 22:59         ` Richard Stallman
@ 2002-10-21 15:33         ` Dave Love
  2002-10-21 16:37           ` Kai Großjohann
  2002-10-22  6:31         ` Kai Großjohann
  3 siblings, 1 reply; 35+ messages in thread
From: Dave Love @ 2002-10-21 15:33 UTC (permalink / raw)
  Cc: Richard Stallman, emacs-devel, pmr

Eli Zaretskii <eliz@is.elta.co.il> writes:

> Personally, I think emacs-mule is not a good idea in this case, since 
> mbox is not an Emacs-private format, so some other software should be able
> to read it.

I don't see how that follows, but any file that has to represent the
full range of Emacs characters has to be stored in the internal
encoding.  I don't know what the rationale is for any of this, or why
rmail uses emacs-mule now.

> A good alternative would be to encode each message as what 
> the charset= header says (and add/fix such a header if there is none, or 
> if the one that's there lies).

I doubt you should do anything to them, especially as you have no
assurance any headers are correct.

* Re: Rmail changes for Emacs 22
  2002-10-16  4:38     ` Richard Stallman
  2002-10-16  6:09       ` Eli Zaretskii
  2002-10-16 12:21       ` Paul Michael Reilly
@ 2002-10-21 15:34       ` Dave Love
  2002-10-22 16:36         ` Richard Stallman
  2 siblings, 1 reply; 35+ messages in thread
From: Dave Love @ 2002-10-21 15:34 UTC (permalink / raw)
  Cc: emacs-devel, pmr

Richard Stallman <rms@gnu.org> writes:

>     I hope `eliminate' doesn't mean you won't be able to read it any more.
> 
> We will have a conversion program.

Note that Gnus has a Babyl backend, which calls Rmail to do its stuff.

> We might want to save these files normally in emacs-mule encoding,

I don't know why, but if you do, you probably have no way to figure
out how to read them in future.

> or maybe we would want to decode each message individually when
> displaying it.

That's what I would expect.  I hope you'll consider what Gnus does,
though I doubt you can use its code directly (unfortunately).

* Re: Rmail changes for Emacs 22
  2002-10-21 15:33         ` Dave Love
@ 2002-10-21 16:37           ` Kai Großjohann
  2002-10-21 20:50             ` Stefan Monnier
  0 siblings, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-21 16:37 UTC (permalink / raw)

Dave Love <d.love@dl.ac.uk> writes:

> Eli Zaretskii <eliz@is.elta.co.il> writes:
>
>> Personally, I think emacs-mule is not a good idea in this case, since 
>> mbox is not an Emacs-private format, so some other software should be able
>> to read it.
>
> I don't see how that follows, but any file that has to represent the
> full range of Emacs characters has to be stored in the internal
> encoding.  I don't know what the rationale is for any of this, or why
> rmail uses emacs-mule now.

Well, mbox files usually contain data that arrived via email.  So it
would be safe to just keep the data as it arrived, unmodified.
So most messages won't contain characters that only Emacs knows
about.  So there is a pretty good chance that an mbox file contains
only charsets that other programs also grok.

But what do other programs do?  Convert all incoming messages to
Unicode?  If they read from /var/mail, that might be difficult to
do.  Or do other programs just grok multiple charsets (encodings?) in
the same file?

It would, however, be slightly difficult to keep messages encoded in
ascii and utf-16 in the same file.  Hm.  But if one keeps
Content-Length headers, say, then one would know that one is looking
at the From_ line.  Therefore, one could tell whether those five
characters are encoded in something that looks like ascii or whether
it looks like utf-16.  That might be sufficient to find the
Content-type header to be really sure what the charset/encoding is.
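
A rough sketch of that check, in Python for illustration only.  The
function name sniff_from_line is hypothetical, and the sketch assumes the
caller already knows (from Content-Length, say) that the chunk begins at a
From_ line.  It ignores byte-order marks and any encoding other than the
three listed.

```python
def sniff_from_line(chunk: bytes) -> str:
    """Guess how the five characters 'From ' at the start of chunk are encoded."""
    for codec in ("ascii", "utf-16-le", "utf-16-be"):
        if chunk.startswith("From ".encode(codec)):
            return codec
    return "unknown"
```

The ASCII test cannot match UTF-16 data by accident, because the second
byte of a UTF-16 "From " is always a zero byte.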

>> A good alternative would be to encode each message as what 
>> the charset= header says (and add/fix such a header if there is none, or 
>> if the one that's there lies).
>
> I doubt you should do anything to them, especially as you have no
> assurance any headers are correct.

Maybe it would be useful to offer the user a command so that they can
say "this message is encoded in Big5" and the like.  Then RMAIL could
store this information in a header (in the Content-Type header?) and
subsequent views of the message would automatically use the "right"
charset/encoding.

Presumably, the user just tries a number of possible charsets and then
they can just look at the message to see whether their guess was
right.  And if they are like me who can't distinguish a GB2312
encoded Chinese text from a Big5 encoded one, then choosing the wrong
charset won't be much of a loss as they won't be able to read it
anyhow :-)

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)

* Re: Rmail changes for Emacs 22
  2002-10-21 16:37           ` Kai Großjohann
@ 2002-10-21 20:50             ` Stefan Monnier
  2002-10-22  6:28               ` Kai Großjohann
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-21 20:50 UTC (permalink / raw)
  Cc: emacs-devel

> But what do other programs do?  Convert all incoming messages to
> Unicode?  If they read from /var/mail, that might be difficult to
> do.  Or do other programs just grok multiple charsets (encodings?) in
> the same file?
> 
> It would, however, be slightly difficult to keep messages encoded in
> ascii and utf-16 in the same file.  Hm.  But if one keeps
> Content-Length headers, say, then one would know that one is looking
> at the From_ line.  Therefore, one could tell whether those five
> characters are encoded in something that looks like ascii or whether
> it looks like utf-16.  That might be sufficient to find the
> Content-type header to be really sure what the charset/encoding is.

Much simpler: because the format is basically the format used during
transfer, you benefit from the work done on MIME and can reuse the
same tricks: the header, for example, is always written in more
or less pure ASCII (at least in theory) and any non-ASCII char has
to be encoded using the =?<charset>?<encoding>?<text>?= thingy.
This way you can unambiguously read the Content-Type and its
charset argument.
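
To make that concrete, here is a small sketch using Python's standard
email package (Python is used only for illustration; the function name
decode_for_display and the fallback to us-ascii are assumptions).  The
stored bytes are never modified; the charset is read from Content-Type and
used only to build the text that gets displayed.

```python
from email import message_from_bytes
from email.header import decode_header, make_header


def decode_for_display(raw: bytes):
    """Return (sender, body) as text, leaving the stored bytes untouched."""
    msg = message_from_bytes(raw)
    # Headers are ASCII; non-ASCII text arrives as =?charset?encoding?text?=
    sender = str(make_header(decode_header(msg["From"])))
    # The charset parameter of Content-Type says how to decode the body.
    charset = msg.get_content_charset() or "us-ascii"
    body = msg.get_payload(decode=True).decode(charset, errors="replace")
    return sender, body
```

A message with no Content-Type header at all falls back to us-ascii here,
with undecodable bytes shown as replacement characters.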


	Stefan

* Re: Rmail changes for Emacs 22
  2002-10-20 19:40           ` Stefan Monnier
@ 2002-10-22  3:12             ` Richard Stallman
  2002-10-22  6:33               ` Kai Großjohann
  2002-10-22 14:32               ` Stefan Monnier
  0 siblings, 2 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-22  3:12 UTC (permalink / raw)
  Cc: eliz, emacs-devel, pmr

    That probably means that the format should be "whatever was received".
    I.e. the mail-reader should never encode anything (only the mail-sender
    should do that).

It is impossible to display the message text without decoding it from
whatever coding system it is encoded in.

* Re: Rmail changes for Emacs 22
  2002-10-21 20:50             ` Stefan Monnier
@ 2002-10-22  6:28               ` Kai Großjohann
  0 siblings, 0 replies; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22  6:28 UTC (permalink / raw)


"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> Much simpler: because the format is basically the format used during
> transfer, you benefit from the work done on MIME and can reuse the
> same tricks: the header, for example, is always written in more
> or less pure ASCII (at least in theory) and any non-ASCII char has
> to be encoded using the =?<charset>?<encoding>?<text>?= thingy.
> This way you can unambiguously read the Content-Type and its
> charset argument.

Ah, of course.  Good :-)

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)

* Re: Rmail changes for Emacs 22
  2002-10-16  6:09       ` Eli Zaretskii
                           ` (2 preceding siblings ...)
  2002-10-21 15:33         ` Dave Love
@ 2002-10-22  6:31         ` Kai Großjohann
  2002-10-22 18:40           ` Eli Zaretskii
  3 siblings, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22  6:31 UTC (permalink / raw)


Eli Zaretskii <eliz@is.elta.co.il> writes:

> Personally, I think emacs-mule is not a good idea in this case, since 
> mbox is not an Emacs-private format, so some other software should be able
> to read it.  A good alternative would be to encode each message as what 
> the charset= header says (and add/fix such a header if there is none, or 
> if the one that's there lies).

Maybe "encode" is a bit misleading in this case, as the bytes in the
message are not changed (modulo adding/fixing the Content-Type
header).

It's more the case that the message that's displayed is _de_coded for
viewing, right?

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)

* Re: Rmail changes for Emacs 22
  2002-10-22  3:12             ` Richard Stallman
@ 2002-10-22  6:33               ` Kai Großjohann
  2002-10-22 18:48                 ` Eli Zaretskii
  2002-10-22 14:32               ` Stefan Monnier
  1 sibling, 1 reply; 35+ messages in thread
From: Kai Großjohann @ 2002-10-22  6:33 UTC (permalink / raw)


Richard Stallman <rms@gnu.org> writes:

>     That probably means that the format should be "whatever was received".
>     I.e. the mail-reader should never encode anything (only the mail-sender
>     should do that).
>
> It is impossible to display the message text without decoding it from
> whatever coding system it is encoded in.

I think that's what Eli meant: an incoming message is already encoded
in some way, and Eli suggested to just leave it like that and to
decode on viewing.  (Only a Content-Type header might have to be added
or changed so that Rmail knows which encoding is used in the message.)

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)

* Re: Rmail changes for Emacs 22
  2002-10-22  3:12             ` Richard Stallman
  2002-10-22  6:33               ` Kai Großjohann
@ 2002-10-22 14:32               ` Stefan Monnier
  2002-10-23  7:12                 ` Richard Stallman
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-22 14:32 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, eliz, emacs-devel, pmr

>     That probably means that the format should be "whatever was received".
>     I.e. the mail-reader should never encode anything (only the mail-sender
>     should do that).
> 
> It is impossible to display the message text without decoding it from
> whatever coding system it is encoded in.

I said it should never *en*code.  Obviously, it will have to decode
somewhere on the way between the mbox file and the display.


	Stefan

* Re: Rmail changes for Emacs 22
  2002-10-21 15:34       ` Dave Love
@ 2002-10-22 16:36         ` Richard Stallman
  2002-10-24 21:37           ` Dave Love
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Stallman @ 2002-10-22 16:36 UTC (permalink / raw)
  Cc: emacs-devel, pmr

      I hope you'll consider what Gnus does,
    though I doubt you can use its code directly (unfortunately).

We could consider it if someone explains what it is.
To figure it out by reading the code would probably
take more time than I can spend.  I don't know what PMR thinks.

* Re: Rmail changes for Emacs 22
  2002-10-22  6:31         ` Kai Großjohann
@ 2002-10-22 18:40           ` Eli Zaretskii
  2002-10-23  5:24             ` Kai Großjohann
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-22 18:40 UTC (permalink / raw)
  Cc: emacs-devel

> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
> Date: Tue, 22 Oct 2002 08:31:38 +0200
> 
> Eli Zaretskii <eliz@is.elta.co.il> writes:
> 
> > Personally, I think emacs-mule is not a good idea in this case, since 
> > mbox is not an Emacs-private format, so some other software should be able
> > to read it.  A good alternative would be to encode each message as what 
> > the charset= header says (and add/fix such a header if there is none, or 
> > if the one that's there lies).
> 
> Maybe "encode" is a bit misleading in this case, as the bytes in the
> message are not changed (modulo adding/fixing the Content-Type
> header).
> 
> It's more the case that the message that's displayed is _de_coded for
> viewing, right?

Currently, RMAIL decodes the messages in-place, i.e. the encoded text
as received from the MTA is replaced in the RMAIL buffer with the
decoded text.  If this modus operandi is retained, you must encode the
text when you save the RMAIL buffer to a file.

(I've heard that there was an intent to decode the messages into
another buffer, but I don't know whether this is being worked on.)

* Re: Rmail changes for Emacs 22
  2002-10-22  6:33               ` Kai Großjohann
@ 2002-10-22 18:48                 ` Eli Zaretskii
  0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-22 18:48 UTC (permalink / raw)
  Cc: emacs-devel

> From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
> Date: Tue, 22 Oct 2002 08:33:02 +0200
> 
> > It is impossible to display the message text without decoding it from
> > whatever coding system it is encoded in.
> 
> I think that's what Eli meant: an incoming message is already encoded
> in some way, and Eli suggested to just leave it like that and to
> decode on viewing.

Almost.  There are the cases where the existing charset= header lies.
A user can then do a "M-x rmail-redecode-body RET ENCODING RET", and
get the message decoded differently.  I think in these cases Emacs
should rewrite the charset= header according to the new encoding.
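
A sketch of that behaviour, in Python for illustration only.  This is not
how rmail-redecode-body is implemented; the function name redecode_body is
hypothetical.  The body bytes are left exactly as received, only the
charset parameter of Content-Type is rewritten.

```python
from email import message_from_bytes


def redecode_body(raw: bytes, charset: str):
    """Decode the body with a user-chosen charset and record that choice."""
    msg = message_from_bytes(raw)
    # Raises UnicodeDecodeError if the guess cannot decode the bytes at all.
    text = msg.get_payload(decode=True).decode(charset)
    # Rewrite only the charset parameter; the body bytes stay as received.
    msg.set_param("charset", charset)
    return text, msg.as_bytes()
```

The second return value is what would be written back to the mail file,
so later views pick up the corrected charset automatically.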

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-22 18:40           ` Eli Zaretskii
@ 2002-10-23  5:24             ` Kai Großjohann
  0 siblings, 0 replies; 35+ messages in thread
From: Kai Großjohann @ 2002-10-23  5:24 UTC (permalink / raw)


"Eli Zaretskii" <eliz@is.elta.co.il> writes:

> Currently, RMAIL decodes the messages in-place, i.e. the encoded text
> as received from the MTA is replaced in the RMAIL buffer with the
> decoded text.  If this modus operandi is retained, you must encode the
> text when you save the RMAIL buffer to a file.
>
> (I've heard that there was an intent to decode the messages into
> another buffer, but I don't know whether this is being worked on.)

Oh, right.  Maybe it is easier to use an extra buffer for display.

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-22 14:32               ` Stefan Monnier
@ 2002-10-23  7:12                 ` Richard Stallman
  2002-10-23  8:13                   ` Kenichi Handa
  2002-10-23  9:57                   ` Paul Michael Reilly
  0 siblings, 2 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-23  7:12 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, eliz, emacs-devel, pmr

    I said it should never *en*code.  Obviously, it will have to decode
    somewhere on the way between the mbox file and the display.

The question at hand is when and how to do the decoding.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-23  7:12                 ` Richard Stallman
@ 2002-10-23  8:13                   ` Kenichi Handa
  2002-10-25  5:36                     ` Richard Stallman
  2002-10-23  9:57                   ` Paul Michael Reilly
  1 sibling, 1 reply; 35+ messages in thread
From: Kenichi Handa @ 2002-10-23  8:13 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, monnier+gnu/emacs, eliz, emacs-devel, pmr

In article <E184Fgb-0007jf-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     I said it should never *en*code.  Obviously, it will have to decode
>     somewhere on the way between the mbox file and the display.

> The question at hand is when and how to do the decoding.

I have not yet thought of it deeply, but it seems that we
have these options.

(1) A simple way:

Decode only when we need the contents of a message (e.g. for
displaying or searching).

This may be slow on searching all messages repeatedly.

Morioka-san's rmail-mime package is implemented this way.
Actually, the current rmail code already contains necessary
code to implement it easily (see
rmail-XXX-mime-YYY-function).

(2) Another simple but memory consuming way:

Have a parallel decoded buffer that contains all messages
decoded.

This may cause a memory shortage if the RMAIL file is large.

(3) Not simple but efficient way:

Have a parallel decoded buffer but make it grow on demand.

(4) More efficient way:

Same as (3), but make rmail-backend not keep the original
RMAIL file in a buffer.  It reads the file once, scans it and
keeps the file positions of all messages, then kills the buffer.
Later, on request, rmail-backend reads portions of the RMAIL
file one by one.
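
Option (4) could look roughly like the following sketch, written in
Python rather than Emacs Lisp.  It scans an mbox stream once, keeps only
the byte offsets of each message, and reads a single message on request.
The names index_mbox and read_message are hypothetical, and treating
every line that starts with "From " as a separator is a simplification
of real mbox parsing.

```python
import io


def index_mbox(f):
    """Return (start, end) byte offsets of each message in an mbox stream."""
    starts = []
    pos = 0
    for line in f:
        # Simplification: any line starting with "From " begins a message.
        if line.startswith(b"From "):
            starts.append(pos)
        pos += len(line)
    ends = starts[1:] + [pos]
    return list(zip(starts, ends))


def read_message(f, span):
    """Read one message's raw bytes on request, without loading the rest."""
    start, end = span
    f.seek(start)
    return f.read(end - start)


# An in-memory stand-in for the RMAIL file.
mbox = io.BytesIO(
    b"From a@example.org Wed Oct 23 03:12:21 2002\n"
    b"Subject: one\n\nfirst body\n\n"
    b"From b@example.org Wed Oct 23 08:13:00 2002\n"
    b"Subject: two\n\nsecond body\n"
)
index = index_mbox(mbox)
second = read_message(mbox, index[1])
```

Only the offset table stays in memory; decoding would then be applied
to whatever read_message returns.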

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-23  7:12                 ` Richard Stallman
  2002-10-23  8:13                   ` Kenichi Handa
@ 2002-10-23  9:57                   ` Paul Michael Reilly
  2002-10-23 16:58                     ` Eli Zaretskii
  1 sibling, 1 reply; 35+ messages in thread
From: Paul Michael Reilly @ 2002-10-23  9:57 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, eliz, emacs-devel

 > From rms@gnu.org  Wed Oct 23 03:12:21 2002
 > Reply-to: rms@gnu.org
 > X-BABYL-V6-ATTRIBUTES: -------
 > 
 >     I said it should never *en*code.  Obviously, it will have to decode
 >     somewhere on the way between the mbox file and the display.
 > 
 > The question at hand is when and how to do the decoding.

I'm not sure that it is totally obvious, but AFAICS there are TWO
distinct coding system issues.  First is the message based decoding
that everyone seems to recognize is necessary to view messages.
Second is the coding system used for mail file/buffer.  They are
mostly orthogonal.

The mail buffer coding system will be dynamic.  It should mostly be
iso-latin1 according to mail rfcs but Users will tend to abuse the
specs so Rmail needs to be robust enough to handle that abuse.  How
exactly remains to be decided.  Editing of messages is discussed
below.

As messages in the mail file are viewed, the buffer coding system will
very likely change, at least in the narrowed region viewing the
message.

My gut feel is that the use of special view buffers (apart from the
mail file buffer) will be necessary in certain cases (yet to be
determined) as we integrate MIME and IMAP for first class (default)
treatment.  I strongly agree with Richard that separate view buffers
are to be avoided like the plague.  If memory serves, VM uses special
viewing buffers on a limited basis.

Editing of messages opens up a huge can of worms wrt coding system.
If anyone can state a sensible and effective policy for dealing with
coding system conflicts while editing messages, more power to 'em.
I'm listening.  It is easy to foresee Users changing message headers
and bodies in ways that would render a message unmailable and/or
unviewable in another mail agent but are nevertheless doable within
Emacs.

FWIW, I fully support the notion of a front-end / multiple back-end
design and have already started on it.  Any opinions on a good model
in the current code base?  I've looked at Gnus in the past and found
it very, very complex.  VC is straightforward and Richard has
mentioned compose-mail.

-pmr

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-23  9:57                   ` Paul Michael Reilly
@ 2002-10-23 16:58                     ` Eli Zaretskii
  2002-10-24  7:29                       ` Stefan Monnier
  0 siblings, 1 reply; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-23 16:58 UTC (permalink / raw)
  Cc: emacs-devel

> Date: Wed, 23 Oct 2002 05:57:07 -0400
> From: Paul Michael Reilly <pmr@pajato.com>
> 
> I'm not sure that it is totally obvious, but AFAICS there are TWO
> distinct coding system issues.  First is the message based decoding
> that everyone seems to recognize is necessary to view messages.
> Second is the coding system used for mail file/buffer.  They are
> mostly orthogonal.

??? It is customary in Emacs that after decoding text we set the
buffer's file coding system to what was used to decode the text.
That's what RMAIL does today when it decodes and displays a message:
the (narrowed) buffer's buffer-file-coding-system is set to the
coding system used to decode the message.

So, unless I grossly misunderstand what you wanted to say, the two
issues you mentioned are not at all orthogonal, they are more like one
and the same.

> The mail buffer coding system will be dynamic.  It should mostly be
> iso-latin1 according to mail rfcs but Users will tend to abuse the
> specs so Rmail needs to be robust enough to handle that abuse.  How
> exactly remains to be decided.

Why not use what RMAIL does today: it looks at the charset= header,
and if that's absent, guesses using the user settings, the defaults,
and the encoding-detection routines (in that order)?
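
A minimal sketch of that fallback order, in Python rather than Emacs
Lisp and not taken from RMAIL itself: the declared charset wins, then a
user setting, then a default, and only then a crude detection step.  The
function choose_charset, its parameters, and the candidate list are all
assumptions for illustration; real detection routines are far more
elaborate than trial decoding.

```python
from email import message_from_bytes


def choose_charset(raw, user_setting=None, default=None,
                   detect_candidates=("us-ascii", "utf-8")):
    """Pick a charset: header, then user setting, default, detection."""
    msg = message_from_bytes(raw)
    declared = msg.get_content_charset()
    if declared:
        return declared
    for fallback in (user_setting, default):
        if fallback:
            return fallback
    # Crude detection: the first candidate that decodes without error.
    body = msg.get_payload(decode=True) or b""
    for name in detect_candidates:
        try:
            body.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "iso-8859-1"  # last resort: decodes any byte sequence


labelled = b"Content-Type: text/plain; charset=koi8-r\n\nbody\n"
no_header = b"Subject: test\n\nGro\xc3\x9fjohann\n"

from_header = choose_charset(labelled, user_setting="iso-8859-2")
from_user = choose_charset(no_header, user_setting="iso-8859-2")
guessed = choose_charset(no_header)
```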

> Editing of messages opens up a huge can of worms wrt coding system.
> If anyone can state a sensible and effective policy for dealing with
> coding system conflicts while editing messages, more power to 'em.

Assuming normal usage, I don't see why we should deviate from the
normal policy used for saving buffers to disk files.  Emacs already
has machinery to deal with mixed charsets in a buffer, including
prompting the user to choose the encoding if Emacs is unable to
decide.

In general, as I've said elsewhere in this thread, I think Emacs
should encode each message in its original encoding (given by
charset=).  There are some exceptions to that rule (which I also
mentioned), but I'd suggest first to agree on the rule.

> It is easy to foresee Users changing message headers
> and bodies in ways that would render a message unmailable and/or
> unviewable in another mail agent but are nevertheless doable within
> Emacs.

Emacs gives you enough rope to hang yourself.  I won't worry too much
about those who do.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-23 16:58                     ` Eli Zaretskii
@ 2002-10-24  7:29                       ` Stefan Monnier
  2002-10-24 17:30                         ` Eli Zaretskii
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Monnier @ 2002-10-24  7:29 UTC (permalink / raw)
  Cc: pmr, emacs-devel

> In general, as I've said elsewhere in this thread, I think Emacs
> should encode each message in its original encoding (given by
> charset=).

I agree except I'd say "keep" instead of "encode": it should preserve
the `mbox' content byte-for-byte which is sometimes difficult to do when
you do decode+encode.
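
To make the difficulty concrete, here is a small Python sketch (an
illustration only, nothing to do with Rmail's code) in which a body
mislabelled as UTF-8 does not survive a decode+encode round trip, while
simply keeping the stored bytes and decoding a copy for display loses
nothing.  The sample bytes are made up.

```python
# Latin-1 bytes for "café", in a message wrongly labelled charset=utf-8.
original = b"caf\xe9\n"

# Decode + encode: the invalid byte is replaced, so the data changes.
lossy = original.decode("utf-8", errors="replace").encode("utf-8")

# "Keep": the stored bytes are never touched; decoding is for viewing only.
stored = original
view = stored.decode("iso-8859-1")
```

Here lossy differs from original, whereas stored is still identical to
what was received.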

BTW MIME mail can contain several parts with different encodings in each
one of them, so it's a bit misleading to talk about "the buffer's
coding system" unless each part is displayed in another buffer, which
is undesirable in the general case.

	Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-24  7:29                       ` Stefan Monnier
@ 2002-10-24 17:30                         ` Eli Zaretskii
  0 siblings, 0 replies; 35+ messages in thread
From: Eli Zaretskii @ 2002-10-24 17:30 UTC (permalink / raw)
  Cc: pmr, emacs-devel

> From: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
> Date: Thu, 24 Oct 2002 03:29:35 -0400
> 
> I agree except I'd say "keep" instead of "encode": it should preserve
> the `mbox' content byte-for-byte which is sometimes difficult to do when
> you do decode+encode.

If we want to preserve the mbox file verbatim, we will have to keep it
in memory unchanged and decode messages into another buffer.

> BTW MIME mail can contain several parts with different encodings in each
> one of them, so it's a bit misleading to talk about "the buffer's
> coding system" unless each part is displayed in another buffer, which
> is undesirable in the general case.

I was talking about RMAIL which doesn't yet support MIME.  IIRC,
adding such support was the motivation for decoding into a different
buffer in the discussions I recall.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-22 16:36         ` Richard Stallman
@ 2002-10-24 21:37           ` Dave Love
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Love @ 2002-10-24 21:37 UTC (permalink / raw)
  Cc: emacs-devel, pmr

Richard Stallman <rms@gnu.org> writes:

>       I hope you'll consider what Gnus does,
>     though I doubt you can use its code directly (unfortunately).
> 
> We could consider it if someone explains what it is.
> To figure it out by reading the code would probably
> take more time than I can spend.  I don't know what PMR thinks.

I'd have to read the code too for details, and I'm also not sure how
Rmail works.  I assume larsi should explain the Gnus design.  However,
it's clear it maintains mbox-type files effectively as received --
articles aren't re-written except, perhaps, for additional headers.  I
don't think it makes sense to do anything else if you're going to
provide MIME processing, as I presume Rmail users would hope.  I'd
guess it's useful to figure out how VM works too; I think that works
with mbox files too.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Rmail changes for Emacs 22
  2002-10-23  8:13                   ` Kenichi Handa
@ 2002-10-25  5:36                     ` Richard Stallman
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Stallman @ 2002-10-25  5:36 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, monnier+gnu/emacs, eliz, emacs-devel, pmr

    (1) A simple way:

    Decode only when we need the contents of a message (e.g. for
    displaying or searching).

    This may be slow on searching all messages repeatedly.

This could be a good point about searching.  But maybe computers
now are so fast that it is ok.

We should try to avoid all methods that use more than one buffer.

Another idea is to decode all the messages when the contents are
wanted the first time, and leave the decoded message in the buffer in
place of the original.  The original could be found in the file if
needed (if the user gives a command to "please decode again with a
different coding system").

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2002-10-25  5:36 UTC | newest]

Thread overview: 35+ messages
     [not found] <rzqu1jseva6.fsf@albion.dl.ac.uk>
2002-10-13  4:08 ` Rmail changes for Emacs 22 Richard Stallman
2002-10-15 17:40   ` Dave Love
2002-10-16  4:38     ` Richard Stallman
2002-10-16  6:09       ` Eli Zaretskii
2002-10-16  7:19         ` Kenichi Handa
2002-10-19  4:25           ` Paul Michael Reilly
2002-10-19  4:55           ` Richard Stallman
2002-10-20  7:03             ` Eli Zaretskii
2002-10-18 22:59         ` Richard Stallman
2002-10-20 19:40           ` Stefan Monnier
2002-10-22  3:12             ` Richard Stallman
2002-10-22  6:33               ` Kai Großjohann
2002-10-22 18:48                 ` Eli Zaretskii
2002-10-22 14:32               ` Stefan Monnier
2002-10-23  7:12                 ` Richard Stallman
2002-10-23  8:13                   ` Kenichi Handa
2002-10-25  5:36                     ` Richard Stallman
2002-10-23  9:57                   ` Paul Michael Reilly
2002-10-23 16:58                     ` Eli Zaretskii
2002-10-24  7:29                       ` Stefan Monnier
2002-10-24 17:30                         ` Eli Zaretskii
2002-10-21 15:33         ` Dave Love
2002-10-21 16:37           ` Kai Großjohann
2002-10-21 20:50             ` Stefan Monnier
2002-10-22  6:28               ` Kai Großjohann
2002-10-22  6:31         ` Kai Großjohann
2002-10-22 18:40           ` Eli Zaretskii
2002-10-23  5:24             ` Kai Großjohann
2002-10-16 12:21       ` Paul Michael Reilly
2002-10-19  4:56         ` Richard Stallman
2002-10-21 15:34       ` Dave Love
2002-10-22 16:36         ` Richard Stallman
2002-10-24 21:37           ` Dave Love
2002-10-16  4:38     ` Richard Stallman
2002-10-21 15:31       ` Dave Love
