Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found] <i56wugl13qq.fsf@mao.acc.umu.se>
@ 2003-05-21  2:43 ` Kenichi Handa
  2003-05-22  8:33   ` Richard Stallman
       [not found] ` <i563cj8kz7e.fsf@mao.acc.umu.se>
  1 sibling, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2003-05-21  2:43 UTC (permalink / raw)
  Cc: emacs-devel

In article <i56wugl13qq.fsf@mao.acc.umu.se>, stktrc <stktrc@yahoo.com> writes:
> Is it possible to have different (buffer-file-)coding-systems for
> different regions of one buffer?

As buffer-file-coding-system is a coding system to use on
writing a buffer to a file, I think it's nonsense to have
different values in different regions.

> Why?  Trying to find a way of making Rmail handle MIME messages nicely
> (without modifying the underlying file or corresponding buffer by
> replacing the encoded data with decoded data, in place).

> The idea would be to let the region of each body part have a
> coding-system that makes the encoded text readable in Emacs.

I don't understand how buffer-file-coding-system is related
to that.  And, I think it's impossible to handle MIME
messages nicely without replacing the encoded data with
decoded data in some buffer.  Just displaying it may be
possible, but search/cut&paste etc. are impossible.

So, we have to decode the RMAIL buffer itself, or create a
separate view buffer that contains decoded text (as done by
Gnus).

---
Ken'ichi HANDA
handa@m17n.org

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-21  2:43 ` Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME) Kenichi Handa
@ 2003-05-22  8:33   ` Richard Stallman
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Stallman @ 2003-05-22  8:33 UTC (permalink / raw)
  Cc: emacs-devel

    > Is it possible to have different (buffer-file-)coding-systems for
    > different regions of one buffer?

    As buffer-file-coding-system is a coding system to use on
    writing a buffer to a file, I think it's nonsense to have
    different values in different regions.

He's not asking what's implemented today; he is asking what might
potentially be implemented.  As an idea for a future feature,
it isn't nonsense.

[parent not found: <i563cj8kz7e.fsf@mao.acc.umu.se>]

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found] ` <i563cj8kz7e.fsf@mao.acc.umu.se>
@ 2003-05-21 19:53   ` Stefan Monnier
       [not found]     ` <i56smr8j8lk.fsf@mao.acc.umu.se>
  2003-05-22 13:16     ` Kai Großjohann
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Monnier @ 2003-05-21 19:53 UTC (permalink / raw)
  Cc: emacs-devel

> Would it be possible to have different coding-systems (the decoding of
> octets from a file into characters in a buffer) for different ranges
> of octets in a file?

Of course, it's possible: coding systems are operations, not data.
Emacs offers straightforward ways to apply those operations to whole
files when reading and saving them, as well as straightforward
ways to apply those operations to parts of a buffer.

> For example: in a file of 2000 octets, octet 1-1000 would be decoded
> using ISO-8859-1, octet 1001-1500 with UTF-8, 1501-2000 with
> (currently non-existent?)  Quoted-Printable and so on.  This would in
> my opinion allow a pretty way of handling MIME messages.

quoted-printable is not a coding-system.  As for the rest, I don't
see what's preventing you from doing it.  After all, Gnus does it.
I.e. load the raw undecoded file, parse its content to figure out
where parts begin and end and what coding-system to use for them
(and maybe also un-base64 or un-qp them) and then apply
decode-coding-region.
Upon saving, just do the opposite.
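
A rough sketch of those steps, assuming the part boundaries and
charsets have already been parsed out of the MIME headers.  The name
`my-decode-parts' and its argument format are made up for
illustration; this is not Gnus or Rmail code.

```elisp
(defun my-decode-parts (parts)
  "Decode regions of the current buffer in place.
PARTS is a list of (START END CODING-SYSTEM) entries, in buffer
order and not overlapping.  The buffer is assumed to hold the raw,
undecoded bytes of the file."
  ;; Decoding changes the length of the text, so work from the last
  ;; part back to the first; earlier positions then stay valid.
  (dolist (part (reverse parts))
    (decode-coding-region (nth 0 part) (nth 1 part) (nth 2 part))))
```

For the 2000-octet example above, the first two ranges would be
passed as (list (list 1 1001 'iso-8859-1) (list 1001 1501 'utf-8)),
assuming buffer positions that count octets from 1.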

> Though I can't come up with any other uses except for the proposed
> Rmail usage for different coding-systems for different regions, I
> don't see how it is nonsense.  It is like opening a file constructed
> by concatenating several files with different character encodings (and
> knowledge of what part of the file uses what encoding would be
> extracted from the MIME data).  Do you see what I'm trying to
> accomplish?

I still don't understand what you want that's not already present.

> Unless I have overlooked something, I *do* think it would be possible
> to handle MIME messages nicely without replacing the encoded data, if
> the facilities for decoding different parts of a file (which is done
> with a coding-system, right?) with different character encodings
> exist.

I don't understand what you mean by "replacing".

	Stefan

[parent not found: <i56smr8j8lk.fsf@mao.acc.umu.se>]

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found]     ` <i56smr8j8lk.fsf@mao.acc.umu.se>
@ 2003-05-21 22:29       ` Stefan Monnier
       [not found]         ` <i56u1bnn567.fsf@mao.acc.umu.se>
  2003-05-22  1:32       ` Kenichi Handa
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2003-05-21 22:29 UTC (permalink / raw)
  Cc: emacs-devel

> When Emacs loads a file (that is supposed to be encoded in for example
> ISO 2022, a 7 bit based encoding), does it first load the file
> contents into a buffer as ASCII text and then applies
> decode-coding-region to the entire buffer (conceptually)?

It's optimized, but conceptually, yes, that's what it does.

> Using decode-coding-region modifies the buffer contents because the
> actual characters present in the buffer change (the two characters AA
> might become the character Å or whatever, and hence the buffer
> contents has been modified).

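(let ((mod (buffer-modified-p)))
  ;; CODING-SYSTEM is whatever coding system was chosen for this region.
  (decode-coding-region start end coding-system)
  ;; Put the modified flag back the way it was before decoding.
  (restore-buffer-modified-p mod))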

Why does it matter if the buffer is "modified" or not ?
I still don't get it.  Note, I'm not trying to turn you away,
but I really don't understand.

> 1. I like the simple model of Rmail seeing a message as a narrowed
>    down part of the actual mail folder file (that is, no separate
>    display buffer).

What do you like about it, other than the concept ?

> 2. It would be desirable that Rmail didn't modify the underlying
>    message (= save decoded message), at least save it to disk.

I have trouble parsing this sentence.

Could you describe very concretely what you want to do ?
Do you want to store emails in fully decoded form in the Rmail file ?
If so, why wouldn't "fully decoded" use the emacs-mule coding-system ?

> Number 2 leads, due to the strong association between the message and
> the mail folder file, that it also is undesirable to modify the
> buffer, which is what decode-coding-region does.  Instead, why not
> decode the parts of the file that are encoded in a different manner
> "correctly" the first time?

Why does it matter whether it's the first time or the second ?
In either case, it's different and needs to be re-encoded when saving.

> Gnus doesn't have Rmail's strong association of the message actually
> being a narrowed down region of the mail folder file.  It is fine for
> Gnus to create a separate buffer wholly intended for working with the
> message.  That is why Gnus can load the encoded data into the buffer,
> do decode-coding-region, insert buttons and play around with the
> buffer contents--the buffer will be thrown away later.

Yup, it's a very clean way to do things, indeed.  You can decode all
you want, throw stuff away, rewrite it, etc to your heart's content
while still being 100% sure that you didn't mess up anything.
Also it can be significantly more efficient since working on a 10KB
display buffer is more efficient than working on a 100MB Rmail
folder file.  Sounds like a winner to me ;-)
Which part do you not like ?
Admittedly, when you don't have much to do to make the message "viewable",
copying the message to a separate buffer is a waste and is less efficient
than just narrow-and-go.  Is that what bothers you ?

> >> Though I can't come up with any other uses except for the proposed
> >> Rmail usage for different coding-systems for different regions, I
> >> don't see how it is nonsense.  It is like opening a file constructed
> >> by concatenating several files with different character encodings (and
> >> knowledge of what part of the file uses what encoding would be
> >> extracted from the MIME data).  Do you see what I'm trying to
> >> accomplish?
> >
> > I still don't understand what you want that's not already present.
> 
> I don't want to first load encoded data into the buffer, then decode
> it (thereby modifying the contents).  I want the decoding to happen
> before the characters hit the Emacs buffer.  And I want to use
> different decoding for different parts of the underlying raw bytes.

That describes *how* you want to reach your goal, but it doesn't
describe your goal.

	Stefan

[parent not found: <i56u1bnn567.fsf@mao.acc.umu.se>]

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found]         ` <i56u1bnn567.fsf@mao.acc.umu.se>
@ 2003-05-22  3:41           ` Stephen J. Turnbull
       [not found]             ` <i56ptmateuj.fsf@mao.acc.umu.se>
  2003-05-23 12:03             ` Richard Stallman
  0 siblings, 2 replies; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-22  3:41 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "stktrc" == stktrc  <stktrc@yahoo.com> writes:

    stktrc> Because modifying the buffer means modifying the
    stktrc> underlying file (because Rmail writes the buffer back to
    stktrc> the file on exit for various reasons, message flags are
    stktrc> one I suppose).

This is why I originally switched to VM.  Rmail was unable to read,
and sometimes trashed[1], my mail files, which contained mixed
Japanese/ASCII/European encodings, and all of Unix/DOS/Macintosh
newline conventions.  We're not in Kansas anymore, Toto, and the Rmail
"narrow-to-message" model simply increases complexity dramatically
because of the underlying Mule model of coding-per-file.  (It is
hard to see how you can avoid that model.)

    stktrc> I'm not totally opposed to that approach.  It is an
    stktrc> alternative, but as stated before, I like the one folder -
    stktrc> one buffer concept for its simplicity.

Fine.  This will go gangbusters if you set your spamassassin to throw
away all mail with "Content-Type" headers.  Now the world is simple
enough to fit into that concept well.

Otherwise, there are at least two radically different views of many
files, and there must be a buffer (in the generic sense of a separate
region of memory) for presentation, and one for the much more
restricted changes you wish propagated back to the file (setting
flags).  I see no good reason why the region of memory used for
presentation shouldn't "waste" a few score bytes and be promoted to an
Emacs buffer.

    stktrc> I thought there was a layer between the file system and
    stktrc> the Emacs buffer that decoded the bytes from the file into
    stktrc> characters that were inserted into the buffer (and the
    stktrc> other way when writing).  That is, the buffer would never
    stktrc> see the encoded data, it would just receive already
    stktrc> decoded characters in an Emacs internal representation.
    stktrc> If it had been like this,

It is like that in practice, most of the time.  But it's only
practical for simple cases, eg, where the whole file is encoded in one
encoding, or the whole file conforms to a fixed standard such as ISO
2022.  Multimedia (in the MIME sense) files can't work that way.  The
meta-information used to create the presentation is often a hint, or a
user option.

    stktrc> with the addition of the possibility of different
    stktrc> decodings for different parts, it could have been used for
    stktrc> the purposes described earlier (to display MIME messages
    stktrc> as if they had been decoded inline *without modifying* the
    stktrc> buffer).

But then how do you save the buffer (eg, if you have set flags)?  It
differs from the file, and the decoding process is not an isomorphism
(multimedia).

Footnotes: 
[1]  Many years ago.  And the trashed cases required a very special
configuration of very non-RFC-conformant messages.  Unfortunately,
these were quite common in Japan ca. 1994.  The Internet can be a very
hostile place!

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software

[parent not found: <i56ptmateuj.fsf@mao.acc.umu.se>]

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found]             ` <i56ptmateuj.fsf@mao.acc.umu.se>
@ 2003-05-23 10:56               ` Stephen J. Turnbull
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-23 10:56 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "stktrc" == stktrc  <stktrc@yahoo.com> writes:

    stktrc> "Stephen J. Turnbull" <stephen@xemacs.org> writes:

    >> But then how do you save the buffer (eg, if you have set
    >> flags)?  It differs from the file, and the decoding process is
    >> not an isomorphism (multimedia).

    stktrc> The various multimedia would never be decoded inside the
    stktrc> buffer (there would be a minor mode that extracts parts to
    stktrc> a pipe or file I imagine).

The "multimedia" I'm referring to here are MIME types, subtypes, and
parameters.  Eg, "text/plain; charset=US-ASCII".

    stktrc> This means the buffer would not contain any modifications,
    stktrc> except for the flag modifications, and writing the buffer
    stktrc> back to disk is hence ok.

You think so?  I really wish it were that easy.


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-22  3:41           ` Stephen J. Turnbull
       [not found]             ` <i56ptmateuj.fsf@mao.acc.umu.se>
@ 2003-05-23 12:03             ` Richard Stallman
  2003-05-23 15:03               ` Stephen J. Turnbull
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Stallman @ 2003-05-23 12:03 UTC (permalink / raw)
  Cc: emacs-devel

    Otherwise, there are at least two radically different views of many
    files, and there must be a buffer (in the generic sense of a separate
    region of memory) for presentation, and one for the much more
    restricted changes you wish propagated back to the file (setting
    flags).  I see no good reason why the region of memory used for
    presentation shouldn't "waste" a few score bytes and be promoted to an
    Emacs buffer.

It pretty much has to be an Emacs buffer, or part of one.  There is no
other natural or easy way to implement it in the context of Emacs.
The question would be, is it a separate buffer, or a part of another
buffer, or what?

In Rmail currently it is possible to type e and edit a message.
Right now we do this through editing the buffer of the RMAIL file.
With better MIME support, this may have to be implemented differently,
but I hope we can keep it working somehow.

If we copy the message into another buffer for viewing, that tends to
lead to complications of the situation, because there are multiple
buffers instead of just one.  We could try adding features to hide
that, or we could expose it and not hide anything.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-23 12:03             ` Richard Stallman
@ 2003-05-23 15:03               ` Stephen J. Turnbull
  2003-05-24 23:19                 ` Richard Stallman
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-23 15:03 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    rms> It pretty much has to be an Emacs buffer, or part of one.
    rms> There is no other natural or easy way to implement it in the
    rms> context of Emacs.  The question would be, is it a separate
    rms> buffer, or a part of another buffer, or what?

If you want to preserve the original contents of the buffer, you must
copy them somewhere, because many of the transformations performed to
make text displayable are not invertible.  For example, it is
perfectly legal in ISO 2022 coding systems to have two charset
designations with no intervening text.  The first one will get lost.
Messages from older MUAs may contain the so-called abbreviated escape
sequences for Japanese and Chinese; a modern one would write them out
in the longer form.  It is almost always possible to unify ISO 8859/1
and ISO 8859/15 to ISO 8859/15, yet with cut and paste as things
currently stand, Emacs will produce a buffer with multiple charsets.
However, Emacs may or may not attempt to unify those charsets on
write, depending, I believe, on user options.
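
One way to check this for a given byte string is to decode it, encode
it again, and compare the result with the original.  A minimal
sketch; the function name is invented here, while
`decode-coding-string' and `encode-coding-string' are the standard
primitives.

```elisp
(defun my-round-trips-p (bytes coding-system)
  "Return non-nil if BYTES survive a decode/encode cycle unchanged.
BYTES is a unibyte string; CODING-SYSTEM is a symbol such as
`iso-2022-jp'."
  (equal bytes
         (encode-coding-string
          (decode-coding-string bytes coding-system)
          coding-system)))

;; Two designations with no text between them, followed by plain
;; ASCII.  The first designation carries no characters, so it has
;; nothing to hang on to once the text is decoded.
(my-round-trips-p "\e$B\e(BA" 'iso-2022-jp)
```

Running such a check over a mail folder would give a rough count of
the messages that an in-place decode followed by a save would alter.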

Quoted-printable, division of MIME encoded-words, and so on all
present similar issues.

Of course all of these could be handled by setting a text property
saying "in the original this was ISO 8859/1 but it has been unified to
ISO 8859/15" or putting an invisible property on a redundant escape
sequence and leaving it in the buffer, but that's ugly and
fault-prone.

    rms> In Rmail currently it is possible to type e and edit a
    rms> message.  Right now we do this through editing the buffer of
    rms> the RMAIL file.  With better MIME support, this may have to
    rms> be implemented differently, but I hope we can keep it working
    rms> somehow.

I think this will require a lot of work if you wish to preserve file
text verbatim unless explicitly edited (and this is essential for
signed messages, for example).

    rms> If we copy the message into another buffer for viewing, that
    rms> tends to lead to complications of the situation, because
    rms> there are multiple buffers instead of just one.  We could try
    rms> adding features to hide that, or we could expose it and not
    rms> hide anything.

I don't see how it gets complicated.  You put a couple of markers in
the original buffer, copy the region to the presentation buffer, and
transform it.  If you don't edit it, (erase-buffer) and go on to the
next message.  If you want to edit, you edit the presentation buffer,
in exactly the same way that currently you would edit the Rmail
buffer.  Once you've changed the presentation buffer, I see no reason
not to unify charsets, remove redundant escape sequences, and so on.
Once you're done, you simply replace the marked region in the original
buffer.  Reversion is simple: you refresh from the original buffer, no
messing with undo etc.  In this model, the only operations you perform
on the original buffer are (1) visible header movement, (2) setting
flags in Rmail-specific headers, and (3) replacement of the whole
message with an edited version.
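
A minimal sketch of that cycle, under the assumption that the folder
buffer holds raw bytes and that one coding system covers the whole
message.  All names here (`my-show-message', `my-store-message', the
view buffer name) are hypothetical, not existing Rmail or VM
functions.

```elisp
(defun my-show-message (folder start end coding-system)
  "Copy START..END of buffer FOLDER into a view buffer and decode it.
Return the view buffer.  FOLDER itself is not touched."
  (let ((view (get-buffer-create "*my-message-view*")))
    (with-current-buffer view
      (erase-buffer)
      (insert-buffer-substring folder start end)
      (decode-coding-region (point-min) (point-max) coding-system)
      ;; A freshly prepared view counts as unedited.
      (set-buffer-modified-p nil))
    view))

(defun my-store-message (view folder start-marker end-marker coding-system)
  "If VIEW was edited, replace START-MARKER..END-MARKER in FOLDER.
The replacement is VIEW's text encoded with CODING-SYSTEM.  An
unedited VIEW leaves FOLDER byte-for-byte as it was."
  (when (buffer-modified-p view)
    (let ((encoded (with-current-buffer view
                     (encode-coding-string (buffer-string)
                                           coding-system))))
      (with-current-buffer folder
        (save-excursion
          (goto-char start-marker)
          (delete-region start-marker end-marker)
          (insert encoded)
          ;; Re-anchor the end marker after the new text.
          (set-marker end-marker (point)))))))
```

Reverting an edit is then just another call to `my-show-message',
since the folder buffer still has the original bytes.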

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
Universi

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-23 15:03               ` Stephen J. Turnbull
@ 2003-05-24 23:19                 ` Richard Stallman
  2003-05-25 18:28                   ` Kai Großjohann
  2003-05-26  5:20                   ` Stephen J. Turnbull
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Stallman @ 2003-05-24 23:19 UTC (permalink / raw)
  Cc: emacs-devel

    If you want to preserve the original contents of the buffer, you must
    copy them somewhere, because many of the transformations performed to
    make text displayable are not invertible.  For example, it is
    perfectly legal in ISO 2022 coding systems to have two charset
    designations with no intervening text.  The first one will get lost.

It is important to have a way to edit the text that is displayed.  It
is desirable but not very important to preserve the first charset
designation.  If we can't do both, we should do the former.

	rms> In Rmail currently it is possible to type e and edit a
	rms> message.  Right now we do this through editing the buffer of
	rms> the RMAIL file.  With better MIME support, this may have to
	rms> be implemented differently, but I hope we can keep it working
	rms> somehow.

    I think this will require a lot of work if you wish to preserve file
    text verbatim unless explicitly edited (and this is essential for
    signed messages, for example).

This is easy to do--just don't replace the original text unless the
user has edited the message.

The user must type an explicit command, currently e, to edit the
message.  The edited text would be stored back into the file
only when the user exits edit mode (and not if he aborts the edit).

In addition, it is easy to compare the buffer contents with what is
produced by preparing the original message for display.  If they are
the same, then don't alter the original message text.
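
That comparison could look roughly like the following.  The function
name is hypothetical, and "preparing the message for display" is
reduced here to a single decode step, which is an assumption; real
MIME handling would do more.

```elisp
(defun my-message-unchanged-p (view-text original-bytes coding-system)
  "Return non-nil if VIEW-TEXT equals ORIGINAL-BYTES freshly decoded.
VIEW-TEXT is the text currently shown to the user, ORIGINAL-BYTES
the undecoded message, and CODING-SYSTEM the one used for display."
  ;; `string=' ignores text properties, so faces or buttons added
  ;; for display do not count as edits.
  (string= view-text
           (decode-coding-string original-bytes coding-system)))
```

Only when this returns nil would the edited text need to be encoded
and stored back over the original.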

	rms> If we copy the message into another buffer for viewing, that
	rms> tends to lead to complications of the situation, because
	rms> there are multiple buffers instead of just one.  We could try
	rms> adding features to hide that, or we could expose it and not
	rms> hide anything.

    I don't see how it gets complicated.

Right now, if I switch to buffer RMAIL, I see the message that is
selected in the buffer RMAIL.  If that message is actually displayed
in another buffer, then switching to RMAIL won't show it, or won't
show it properly.

Perhaps we need a feature of "alias buffers".  If we set up buffer
RMAIL to specify buffer *View RMAIL* as its alias, then when buffer
RMAIL is selected in a window, buffer *View RMAIL* will actually
appear in the window and will actually be the current buffer most of
the time.  (You could still make RMAIL the current buffer by
explicitly calling set-buffer.)

Tar mode could also make use of this.  And Info.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-24 23:19                 ` Richard Stallman
@ 2003-05-25 18:28                   ` Kai Großjohann
  2003-05-27 12:44                     ` Richard Stallman
  2003-05-26  5:20                   ` Stephen J. Turnbull
  1 sibling, 1 reply; 20+ messages in thread
From: Kai Großjohann @ 2003-05-25 18:28 UTC (permalink / raw)


Richard Stallman <rms@gnu.org> writes:

> Right now, if I switch to buffer RMAIL, I see the message that is
> selected in the buffer RMAIL.  If that message is actually displayed
> in another buffer, then switching to RMAIL won't show it, or won't
> show it properly.

The whole file could be in a buffer " RMAIL file".  Then the RMAIL
buffer would contain what you expect.

Some voodoo may be required to make C-x C-f ~/RMAIL RET work as
expected.  (What is expected?  Maybe show the first message from the
file, or something.)
-- 
This line is not blank.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-25 18:28                   ` Kai Großjohann
@ 2003-05-27 12:44                     ` Richard Stallman
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Stallman @ 2003-05-27 12:44 UTC (permalink / raw)
  Cc: emacs-devel

    > Right now, if I switch to buffer RMAIL, I see the message that is
    > selected in the buffer RMAIL.  If that message is actually displayed
    > in another buffer, then switching to RMAIL won't show it, or won't
    > show it properly.

    The whole file could be in a buffer " RMAIL file".  Then the RMAIL
    buffer would contain what you expect.

That would not completely work.  Doing C-x C-s in the RMAIL buffer
would not save the file.  One way or another we would need to record
the relationship between the two buffers so that various Emacs features
could treat them properly.

    Some voodoo may be required to make C-x C-f ~/RMAIL RET work as
    expected.  (What is expected?  Maybe show the first message from the
    file, or something.)

Yes, that is another aspect of this problem.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-24 23:19                 ` Richard Stallman
  2003-05-25 18:28                   ` Kai Großjohann
@ 2003-05-26  5:20                   ` Stephen J. Turnbull
  2003-05-26 17:30                     ` Eli Zaretskii
  2003-05-27 12:44                     ` Richard Stallman
  1 sibling, 2 replies; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-26  5:20 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

        If you want to preserve the original contents of the
    buffer, you must copy them somewhere, because many of the
    transformations performed to make text displayable are not
    invertible.  For example, it is perfectly legal in ISO 2022
    coding systems to have two charset designations with no
    intervening text.  The first one will get lost.

    rms> It is important to have a way to edit the text that is
    rms> displayed.  It is desirable but not very important to
    rms> preserve the first charset designation.

That's assuming that it is text.  This implementation would make
corruption of attached binaries likely and signed messages somewhat
likely (I haven't done a survey to see how many redundant designations
are in my text mail files in about 6 years, but in 1997 there were a
fair number; about 1 every 20 Japanese messages).

    rms> If we can't do both, we should do the former.

I don't see why editing the presentation buffer, then replacing the
region in the full buffer, doesn't satisfy this requirement.

    rms> In Rmail currently it is possible to type e and edit a
    rms> message.  Right now we do this through editing the buffer of
    rms> the RMAIL file.  With better MIME support, this may have to
    rms> be implemented differently, but I hope we can keep it working
    rms> somehow.

        I think this will require a lot of work if you wish to
    preserve file text verbatim unless explicitly edited (and
    this is essential for signed messages, for example).

    rms> This is easy to do--just don't replace the original text
    rms> unless the user has edited the message.

But you don't have the original text any more (except in the disk
file).  You have decoded text in the buffer.  When you save that, you
will lose redundant ISO 2022 designations and directional sequences,
you will lose "naked newlines" in DOS files.  You will likely lose the
exact format of decoded MIME words in headers in embedded messages.

And remember, if you implement this as an "rmail-coding-system", _all_
messages in the buffer will be automatically decoded.  It is not just
the current message whose corruption is being risked.

    rms> The user must type an explicit command, currently e, to edit
    rms> the message.  The edited text would be stored back into the
    rms> file only when the user exits edit mode (and not if he aborts
    rms> the edit).

If you change the displayed buffer at all (eg, to set a READ flag),
you save the decoded buffer contents.  These are in general different
at the binary level from what is on disk, and must be encoded.  ISO
2022 and MIME do not guarantee that decoding is invertible, and in
fact often the originating and receiving functions are different
implementations, and are not inverses.

	rms> If we copy the message into another buffer for viewing, that
	rms> tends to lead to complications of the situation, because
	rms> there are multiple buffers instead of just one.  We could try
	rms> adding features to hide that, or we could expose it and not
	rms> hide anything.

    I don't see how it gets complicated.

    rms> Right now, if I switch to buffer RMAIL, I see the message
    rms> that is selected in the buffer RMAIL.  If that message is
    rms> actually displayed in another buffer, then switching to RMAIL
    rms> won't show it, or won't show it properly.

Kai Großjohann's answer to this (rename the presentation buffer to
RMAIL, users rarely will want to see the full buffer, so it can be
renamed to a "hidden" buffer name) is correct as far as I can tell
from my own experience, convenient for the user, and easily
implemented.  It's even convenient for third-party developers who
provide add-on facilities, as they don't need to worry so much about
buffer restrictions as under Rmail.  I used to find that a major
annoyance (admittedly, I was a rather unskilled Lisp programmer when I
was using Rmail).

You are right that this is a potential problem.  Both VM and Gnus try
to maintain "window configurations", and they occasionally get it
wrong.  It's easy to work around, but annoying when it happens.
However, their window configurations are much more complex than the
scheme Kai suggested, and I think the probability of it causing
problems is very low.

AFAIK Rmail is the only remaining major Emacs MUA that handles mail
folders implemented as single files by transforming to display format
in place in a single buffer.  In fact, IIRC VM switched from Rmail-
style "full-buffer-is-display-buffer" for text-only messages
relatively recently (it always had presentation buffers, but
restricted them to multimedia messages, eg, containing images or
audio).  While that's not in itself a good reason for Rmail to switch,
I think it tends to indicate that difficulty of implementation is not
so great as you think.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-26  5:20                   ` Stephen J. Turnbull
@ 2003-05-26 17:30                     ` Eli Zaretskii
  2003-05-27 10:03                       ` Stephen J. Turnbull
  2003-05-27 12:44                     ` Richard Stallman
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2003-05-26 17:30 UTC (permalink / raw)
  Cc: emacs-devel

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Date: Mon, 26 May 2003 14:20:54 +0900
> 
>     rms> It is important to have a way to edit the text that is
>     rms> displayed.  It is desirable but not very important to
>     rms> preserve the first charset designation.
> 
> That's assuming that it is text.  This implementation would make
> corruption of attached binaries likely and signed messages somewhat
> likely

Aren't attachments clearly marked in the message as being such?  Can't
Emacs look for those markers (the part delimiters in a multi-part
message) and refrain from decoding binary data while decoding text?

> Kai Grossjohann's answer to this (rename the presentation buffer to
> RMAIL, users rarely will want to see the full buffer, so it can be
> renamed to a "hidden" buffer name) is correct as far as I can tell
> from my own experience, convenient for the user, and easily
> implemented.

I see one significant disadvantage of this design: it will require a
thorough rewrite of many parts of RMAIL, since the code as it is now
assumes a single buffer, narrowed as required.  I don't have enough
information and experience to judge whether this is a serious
disadvantage, though.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-26 17:30                     ` Eli Zaretskii
@ 2003-05-27 10:03                       ` Stephen J. Turnbull
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-27 10:03 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:

    Eli> Aren't attachments clearly marked in the message as being
    Eli> such?  Can't Emacs look for those markers (the part
    Eli> delimiters in a multi-part message) and refrain from decoding
    Eli> binary data while decoding text?

Yes and yes.  It's complex, though, and coding is not an area where
you want that.  For example, consider that the state of the buffer is
fragile, partly decoded to multibyte and partly raw unibyte.  Failure
to handle an error while decoding risks crashes.
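
As a rough illustration of the "find the part delimiters and leave the
binaries alone" idea, here is a minimal sketch in Python (not Emacs
Lisp) using the standard `email` package.  The sample message, the
boundary string XYZ, and the helper name presentation_parts are made up
for illustration; none of this is Rmail code.  The raw bytes are never
rewritten, and the decoded text exists only in the returned list.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical two-part message: one Latin-1 text part, one binary part.
RAW = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Gr=FC=DFe\n"
    b"--XYZ\n"
    b"Content-Type: application/octet-stream\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"AAEC/w==\n"
    b"--XYZ--\n"
)


def presentation_parts(raw):
    """Return (kind, payload) pairs without touching the raw message.

    Text parts are decoded to str using their declared charset; every
    other leaf part is returned as bytes with only the transfer
    encoding removed, so no charset decoding is applied to it.
    """
    msg = message_from_bytes(raw, policy=default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() == "text":
            parts.append(("text", part.get_content()))
        else:
            parts.append(("binary", part.get_payload(decode=True)))
    return parts


if __name__ == "__main__":
    for kind, payload in presentation_parts(RAW):
        print(kind, repr(payload))
```

The point of the sketch is only that the decision "decode or not" is
made per part from the Content-Type header, and that an error in one
part need not leave the stored message half converted.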

    Eli> I see one significant disadvantage of this design: it will
    Eli> require thorough rewrite of many parts in RMAIL, since the
    Eli> code as it is now assumes a single buffer, narrowed as
    Eli> required.  I don't have enough information and experience to
    Eli> judge whether this is a serious disadvantage, though.

It's surely possible to maintain the single buffer model by adding a
mail-coding-system.  But I think that will make the code less modular,
require extensive duplication of existing functionality, be harder to
maintain, and present greater risk of undetected corruption (and even
crashes) than using the presentation buffer model.

I've presented my opinion, but that's all it is.  At the very least it
exposes some of the potential pitfalls, and thus could help with
dealing with them.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-26  5:20                   ` Stephen J. Turnbull
  2003-05-26 17:30                     ` Eli Zaretskii
@ 2003-05-27 12:44                     ` Richard Stallman
  2003-05-27 15:12                       ` Stephen J. Turnbull
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Stallman @ 2003-05-27 12:44 UTC (permalink / raw)
  Cc: emacs-devel

    That's assuming that it is text.  This implementation would make
    corruption of attached binaries likely and signed messages somewhat
    likely

I don't think so.  Why would you edit a binary attachment?

    But you don't have the original text any more (except in the disk
    file).

I have the impression we are not talking about the same thing.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-27 12:44                     ` Richard Stallman
@ 2003-05-27 15:12                       ` Stephen J. Turnbull
  2003-05-28 23:57                         ` Richard Stallman
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-27 15:12 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    rms> I have the impression we are not talking about the same
    rms> thing.

Perhaps not.  The thread started with a proposal for a coding system
that would present Rmail with an already decoded buffer, so that the
process of dealing with multiple coding systems and the like would be
transparent to Rmail.  And I've been assuming that.

But the basic idea applies (although with somewhat less force) to any
one-buffer implementation.  If decoding and encoding are not inverses,
you risk corruption simply by reading mail.

    That's assuming that it is text.  This implementation
    would make corruption of attached binaries likely and signed
    messages somewhat likely

    rms> I don't think so.  Why would you edit a binary attachment?

application/x-patch, for example.  From the point of view of the user
it's text to be read, but patch will get upset if you cause any change
to the original context lines.  I've been bit by that one a number of
times when a program decided to reencode a patch to Japanese text with
a variant of ISO-2022-JP different from the original.  Patch won't
apply, with no visible sign of why not.  Grrr!  GPG-signed, for
another example.

You don't need to explicitly edit one of these attachments, just
(non-invertibly) decode it for viewing and reencode it for saving the
buffer.
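
To make the non-invertibility concrete, here is a small sketch in
Python (not Emacs Lisp), assuming CPython's built-in iso2022_jp codec.
The sample bytes are a made-up line that designates the older JIS C
6226-1978 set with ESC $ @, and the helper name roundtrip is
hypothetical.  Decoding for display and re-encoding for saving keeps
the characters but, as far as I know, not the bytes, because the
encoder picks its own preferred designation.

```python
def roundtrip(raw, charset="iso2022_jp"):
    """Decode bytes for viewing, then re-encode them for saving."""
    return raw.decode(charset).encode(charset)


# Hypothetical context line from a patch: two hiragana characters
# introduced by ESC $ @ (the 1978 designation), then back to ASCII.
original = b'\x1b$@$"$$\x1b(B\n'
resaved = roundtrip(original)

if __name__ == "__main__":
    print("same bytes:", original == resaved)
    print("same text: ",
          original.decode("iso2022_jp") == resaved.decode("iso2022_jp"))
    print("original:", original)
    print("resaved: ", resaved)
```

A byte-exact consumer such as patch or a signature check sees two
different inputs here, even though nothing looks different on screen.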

It just occurred to me that you must use a presentation buffer, or
save the entire encrypted text, for a GPG _encrypted_ message, since
you can't reencode that.  Of course that's a solitary special case.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-27 15:12                       ` Stephen J. Turnbull
@ 2003-05-28 23:57                         ` Richard Stallman
  2003-05-29  8:20                           ` Stephen J. Turnbull
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Stallman @ 2003-05-28 23:57 UTC (permalink / raw)
  Cc: emacs-devel

    Perhaps not.  The thread started with a proposal for a coding system
    that would present Rmail with an already decoded buffer, so that the
    process of dealing with multiple coding systems and the like would be
    transparent to Rmail.  And I've been assuming that.

I thought that idea had been pretty much proved to be unusable, and
then someone (was it you?) brought up the alternative of using two
buffers.  So I have been talking about how to do that.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-28 23:57                         ` Richard Stallman
@ 2003-05-29  8:20                           ` Stephen J. Turnbull
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen J. Turnbull @ 2003-05-29  8:20 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "rms" == Richard Stallman <rms@gnu.org> writes:

    Perhaps not.  The thread started with a proposal for a coding system
    that would present Rmail with an already decoded buffer, so that the
    process of dealing with multiple coding systems and the like would be
    transparent to Rmail.  And I've been assuming that.

    rms> I thought that idea had been pretty much proved to be
    rms> unusable, and then someone (was it you?) brought up the
    rms> alternative of using two buffers.  So I have been talking
    rms> about how to do that.

Right, I see that now.

I think both approaches have merits.  I suspect that you will discover
(as Kyle and larsi apparently did) that a full-fledged MIME
implementation is much cleaner if you use a separate presentation
buffer.  But for the minimal usage (converting text/plain; charset=FOO
to Mule code, and displaying text attachments like patches inline),
you don't need it.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
       [not found]     ` <i56smr8j8lk.fsf@mao.acc.umu.se>
  2003-05-21 22:29       ` Stefan Monnier
@ 2003-05-22  1:32       ` Kenichi Handa
  1 sibling, 0 replies; 20+ messages in thread
From: Kenichi Handa @ 2003-05-22  1:32 UTC (permalink / raw)
  Cc: emacs-devel

In article <i56smr8j8lk.fsf@mao.acc.umu.se>, stktrc <stktrc@yahoo.com> writes:
> Yes, but decode-coding-region modifies the character content of the
> buffer (which is undesirable).  I want the decoding to happen before
> the characters hit the buffer.

??? I'm confused.  You wrote that you want to have an unmodified
encoded message in a buffer, but here you wrote that decoding is
ok.

What's the difference between decoding that happens before and
decoding that happens after the characters are inserted into
the buffer?

> When Emacs loads a file (that is supposed to be encoded in for example
> ISO 2022, a 7 bit based encoding), does it first load the file
> contents into a buffer as ASCII text and then applies
> decode-coding-region to the entire buffer (conceptually)?

> I wouldn't think so (but I don't know).

Actually yes (conceptually), why not?

> Using decode-coding-region modifies the buffer contents because the
> actual characters present in the buffer change (the two characters AA
> might become the character Å or whatever, and hence the buffer
> contents has been modified).

If your concern is the buffer modified flag, you can
reset that with set-buffer-modified-p.  If your concern is
the undo list, you can also set buffer-undo-list to nil.
That is what insert-file-contents does when called with
the VISIT arg set to t.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME)
  2003-05-21 19:53   ` Stefan Monnier
       [not found]     ` <i56smr8j8lk.fsf@mao.acc.umu.se>
@ 2003-05-22 13:16     ` Kai Großjohann
  1 sibling, 0 replies; 20+ messages in thread
From: Kai Großjohann @ 2003-05-22 13:16 UTC (permalink / raw)

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> Would it be possible to have different coding-systems (the decoding of
>> octets from a file into characters in a buffer) for different ranges
>> of octets in a file?
>
> Of course, it's possible: coding systems are operations, not data.
> Emacs offers straightforward ways to apply those operations to whole
> files when reading and saving them, as well as straightforward
> ways to apply those operations to parts of a buffer.

Actually, an mbox-coding-system sounds rather attractive to me.  I
think there are existing encodings that use escape sequences so
that an application reads ascii, then comes an escape sequence that
says "Japanese from here on", then comes another escape sequence that
says "ascii again".  Is this true?

Emacs-mule uses a similar mechanism, except that the escape sequences
are always applied to one character only.  So \201 means to read one
character in Latin-1, and so on.

And, conceptually, the charset specs in Content-Type headers are just
such escape sequences.
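
For what it's worth, ISO-2022-JP is one such encoding.  The following
is a minimal sketch in Python (not Emacs Lisp), assuming CPython's
iso2022_jp codec; the sample string is made up.  It shows a 7-bit
byte stream in which ESC $ B switches to Japanese and ESC ( B switches
back to ASCII.

```python
ESC_TO_JISX0208 = b"\x1b$B"  # "Japanese from here on"
ESC_TO_ASCII = b"\x1b(B"     # "ASCII again"

text = "ascii \u3042\u3044 ascii"  # ASCII, two hiragana, ASCII
encoded = text.encode("iso2022_jp")

if __name__ == "__main__":
    print(encoded)
    print("7-bit only:", all(byte < 0x80 for byte in encoded))
    print("switch to Japanese at offset", encoded.find(ESC_TO_JISX0208))
    print("switch to ASCII at offset", encoded.find(ESC_TO_ASCII))
```

A Content-Type charset parameter plays the same role at a coarser
grain: it announces the decoding rule for everything up to the next
part boundary.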

I guess it would be difficult to implement the encoding, but it would
be just what RMAIL needs.  You can then just find the file and then
narrow to certain regions.  I understand that Richard likes this way
of working with mailboxes.  (Reading the file into the buffer is
comparatively easy, but I'm not sure how to make sure that
read-then-write doesn't change the file.)
-- 
This line is not blank.



Thread overview: 20+ messages
     [not found] <i56wugl13qq.fsf@mao.acc.umu.se>
2003-05-21  2:43 ` Different (buffer-file-)coding-systems for different regions of one buffer? (for Rmail MIME) Kenichi Handa
2003-05-22  8:33   ` Richard Stallman
     [not found] ` <i563cj8kz7e.fsf@mao.acc.umu.se>
2003-05-21 19:53   ` Stefan Monnier
     [not found]     ` <i56smr8j8lk.fsf@mao.acc.umu.se>
2003-05-21 22:29       ` Stefan Monnier
     [not found]         ` <i56u1bnn567.fsf@mao.acc.umu.se>
2003-05-22  3:41           ` Stephen J. Turnbull
     [not found]             ` <i56ptmateuj.fsf@mao.acc.umu.se>
2003-05-23 10:56               ` Stephen J. Turnbull
2003-05-23 12:03             ` Richard Stallman
2003-05-23 15:03               ` Stephen J. Turnbull
2003-05-24 23:19                 ` Richard Stallman
2003-05-25 18:28                   ` Kai Großjohann
2003-05-27 12:44                     ` Richard Stallman
2003-05-26  5:20                   ` Stephen J. Turnbull
2003-05-26 17:30                     ` Eli Zaretskii
2003-05-27 10:03                       ` Stephen J. Turnbull
2003-05-27 12:44                     ` Richard Stallman
2003-05-27 15:12                       ` Stephen J. Turnbull
2003-05-28 23:57                         ` Richard Stallman
2003-05-29  8:20                           ` Stephen J. Turnbull
2003-05-22  1:32       ` Kenichi Handa
2003-05-22 13:16     ` Kai Großjohann
