revert-buffer and changes in encoding

unofficial mirror of emacs-devel@gnu.org 

* revert-buffer and changes in encoding
@ 2004-11-29  6:12 Richard Stallman
  2005-05-17 14:02 ` Evil Boris
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Stallman @ 2004-11-29  6:12 UTC (permalink / raw)


We have had discussions in the past about how revert-buffer
should deal with encoding.  I think this statement, by Stefan,
is the right thing to do:

    Looks like we should remember whether the current coding-system was
    automatically inferred or whether it was explicitly specified.
    Of course, we need to remember it separately for the line-ending part
    of the coding-system.

If the file was originally visited with a specified coding system,
revert-buffer should keep on using that specified coding system.  If
the file coding system was autodetected before, revert-buffer should
autodetect it again, and it should set buffer-file-coding-system again
so that saving the buffer uses the most recent coding system.

Setting a flag to record whether the coding system was explicitly
specified is easy to do in Finsert_file_contents.  It should set this
flag only when the VISIT argument specifies visiting a file.

Can someone please implement this, and ack to me?
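
A minimal sketch of the policy described above, in Emacs Lisp.  The flag
my-coding-system-explicit-p and both functions are hypothetical names
used only for illustration; this is not the eventual implementation.
When the flag is set, the revert forces the remembered coding system;
otherwise nothing is forced at this level and the choice is left to
revert-buffer itself.

```elisp
;; Hypothetical flag (not an Emacs variable): non-nil means the user
;; explicitly specified the coding system when visiting the file.
(defvar-local my-coding-system-explicit-p nil)

(defun my-coding-system-for-revert ()
  "Return the coding system to force on revert, or nil to force nothing."
  (and my-coding-system-explicit-p
       buffer-file-coding-system))

(defun my-revert-buffer ()
  "Revert the current buffer, honoring an explicitly specified coding system."
  (interactive)
  ;; A nil value here means this wrapper imposes no coding system.
  (let ((coding-system-for-read (my-coding-system-for-revert)))
    (revert-buffer t t)))
```

The sketch ignores the line-ending part, which Stefan's statement says
should be remembered separately.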

* Re: revert-buffer and changes in encoding
       [not found] <E1Cgi5j-0007hF-AI@fencepost.gnu.org>
@ 2004-12-30 12:44 ` Kenichi Handa
  2004-12-30 20:59   ` Richard Stallman
  0 siblings, 1 reply; 11+ messages in thread
From: Kenichi Handa @ 2004-12-30 12:44 UTC (permalink / raw)
  Cc: emacs-devel

In article <E1Cgi5j-0007hF-AI@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

> Can you implement this change?  Nobody else offered to,
> and I think it's in your area.  Please ack when it's done.

> We have had discussions in the past about how revert-buffer
> should deal with encoding.  I think this statement, by Stefan,
> is the right thing to do:

>     Looks like we should remember whether the current coding-system was
>     automatically inferred or whether it was explicitly specified.
>     Of course, we need to remember it separately for the line-ending part
>     of the coding-system.

> If the file was originally visited with a specified coding system,
> revert-buffer should keep on using that specified coding system.  If
> the file coding system was autodetected before, revert-buffer should
> autodetect it again, and it should set buffer-file-coding-system again
> so that saving the buffer uses the most recent coding system.

> Setting a flag to record whether the coding system was explicitly
> specified is easy to do in Finsert_file_contents.  It should set this
> flag only when the VISIT argument specifies visiting a file.

> Can someone please implement this, and ack to me?

I've just installed these changes:

(1) Make a new buffer local variable explicit-buffer-file-coding-system.

(2) Set it to coding-system-for-read in after-insert-file-set-coding.

(3) Set it to last-coding-system-used in basic-save-buffer-1
    after writing.

(4) In revert-buffer, bind coding-system-for-read to
    explicit-buffer-file-coding-system.
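
The four steps can be sketched roughly as follows.  This is an
illustration only, not the installed change: my-file-coding-system and
the my-* functions are hypothetical stand-ins, and they are not wired
into after-insert-file-set-coding or basic-save-buffer-1; they only
show which value is recorded at each point.

```elisp
;; Hypothetical stand-in for the new buffer-local variable of step (1).
(defvar-local my-file-coding-system nil
  "Coding system last used to read or write this buffer's file, or nil.")

(defun my-note-coding-after-insert ()
  "Step (2): remember the coding system requested for reading, if any."
  (setq my-file-coding-system coding-system-for-read))

(defun my-note-coding-after-save ()
  "Step (3): remember the coding system that was just used for writing."
  (setq my-file-coding-system last-coding-system-used))

(defun my-revert-with-noted-coding ()
  "Step (4): revert with `coding-system-for-read' bound to the noted value."
  (interactive)
  (let ((coding-system-for-read my-file-coding-system))
    (revert-buffer t t)))
```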

---
Ken'ichi HANDA
handa@m17n.org

* Re: revert-buffer and changes in encoding
  2004-12-30 12:44 ` Kenichi Handa
@ 2004-12-30 20:59   ` Richard Stallman
  2004-12-30 23:52     ` Kenichi Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Stallman @ 2004-12-30 20:59 UTC (permalink / raw)
  Cc: emacs-devel

    (1) Make a new buffer local variable explicit-buffer-file-coding-system.

Could you rename that to buffer-file-coding-system-explicit?  I think
it should start with buffer-.

    (3) Set it to last-coding-system-used in basic-save-buffer-1
	after writing.

Is that right?  Wouldn't that get automatically-chosen coding systems
as well as explicit user-specified coding systems?

* Re: revert-buffer and changes in encoding
  2004-12-30 20:59   ` Richard Stallman
@ 2004-12-30 23:52     ` Kenichi Handa
  2005-01-01  5:24       ` Richard Stallman
  2005-01-01 21:28       ` Stefan
  0 siblings, 2 replies; 11+ messages in thread
From: Kenichi Handa @ 2004-12-30 23:52 UTC (permalink / raw)
  Cc: emacs-devel

In article <E1Ck7Nu-0006kl-7y@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     (1) Make a new buffer local variable explicit-buffer-file-coding-system.
> Could you rename that to buffer-file-coding-system-explicit?  I think
> it should start with buffer-.

Ok.

>     (3) Set it to last-coding-system-used in basic-save-buffer-1
> 	after writing.

> Is that right?  Wouldn't that get automatically-chosen coding systems
> as well as explicit user-specified coding systems?

Yes.  But, whatever coding system is used for writing a
file, revert-buffer should read the file with the same
coding system.

---
Ken'ichi HANDA
handa@m17n.org

* Re: revert-buffer and changes in encoding
  2004-12-30 23:52     ` Kenichi Handa
@ 2005-01-01  5:24       ` Richard Stallman
  2005-01-01 21:28       ` Stefan
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2005-01-01  5:24 UTC (permalink / raw)
  Cc: emacs-devel

    > Is that right?  Wouldn't that get automatically-chosen coding systems
    > as well as explicit user-specified coding systems?

    Yes.  But, whatever coding system is used for writing a
    file, revert-buffer should read the file with the same
    coding system.

That seems like a good argument.

* Re: revert-buffer and changes in encoding
  2004-12-30 23:52     ` Kenichi Handa
  2005-01-01  5:24       ` Richard Stallman
@ 2005-01-01 21:28       ` Stefan
  2005-01-05  1:18         ` Kenichi Handa
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan @ 2005-01-01 21:28 UTC (permalink / raw)
  Cc: rms, emacs-devel

>> Is that right?  Wouldn't that get automatically-chosen coding systems
>> as well as explicit user-specified coding systems?

> Yes.  But, whatever coding system is used for writing a
> file, revert-buffer should read the file with the same
> coding system.

But then the name you chose is wrong.  If the name is "foobar-explicit" it
should only be non-nil if foobar was set explicitly, not automatically.

Knowing when buffer-file-coding-system was set explicitly is important also
in select-safe-coding-system (where it should not try any other cs, including
the preferred cs).

        Stefan

* Re: revert-buffer and changes in encoding
  2005-01-01 21:28       ` Stefan
@ 2005-01-05  1:18         ` Kenichi Handa
  2005-01-05  1:34           ` Miles Bader
  2005-01-05 20:07           ` Richard Stallman
  0 siblings, 2 replies; 11+ messages in thread
From: Kenichi Handa @ 2005-01-05  1:18 UTC (permalink / raw)
  Cc: rms, emacs-devel

In article <m1pt0o4z9m.fsf-monnier+emacs@gnu.org>, Stefan <monnier@iro.umontreal.ca> writes:

>>>  Is that right?  Wouldn't that get automatically-chosen coding systems
>>>  as well as explicit user-specified coding systems?

>>  Yes.  But, whatever coding system is used for writing a
>>  file, revert-buffer should read the file with the same
>>  coding system.

> But then the name you chose is wrong.  If the name is "foobar-explicit" it
> should only be non-nil if foobar was set explicitly, not automatically.

Ah, ummm, I agree that the name is not good.  The current
semantics of that variable are "what Emacs thinks is the
encoding of the file on disk", and the semantics of
buffer-file-coding-system are "what Emacs will use by default
for writing out the buffer".

It seems that plain file-coding-system is a better name than
buffer-file-coding-system-explicit.

> Knowing when buffer-file-coding-system was set explicitly is important also
> in select-safe-coding-system (where it should not try any other cs, including
> the preferred cs).

I think what is important in select-safe-coding-system is
whether buffer-file-coding-system has a local binding or not
(treating undecided-unix as having no local binding for the text
encoding but a local binding for the eol encoding).  If it has a
local binding, select-safe-coding-system should not try any
other encoding.  With that change, I think
select-safe-coding-system behaves correctly in all cases.
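
A rough sketch of that test, assuming two hypothetical helper
predicates (the my-* names are not part of Emacs).  It uses the
existing functions local-variable-p, coding-system-base and
coding-system-eol-type, and treats the text part and the eol part
separately, so that undecided-unix pins down only the eol encoding.

```elisp
(defun my-text-encoding-pinned-p ()
  "Non-nil if this buffer has a local, decided text encoding."
  (and (local-variable-p 'buffer-file-coding-system)
       buffer-file-coding-system
       ;; `undecided' as the base means the text encoding is still open.
       (not (eq (coding-system-base buffer-file-coding-system)
                'undecided))))

(defun my-eol-pinned-p ()
  "Non-nil if this buffer has a local, decided end-of-line convention."
  (and (local-variable-p 'buffer-file-coding-system)
       buffer-file-coding-system
       ;; An integer eol type means the convention is fixed; a vector
       ;; means it is still to be detected.
       (integerp (coding-system-eol-type buffer-file-coding-system))))
```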

You wrote:
> If I open a new file, insert é and then do the following:

>    C-x RET f us-ascii RET
>    C-x C-s

> the file is saved in latin-1.  This is because when saving
> buffer-file-coding-system is just one of several coding-systems used.

> Another annoying situation is when you load a utf-8 file containing mostly
> latin-1 chars plus a few non-latin-1 chars.  Let's say you don't know that
> there are non-latin-1 chars and want to change the file to latin-1.  You do:

>    C-x RET f latin-1 RET
>    C-x C-s

> and the buffer and file is back to utf-8 !?!

In both cases, with the above change,
select-safe-coding-system will ask you what coding system to
use while showing offending chars.
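
As an illustration of finding the offending chars, here is a small
sketch built on the existing function unencodable-char-position.  The
helper name my-first-unencodable-char is hypothetical, and
select-safe-coding-system itself does considerably more than this.

```elisp
(defun my-first-unencodable-char (coding-system)
  "Return (POSITION . CHAR) for the first char CODING-SYSTEM cannot encode.
Return nil if the whole buffer can be encoded."
  (let ((pos (unencodable-char-position (point-min) (point-max)
                                        coding-system)))
    (and pos (cons pos (char-after pos)))))
```

In a buffer containing é, calling it with us-ascii should report that
character, while calling it with latin-1 should return nil.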

> Another problem I've encountered (recently with the iso-2022-7bit ->
> utf-8 -> iso-2022-7bit dance in mule-cmds.el) is that iso-2022-7bit cannot
> encode eight-bit-control characters, so if you read an iso-2022-7bit file
> with invalid sequences in it, you get a buffer that you can't save.
> Worse yet, when you try to save it it might say "selected encoding
> mule-utf-8 disagrees with iso-2022-7bit-unix specified by file contents" but
> if you look at the buffer's modeline it says "J", not "u", so you wonder
> what's up with this utf-8 thing.

Whether we should silently allow saving such a file as
iso-2022-7bit is a separate problem.  If the offending characters
are only raw bytes, how about this:

Show in *Warning* buffer:

As the buffer contains 8-bit characters, if you save it by
iso-2022-7bit, the file won't be read back correctly by the
same coding system.

And ask a user:

Do you really want to save it by iso-2022-7bit (y or n)?

---
Ken'ichi HANDA
handa@m17n.org

* Re: revert-buffer and changes in encoding
  2005-01-05  1:18         ` Kenichi Handa
@ 2005-01-05  1:34           ` Miles Bader
  2005-01-05  4:23             ` Kenichi Handa
  2005-01-05 20:07           ` Richard Stallman
  1 sibling, 1 reply; 11+ messages in thread
From: Miles Bader @ 2005-01-05  1:34 UTC (permalink / raw)
  Cc: emacs-devel, Stefan, rms

> And ask a user:
> 
> Do you really want to save it by iso-2022-7bit (y or n)?

What does it do if he answers "n"?  Abort?

If I was a beginner, I think I'd be pretty confused about what to do
next then...

Maybe either the prompt should give more options (e.g., allow saving
using different coding systems), or the *Warning* buffer should also
give advice about how to change the coding system.

-Miles

* Re: revert-buffer and changes in encoding
  2005-01-05  1:34           ` Miles Bader
@ 2005-01-05  4:23             ` Kenichi Handa
  0 siblings, 0 replies; 11+ messages in thread
From: Kenichi Handa @ 2005-01-05  4:23 UTC (permalink / raw)
  Cc: monnier, rms, emacs-devel

In article <fc339e4a050104173443612a7a@mail.gmail.com>, Miles Bader <snogglethorpe@gmail.com> writes:

>>  And ask a user:
>>  
>>  Do you really want to save it by iso-2022-7bit (y or n)?

> What does it do if he answers "n"?  Abort?

> If I was a beginner, I think I'd be pretty confused about what to do
> next then...

> Maybe either the prompt should give more options (e.g., allow saving
> using different coding systems), or the *Warning* buffer should also
> give advice about how to change the coding system.

I think the existence of 8-bit characters in a buffer is itself
a big problem for a beginner, and we can't help him much
other than by deleting all of them or
converting them to something like "<0xXX>".
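
A sketch of such a conversion, assuming a current Emacs in which raw
bytes in a multibyte buffer belong to the eight-bit charset (the
Emacs discussed in this thread represented them differently).  The
function name my-escape-raw-bytes is hypothetical.

```elisp
(defun my-escape-raw-bytes ()
  "Replace each raw 8-bit byte in the buffer with text like \"<0xXX>\"."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let ((ch (following-char)))
        (if (eq (char-charset ch) 'eight-bit)
            (progn
              (delete-char 1)
              ;; The low eight bits of a raw-byte char are the byte itself.
              (insert (format "<0x%02X>" (logand ch #xFF))))
          (forward-char 1))))))
```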

By the way, I think a beginner won't use or encounter
iso-2022-7bit.  I believe he has no idea what
iso-2022-7bit is.

---
Ken'ichi HANDA
handa@m17n.org

* Re: revert-buffer and changes in encoding
  2005-01-05  1:18         ` Kenichi Handa
  2005-01-05  1:34           ` Miles Bader
@ 2005-01-05 20:07           ` Richard Stallman
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Stallman @ 2005-01-05 20:07 UTC (permalink / raw)
  Cc: monnier, emacs-devel

    Ah, ummm, I agree that the name is not good.  The current
    semantics of that variable are "what Emacs thinks is the
    encoding of the file on disk"

I think a better definition is "a choice of encoding based
on some sort of evidence beyond guessing."

* Re: revert-buffer and changes in encoding
  2004-11-29  6:12 revert-buffer and changes in encoding Richard Stallman
@ 2005-05-17 14:02 ` Evil Boris
  0 siblings, 0 replies; 11+ messages in thread
From: Evil Boris @ 2005-05-17 14:02 UTC (permalink / raw)



I do not know if the issue I describe below is related to the thread I
am posting this under (how encoding is handled under revert-buffer).

I have seen the following behavior in both BBDB and RMAIL (with recent
CVS Emacs).  I am not sure it is repeatable every time.  I will
describe the RMAIL situation; BBDB is similar...

I have Emacs running with RMAIL.  Several messages are in various
non-ASCII encodings (UTF-8, latin-1, koi8-r, for example).  They are
all displayed properly.  I read mail in another instance of
Emacs.  When I return to this one, RMAIL asks if it should re-read the
file.  I say "yes".  At this point (some of) the non-ASCII msgs are
displayed as \231 gibberish.  My first reaction was that RMAIL managed
to mangle the mailbox, but this is not the case: Killing the buffer
and re-reading it brings everything back to normal (in fact, SAVING
the buffer, killing it, and re-reading it brings everything back).

BBDB is similar---if I change .bbdb in another instance of emacs and
am stupid enough to let bbdb revert the data base, all the "funny"
characters display as gibberish, until I kill .bbdb buffer and let
bbdb reload it.  

Bright ideas are welcome.
       --Boris
