Emacs doesn't write with the encoding it used for reading

* Emacs doesn't write with the encoding it used for reading
@ 2002-04-05 12:59 Rommerskirchen Heinrich
  2002-04-05 16:17 ` Eli Zaretskii
  0 siblings, 1 reply; 2+ messages in thread
From: Rommerskirchen Heinrich @ 2002-04-05 12:59 UTC (permalink / raw)


This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-msvc-nt4.0.1381)
 of 2002-03-19 on buffy
configured using `configure --with-msvc (12.00)'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: DEU
  locale-coding-system: iso-latin-1
  default-enable-multibyte-characters: t

If a file contains a mixture of German DOS-encoded text (cp850) and
German Windows-encoded text it is read as latin1 but emacs will not
write it back as latin1.

Simple Example:
A file contains the 4 bytes 0xE4 0x84 0x0D 0x0A (Umlaut-a in Windows encoding,
then Umlaut-a in DOS encoding, followed by CR LF).
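
To make the byte values concrete, here is a small Python sketch (not part of the original report) that decodes each of the two bytes with both codecs. It assumes Python's "latin-1" and "cp850" codecs are a fair stand-in for the encodings meant above.

```python
# The four bytes from the report: 0xE4 0x84 followed by CR LF.
data = bytes([0xE4, 0x84, 0x0D, 0x0A])

# 0xE4 is a-umlaut in Latin-1; 0x84 is a-umlaut in cp850.
e4_as_latin1 = data[0:1].decode("latin-1")
x84_as_cp850 = data[1:2].decode("cp850")

# Decoding each byte with the "other" codec gives something else:
# 0x84 in Latin-1 is a C1 control code, not a printable character.
e4_as_cp850 = data[0:1].decode("cp850")
x84_as_latin1 = data[1:2].decode("latin-1")

print(repr(e4_as_latin1), repr(x84_as_cp850))
print(repr(e4_as_cp850), repr(x84_as_latin1))
```

No single one of these two codecs turns both bytes into Umlaut-a, which is why the file counts as mixed.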

Emacs under German Windows reads it as latin1. If I now type a
character and delete it again, so that I have the same content,
emacs will not write it back as latin1 but instead suggests a few
encodings with default utf8.

Saving with utf8 gives a file with 6 bytes, which emacs again reads as latin1 but doesn't
write as latin1. Choosing utf8 again gives a file with 10 bytes, and this doesn't
change anymore, but it contains neither of the original two characters.
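
The 6-byte and 10-byte sizes can be reproduced with a short Python sketch (an illustration added here, not something from the report). It assumes that each save/reload cycle amounts to "decode as Latin-1, encode as UTF-8", and that Python's latin-1 codec, which maps 0x84 to U+0084, behaves closely enough to what Emacs did. The helper name misread_and_resave is made up for this example.

```python
def misread_and_resave(raw: bytes) -> bytes:
    """One cycle: read the bytes as Latin-1, then write them out as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


original = bytes([0xE4, 0x84, 0x0D, 0x0A])
first_save = misread_and_resave(original)     # every 8-bit byte becomes 2 bytes
second_save = misread_and_resave(first_save)  # the 4 new 8-bit bytes double again

print(len(original), len(first_save), len(second_save))  # 4 6 10
print(first_save.hex(" "))
print(second_save.hex(" "))
```

The sketch only models the first two cycles. It does not explain why the file stops changing after that, which presumably depends on how Emacs detects the encoding of the 10-byte file.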

Saving the original file (4 bytes) with raw-text gives a file with 5 bytes which
doesn't change anymore on reading and writing (emacs uses encoding raw-text-dos), it contains
the original bytes plus a spurious \201 (Umlaut-u in DOS encoding)

Recent input:
C-x C-f / t e m p / x x <return> <end> SPC <backspace> 
C-x C-s C-g <menu-bar> <help-menu> <report-emacs-bug>

Recent messages:
(C:\bin\emacs-21.2\bin\emacs.exe -q --no-site-file)
For information about the GNU Project and its goals, type C-h C-p.
Loading image...done
Loading view...done
byte-code: Quit
Loading emacsbug...done

Regards

Heinz


* Re: Emacs doesn't write with the encoding it used for reading
  2002-04-05 12:59 Emacs doesn't write with the encoding it used for reading Rommerskirchen Heinrich
@ 2002-04-05 16:17 ` Eli Zaretskii
  0 siblings, 0 replies; 2+ messages in thread
From: Eli Zaretskii @ 2002-04-05 16:17 UTC (permalink / raw)
  Cc: bug-gnu-emacs

> From: Rommerskirchen Heinrich <Heinrich.Rommerskirchen@icn.siemens.de>
> Date: Fri, 5 Apr 2002 14:59:54 +0200
> 
> If a file contains a mixture of German DOS-encoded text (cp850) and
> German Windows-encoded text it is read as latin1 but emacs will not
> write it back as latin1.

This is a known issue.  As surprising as it sounds, it actually makes
sense (IMHO): this is how the user becomes aware that her files have
inconsistent encoding.  Normally, such files are corrupted in some
way.

When Emacs reads a file with random 8-bit bytes that don't fit
Latin-1, it decodes those bytes into a special character set reserved
for decoding binary bytes.  To see what Emacs thinks about those
characters, go to one of them and type "C-u C-x =".  You will see that
they are not Latin-1 characters, as far as Emacs is concerned.
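
Outside Emacs, a quick way to find such bytes is to scan the file for values that have no printable character in ISO 8859-1. The Python sketch below is only an illustration: it assumes that "bytes that don't fit Latin-1" means the range 0x80-0x9F (the C1 control codes), and the function name is invented for this example.

```python
from typing import List, Tuple


def bytes_outside_latin1_graphics(raw: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte) pairs for bytes in 0x80-0x9F.

    ISO 8859-1 assigns no printable characters to this range, so such
    bytes in a supposedly Latin-1 text file hint at a mixed encoding.
    """
    return [(i, b) for i, b in enumerate(raw) if 0x80 <= b <= 0x9F]


sample = bytes([0xE4, 0x84, 0x0D, 0x0A])
suspects = bytes_outside_latin1_graphics(sample)
print([(offset, hex(value)) for offset, value in suspects])  # [(1, '0x84')]
```

For the 4-byte example file this flags only the cp850 byte 0x84; the Latin-1 byte 0xE4 passes.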

It should be possible to allow Emacs to write those bytes when you
save a Latin-1 buffer, but doing so using the existing Emacs machinery
has unpleasant side effects, so it was decided not to do that.

What practical problems do you have with this?  That is, when does it
make sense to have a file that mixes Latin-1 and DOS cp850 encoding?

> Saving with utf8 gives a file with 6 bytes, which emacs again reads
> as latin1

You expect Emacs to guess that the file you saved is in UTF-8.
However, UTF-8 encoding cannot be easily distinguished from
single-byte encodings.  Therefore, Emacs uses a priority list, whereby
it chooses the first single-byte encoding from the list.  The default
configuration puts UTF-8 very far from the beginning of the list, so
Emacs guesses wrong.  You should either make UTF-8 your preferred
coding system, or force Emacs to read the file as UTF-8 with "C-x RET c".
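
The effect of the priority order can be sketched in a few lines of Python. This is a deliberately simplified model, not the detection algorithm Emacs actually uses: it assumes the guesser simply takes the first codec in the list that decodes the bytes without error, and guess_encoding is a hypothetical helper.

```python
from typing import List


def guess_encoding(raw: bytes, priority: List[str]) -> str:
    """Return the first codec in `priority` that decodes `raw` without error."""
    for name in priority:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("no codec in the priority list accepts these bytes")


utf8_bytes = "\u00e4\r\n".encode("utf-8")      # a-umlaut saved as UTF-8
mixed_bytes = bytes([0xE4, 0x84, 0x0D, 0x0A])  # the file from the report

# Latin-1 accepts every byte sequence, so it wins whenever it comes first.
print(guess_encoding(utf8_bytes, ["latin-1", "utf-8"]))   # latin-1
# With UTF-8 first, valid UTF-8 is recognized ...
print(guess_encoding(utf8_bytes, ["utf-8", "latin-1"]))   # utf-8
# ... and invalid UTF-8 still falls through to Latin-1.
print(guess_encoding(mixed_bytes, ["utf-8", "latin-1"]))  # latin-1
```

In this model, moving UTF-8 ahead of the single-byte codecs costs nothing for genuine Latin-1 files that contain 8-bit bytes, because such files are rarely valid UTF-8.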

> Saving the original file (4 bytes) with raw-text gives a file with 5
> bytes which doesn't change anymore on reading and writing (emacs
> uses encoding raw-text-dos), it contains the original bytes plus a
> spurious \201 (Umlaut-u in DOS encoding)

Right.  If you need to edit a file that mixes several encodings, you
should visit it with raw-text.  Then saving it with raw-text will do
what you expect.
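
The same idea, shown outside Emacs as a Python sketch: if the file is handled purely as bytes, with no decoding step, then an edit that is undone leaves the file byte-for-byte identical. The file name "xx" is taken from the recent-input log in the report; the temporary directory is an assumption made for the example.

```python
import os
import tempfile

original = bytes([0xE4, 0x84, 0x0D, 0x0A])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xx")
    with open(path, "wb") as f:
        f.write(original)

    # "Visit" the file as raw bytes: no coding system is applied.
    with open(path, "rb") as f:
        buf = bytearray(f.read())

    # Type a space, then delete it again.
    buf.append(0x20)
    buf.pop()

    # Save the bytes unchanged and read them back for comparison.
    with open(path, "wb") as f:
        f.write(buf)
    with open(path, "rb") as f:
        round_tripped = f.read()

print(round_tripped == original)  # True
```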
