From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eli Zaretskii"
Newsgroups: gmane.emacs.bugs
Subject: Re: Emacs doesn't write with the encoding it used for reading
Date: Fri, 05 Apr 2002 19:17:12 +0300
Message-ID: <2950-Fri05Apr2002191711+0300-eliz@is.elta.co.il>
References: <2D8309604314D41187850008C7BB0CB24A9913@MCHH248E>
In-Reply-To: <2D8309604314D41187850008C7BB0CB24A9913@MCHH248E> (message from Rommerskirchen Heinrich on Fri, 5 Apr 2002 14:59:54 +0200)
Original-To: Heinrich.Rommerskirchen@icn.siemens.de
Cc: bug-gnu-emacs@gnu.org
X-Mailer: emacs 21.2.50 (via feedmail 8 I) and Blat ver 1.8.9
List-Id: Bug reports for GNU Emacs, the Swiss army knife of text editors
Xref: main.gmane.org gmane.emacs.bugs:421

> From: Rommerskirchen Heinrich
> Date: Fri, 5 Apr 2002 14:59:54 +0200
>
> If a file contains a mixture of German DOS-encoded text (cp850) and
> German Windows-encoded text it is read as latin1 but emacs will not
> write it back as latin1.

This is a known issue.  As surprising as it sounds, it actually makes
sense (IMHO): this is how the user becomes aware that her files have
inconsistent encoding.  Normally, such files are corrupted in some
way.

When Emacs reads a file with random 8-bit bytes that don't fit
Latin-1, it decodes those bytes into a special character set reserved
for decoding binary bytes.  To see what Emacs thinks about those
characters, go to one of them and type "C-u C-x =".  You will see that
they are not Latin-1 characters, as far as Emacs is concerned.

It should be possible to allow Emacs to write those bytes when you
save a Latin-1 buffer, but doing so using the existing Emacs machinery
has unpleasant side effects, so it was decided not to do that.

What practical problems do you have with this?  That is, when does it
make sense to have a file that mixes Latin-1 and DOS cp850 encoding?

> Saving with utf8 gives a file with 6 bytes, which emacs again reads
> as latin1

You expect Emacs to guess that the file you saved is in UTF-8.
However, UTF-8 encoding cannot be easily distinguished from
single-byte encodings.  Therefore, Emacs uses a priority list, whereby
it chooses the first encoding on the list that fits the file's
contents.  The default configuration puts UTF-8 very far from the
beginning of the list, so Emacs guesses wrong.

You should either make UTF-8 your preferred coding system, or force
Emacs to read the file as UTF-8 with "C-x RET c".

> Saving the original file (4 bytes) with raw-text gives a file with 5
> bytes which doesn't change anymore on reading and writing (emacs
> uses encoding raw-text-dos), it contains the original bytes plus a
> spurious \201 (Umlaut-u in DOS encoding)

Right.
If you need to edit a file that mixes several encodings, you should
visit it with raw-text (for example, "C-x RET c raw-text RET" followed
by "C-x C-f").  Then saving it with raw-text will do what you expect.
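
To make the detection problem more concrete, here is a small sketch
in Python.  It is only an illustration of the byte-level behaviour,
not of how Emacs implements decoding; the sample word and the helper
name is_valid_utf8 are made up for this example.

```python
# Illustration only: Python codecs standing in for Emacs coding systems.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if DATA can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same letter has different byte values in the two encodings.
u_umlaut_cp850 = "ü".encode("cp850")      # DOS code page 850
u_umlaut_latin1 = "ü".encode("latin-1")   # ISO 8859-1

# A "file" that mixes both encodings, like the one in the report.
mixed = u_umlaut_cp850 + u_umlaut_latin1

# Text saved as UTF-8 is still a legal sequence of Latin-1 bytes, so a
# detector that tries Latin-1 first never sees a reason to reject it.
utf8_bytes = "für".encode("utf-8")
misread = utf8_bytes.decode("latin-1")    # decodes, but to the wrong text

# The mixed file, on the other hand, is not valid UTF-8 at all.
mixed_is_utf8 = is_valid_utf8(mixed)

# Decoding with one encoding and saving with another changes the bytes.
converted = u_umlaut_cp850.decode("cp850").encode("latin-1")
```

Here misread comes out as "fÃ¼r" rather than "für", which is the kind
of wrong guess described above, and converted no longer equals the
original cp850 byte.  Handling the data as uninterpreted bytes, which
is roughly what raw-text does, avoids both problems.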