From: kai.grossjohann@gmx.net (Kai Großjohann)
Newsgroups: gmane.emacs.help
Subject: Re: problem with editing/decoding utf-8 text
Date: Fri, 23 May 2003 18:50:08 +0200
Organization: University of Duisburg, Germany
Message-ID: <843cj5hakf.fsf@lucy.is.informatik.uni-duisburg.de>

Fery writes:

> I have a UTF-8 text file, containing latin-1 text. When I try to edit it
> with emacs, it does not detect that it is utf-8; the
> describe-coding-system gives back 'iso-latin-1-unix'. (And I see the
> two-byte representation of latin1 chars, which is not bad to me.)

Released versions of Emacs put UTF-8 at a rather low priority for
automatic encoding detection.  So you need to help Emacs by explicitly
specifying the encoding.  Do C-x RET c utf-8 RET before using C-x C-f
to open the file.

You can also put utf-8 somewhat earlier in the list for automatic
encoding detection.  I think this can be achieved in the following
way, but I'm not sure.  I'm not a Mule expert.  If anyone knows
better, please help out.

(setq coding-category-list
      (cons 'coding-category-utf-8
            (delq 'coding-category-utf-8 coding-category-list)))

> When I save the buffer, it displays an error message:
>
>   These default coding systems were tried:
>     iso-latin-1-unix
>   However, none of them safely encodes the target text.
>
> Now, no matter what I choose (raw-text, no-conversion, utf-8), it
> modifies all of the utf8 chars which are not fit into the ascii charset.
> It seems, that it inserts a \201 before every char which is not in the
> ascii charset. I.e. if I just load and save a file, emacs does not
> behaves transparently.

You should make sure that UTF-8 is properly recognized when opening
the file; then saving will Just Work.

> I have found one solution: opening the file with
> universal-coding-system-argument, using even UTF-8 (then I see correctly
> the chars, although it is not always important) or e.g. no-conversion.

Do not use no-conversion.  The file is UTF-8, so UTF-8 is the right
encoding to specify.

> My questions:
>
> 0. What is this \201 byte?

Emacs encodes Latin-1 characters internally as a two-byte sequence.
The first byte is \201 (indicating the Latin-1 character set), and the
second byte is the actual character.  \202 stands for Latin-2, as you
might guess.

> 1. Cannot I tell to a buffer (after the load of a file) that interpet it
> as binary, and save exactly the same bytes what it did read into the
> buffer (i.e. transparent buffer)?

It's not a good idea.  The buffer contents might already be munged at
that point.

> 2. What is the difference between raw-text, no-conversion, binary? On
> some places, I can choose any of them, on other places not... This whole
> coding system is a nightmare... :(((

The differences are rather subtle, I'm afraid.  I think binary is an
alias for no-conversion.  raw-text does EOL conversion, whereas
no-conversion doesn't.

> 3. Cannot I tell to emacs that interpret the keyboard input as
> "raw"? I have set input-meta to On, convert-meta to Off in .inputrc,
> and if I could tell emacs that "just interpret the bytes from the
> terminal input what they are", then I could copy/paste utf-8 data
> (in raw format) from another application. (I run emacs on linux,
> with the 'putty' terminal on windows).

It does not make sense to do that, IMHO.  For example, M-f would cease
to work because Emacs wouldn't know what characters are represented by
the bytes, and so it wouldn't know which characters are parts of
words.

But it seems your terminal uses utf-8, so you can just teach Emacs
about this: C-x RET k utf-8 RET.
-- 
This line is not blank.
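
The key sequences mentioned in this message could also be put into an
init file.  What follows is only a rough sketch under some assumptions:
prefer-coding-system is a standard Emacs command but is not the
coding-category-list approach suggested above, and it may also change
defaults for newly created files, so check its docstring before relying
on it.  The helper name my-find-file-utf-8 is made up for this example.

```elisp
;; Sketch for ~/.emacs; behaviour may differ between Emacs versions.

;; Assumption: use the standard command `prefer-coding-system' to give
;; UTF-8 a higher priority during automatic detection.  This is an
;; alternative to editing `coding-category-list' by hand.
(prefer-coding-system 'utf-8)

;; Same as typing C-x RET k utf-8 RET: decode keyboard input coming
;; from a UTF-8 terminal.
(set-keyboard-coding-system 'utf-8)

;; Hypothetical helper (name invented for illustration): visit a file
;; as UTF-8 regardless of what auto-detection would pick, roughly like
;; C-x RET c utf-8 RET followed by C-x C-f.
(defun my-find-file-utf-8 (filename)
  "Visit FILENAME, forcing UTF-8 decoding when the file is read."
  (interactive "fFind file as UTF-8: ")
  (let ((coding-system-for-read 'utf-8))
    (find-file filename)))
```

With this loaded, M-x my-find-file-utf-8 visits the file as UTF-8, and
C-h C (describe-coding-system) can be used to check which coding
system the buffer ended up with before saving.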