From: kai.grossjohann@gmx.net (Kai Großjohann)
Newsgroups: gmane.emacs.help
Subject: Re: problem with editing/decoding utf-8 text
Date: Fri, 23 May 2003 18:50:08 +0200
Organization: University of Duisburg, Germany
Message-ID: <843cj5hakf.fsf@lucy.is.informatik.uni-duisburg.de>
References: <mailman.6635.1053692285.21513.help-gnu-emacs@gnu.org>

Fery <engard.ferenc@innomed.hu> writes:

> I have a UTF-8 text file containing Latin-1 text. When I try to edit it
> with Emacs, it does not detect that it is UTF-8; describe-coding-system
> reports 'iso-latin-1-unix'. (And I see the two-byte representation of
> the Latin-1 chars, which doesn't bother me.)

Released versions of Emacs put UTF-8 at a rather low priority for
automatic encoding detection.  So you need to help Emacs by
explicitly specifying the encoding.  Do C-x RET c utf-8 RET before
using C-x C-f to open the file.
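The same thing can be done from Lisp, for example in a small command of your own. This is only a sketch: binding coding-system-for-read around find-file is the Lisp counterpart of C-x RET c, and the command name my-find-file-utf-8 is hypothetical.

```elisp
;; Visit a file, forcing UTF-8 decoding instead of relying on
;; automatic detection.  Mirrors `C-x RET c utf-8 RET C-x C-f'.
;; `my-find-file-utf-8' is a hypothetical name, not part of Emacs.
(defun my-find-file-utf-8 (file)
  "Visit FILE, decoding its contents as UTF-8."
  (interactive "FFind file (UTF-8): ")
  (let ((coding-system-for-read 'utf-8))
    (find-file file)))
```

Because the buffer then remembers utf-8 as its file coding system, a later C-x C-s writes UTF-8 back.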

You can also put utf-8 somewhat earlier in the list for automatic
encoding detection.  I think this can be achieved in the following
way, but I'm not sure.  I'm not a Mule expert.  If anyone knows
better, please help out.

;; Move UTF-8 to the front of the automatic-detection priority list.
(setq coding-category-list
      (cons 'coding-category-utf-8
            (delq 'coding-category-utf-8
                  coding-category-list)))
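A possibly simpler route, assuming your Emacs has prefer-coding-system (Emacs 21 does), is to let that function reorder the priority list for you instead of editing coding-category-list by hand. Note that it also changes some other defaults, such as the coding system for newly created buffers and for subprocess I/O, which may or may not be what you want.

```elisp
;; Put UTF-8 at the front of the automatic-detection priority list.
;; Side effect: UTF-8 also becomes the default for new buffers and
;; subprocess I/O.
(prefer-coding-system 'utf-8)
```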

> When I save the buffer, it displays an error message:
>
> These default coding systems were tried:
>   iso-latin-1-unix
> However, none of them safely encodes the target text.
>
> Now, no matter what I choose (raw-text, no-conversion, utf-8), it
> modifies all of the UTF-8 chars that do not fit into the ASCII charset.
> It seems that it inserts a \201 before every char that is not in the
> ASCII charset, i.e. if I just load and save a file, Emacs does not
> behave transparently.

You should make sure that UTF-8 is properly recognized when opening
the file, then saving will Just Work.

> I have found one solution: opening the file with
> universal-coding-system-argument, using either UTF-8 (then I see the
> chars correctly, although that is not always important) or e.g. no-conversion.

Do not use no-conversion.  The file is UTF-8, so UTF-8 is the right
encoding to specify.

> My questions:
>
> 0. What is this \201 byte?

Emacs encodes Latin-1 characters internally by a two-byte sequence.
The first byte is \201 (indicating the Latin-1 character set), and
the second byte is the actual character.  \202 stands for Latin-2, as
you might guess.
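To see this for yourself, here is a sketch that assumes the Emacs 21 internal representation described above (other versions may store characters differently). It decodes the two UTF-8 bytes of e-acute as if they were Latin-1, which is what happened to your file, and then uses string-as-unibyte to expose the bytes Emacs holds internally.

```elisp
;; "\303\251" is the UTF-8 encoding of e-acute (two bytes).
;; Decoding it as Latin-1 yields two characters (A-tilde and the
;; copyright sign), just as when the file was misdetected.
(let* ((misread (decode-coding-string "\303\251" 'iso-latin-1))
       ;; `string-as-unibyte' returns the internal byte sequence.
       (internal (string-as-unibyte misread)))
  (message "%d chars, internal bytes: %S" (length misread) internal))
;; Under Emacs 21 each character should appear as \201 followed by
;; its byte; saving with raw-text or no-conversion writes exactly
;; these internal bytes, hence the extra \201s in the file.
```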

> 1. Can't I tell a buffer (after loading a file) to interpret it as
> binary, and save exactly the same bytes it read into the buffer
> (i.e. a transparent buffer)?

It's not a good idea.  The buffer contents might already be munged at
that point.
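If a byte-for-byte round trip is really what you want, the safer approach is to skip decoding from the start rather than reinterpreting the buffer afterwards. A minimal sketch using the standard command find-file-literally; the file name is only a placeholder.

```elisp
;; Visit the file with no character-code conversion, no format
;; conversion and no EOL conversion.  The buffer is unibyte and holds
;; the raw bytes, so saving writes back exactly what was read.
;; Non-ASCII bytes are typically displayed as octal escapes (\303\251).
(find-file-literally "~/example-utf8.txt")  ; placeholder file name
```

The price is that Emacs no longer knows which characters the bytes stand for, so this is only useful for inspecting or patching the file at the byte level.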

> 2. What is the difference between raw-text, no-conversion, binary? In
> some places I can choose any of them, in other places not... This whole
> coding system business is a nightmare... :(((

The differences are rather subtle, I'm afraid.  binary is just an
alias for no-conversion.  raw-text does EOL conversion (e.g. CRLF to
LF), whereas no-conversion doesn't.
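You can check this from Lisp. A sketch, assuming the functions coding-system-equal and coding-system-eol-type (both present in Emacs 21); evaluate each form with C-x C-e. An EOL type of 0 means the coding system is fixed to Unix line ends, i.e. no EOL conversion, while a vector of -unix/-dos/-mac variants means the EOL style is detected and converted when reading.

```elisp
;; Is `binary' just another name for `no-conversion'?
(coding-system-equal 'binary 'no-conversion)

;; EOL handling: a vector means "detect and convert line ends",
;; 0 means "Unix line ends only", so the bytes are left alone.
(coding-system-eol-type 'raw-text)
(coding-system-eol-type 'no-conversion)
```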

> 3. Can't I tell Emacs to interpret the keyboard input as "raw"? I have
> set input-meta to On and convert-meta to Off in .inputrc, and if I
> could tell Emacs to "just interpret the bytes from the terminal input
> as what they are", then I could copy/paste UTF-8 data (in raw format)
> from another application. (I run Emacs on Linux, with the 'putty'
> terminal on Windows.)

It does not make sense to do that, IMHO.  For example, M-f would
cease to work because Emacs wouldn't know what characters are
represented by the bytes, and so it wouldn't know which characters
are parts of words.

But it seems your terminal uses utf-8, so you can just teach Emacs
about this: C-x RET k utf-8 RET.
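To make this permanent, something like the following in ~/.emacs should work, assuming PuTTY is configured for UTF-8 in both directions. set-keyboard-coding-system is what C-x RET k runs; set-terminal-coding-system (C-x RET t) handles the output side.

```elisp
;; Decode keyboard input (including pasted text) as UTF-8.
(set-keyboard-coding-system 'utf-8)
;; Encode text sent to the terminal as UTF-8 too, so a UTF-8
;; terminal displays the characters correctly.
(set-terminal-coding-system 'utf-8)
```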
-- 
This line is not blank.