Re: problem with editing/decoding utf-8 text

From: kai.grossjohann@gmx.net (Kai Großjohann)
Subject: Re: problem with editing/decoding utf-8 text
Date: Fri, 23 May 2003 18:50:08 +0200
Message-ID: <843cj5hakf.fsf@lucy.is.informatik.uni-duisburg.de>
In-Reply-To: mailman.6635.1053692285.21513.help-gnu-emacs@gnu.org

Fery <engard.ferenc@innomed.hu> writes:

> I have a UTF-8 text file, containing latin-1 text. When I try to edit it
> with emacs, it does not detect that it is utf-8; the
> describe-coding-system gives back 'iso-latin-1-unix'. (And I see the
> two-byte representation of latin1 chars, which is not bad to me.)

Released versions of Emacs put UTF-8 at a rather low priority for
automatic encoding detection.  So you need to help Emacs by
explicitly specifying the encoding.  Do C-x RET c utf-8 RET before
using C-x C-f to open the file.
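
The same thing can be done from Lisp by binding coding-system-for-read
around the visit.  This is only a sketch; the file name is a
placeholder, not something from your setup.

;; Visit one file as UTF-8, bypassing automatic detection.
;; "/path/to/file.txt" is a hypothetical path; substitute your own.
(let ((coding-system-for-read 'utf-8))
  (find-file "/path/to/file.txt"))

The let-binding keeps the override local to this one call, so other
files are still detected as usual.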

You can also put utf-8 somewhat earlier in the list for automatic
encoding detection.  I think this can be achieved in the following
way, but I'm not sure.  I'm not a Mule expert.  If anyone knows
better, please help out.

;; Move UTF-8 to the front of the detection priority list.
(setq coding-category-list
      (cons 'coding-category-utf-8
            (delq 'coding-category-utf-8
                  coding-category-list)))
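
If fiddling with the list directly does not take effect, I believe
prefer-coding-system is the supported way to get the same result.  A
minimal sketch for ~/.emacs, untested on your version:

;; Give UTF-8 the highest priority for automatic detection.
;; As far as I know this also changes the default coding system
;; for newly created files, which may or may not be what you want.
(prefer-coding-system 'utf-8)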

> When I save the buffer, it displays an error message:
>
> These default coding systems were tried:
>   iso-latin-1-unix
> However, none of them safely encodes the target text.
>
> Now, no matter what I choose (raw-text, no-conversion, utf-8), it
> modifies all of the utf8 chars which do not fit into the ascii charset.
> It seems that it inserts a \201 before every char which is not in the
> ascii charset. I.e. if I just load and save a file, emacs does not
> behave transparently.

You should make sure that UTF-8 is properly recognized when opening
the file, then saving will Just Work.
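
If all your files of one kind are UTF-8, you could also tell Emacs so
by file name instead of opening each one by hand.  A sketch, assuming
(purely for illustration) that the files end in .txt:

;; Read and write files matching the regexp as UTF-8.
;; The "\\.txt\\'" pattern is an assumption; adjust it to your files.
(modify-coding-system-alist 'file "\\.txt\\'" 'utf-8)

For a single file, a coding cookie such as -*- coding: utf-8 -*- on
the first line should have the same effect.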

> I have found one solution: opening the file with
> universal-coding-system-argument, using even UTF-8 (then I see correctly
> the chars, although it is not always important) or e.g. no-conversion.

Do not use no-conversion.  The file is UTF-8, so UTF-8 is the right
encoding to specify.

> My questions:
>
> 0. What is this \201 byte?

Emacs encodes Latin-1 characters internally by a two-byte sequence.
The first byte is \201 (indicating the Latin-1 character set), and
the second byte is the actual character.  \202 stands for Latin-2, as
you might guess.

> 1. Can't I tell a buffer (after loading a file) to interpret its
> contents as binary, and save exactly the same bytes that were read
> into the buffer (i.e. a transparent buffer)?

It's not a good idea.  The buffer contents might already be munged at
that point.

> 2. What is the difference between raw-text, no-conversion, binary? On
> some places, I can choose any of them, on other places not... This whole
> coding system is a nightmare... :(((

The differences are rather subtle, I'm afraid.  I think binary is an
alias for no-conversion.  raw-text does EOL conversion, whereas
no-conversion doesn't.
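
You can ask Emacs about the EOL handling of each of them.  According
to its docstring, coding-system-eol-type returns an integer when the
EOL format is fixed and a vector of variants when it is detected
automatically.  A small sketch to evaluate in *scratch*:

;; Collect the EOL type of each coding system in question.
(mapcar (lambda (cs) (cons cs (coding-system-eol-type cs)))
        '(raw-text no-conversion binary))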

> 3. Can't I tell emacs to interpret the keyboard input as
> "raw"? I have set input-meta to On, convert-meta to Off in .inputrc,
> and if I could tell emacs to "just interpret the bytes from the
> terminal input as they are", then I could copy/paste utf-8 data
> (in raw format) from another application. (I run emacs on linux,
> with the 'putty' terminal on windows).

It does not make sense to do that, IMHO.  For example, M-f would
cease to work because Emacs wouldn't know what characters are
represented by the bytes, and so it wouldn't know which characters
are parts of words.

But it seems your terminal uses utf-8, so you can just teach Emacs
about this: C-x RET k utf-8 RET.
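
To make this permanent, something like the following in ~/.emacs
should be the Lisp equivalent.  I am assuming the terminal both sends
and displays UTF-8; the second call corresponds to C-x RET t.

;; Decode keyboard input from the terminal as UTF-8.
(set-keyboard-coding-system 'utf-8)
;; Encode text sent to the terminal for display as UTF-8.
(set-terminal-coding-system 'utf-8)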
-- 
This line is not blank.
