From: Simon Josefsson <jas@extundo.com>
Cc: emacs-devel@gnu.org
Subject: Re: Cyrillic vs UTF-8
Date: Fri, 25 Apr 2003 19:09:07 +0200
Message-ID: <iluvfx21p3g.fsf@latte.josefsson.org>
In-Reply-To: <1858-Fri25Apr2003194023+0300-eliz@elta.co.il> (Eli Zaretskii's message of "Fri, 25 Apr 2003 19:40:23 +0300")

"Eli Zaretskii" <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Fri, 25 Apr 2003 18:12:17 +0200
>> 
>> I think there are two problems.  Opening the file the first time
>> should guess it is a utf-8 file.
>
> IIRC, you need to make the priority of utf-8 higher for this to
> happen.  Unless that's changed in the current CVS, try evaluating the
> following expression:
>
>   (prefer-coding-system 'utf-8)
>
> before you visit a utf-8 encoded file, and see if that helps.  I think
> this is because the encoding detection routines cannot distinguish
> between Latin-n and utf encoding without some help.

This works, but note that Emacs didn't recognize the file as being in
any encoding without it.  The modeline says '-:--'.

It seems binary is preferred over utf-8 and utf-16-* in
coding-category-list.  This seems extremely conservative.  I guess it
means UTF-8 can never be autodetected by default?  Is the Unicode
support so bad it shouldn't even be preferred over binary?  UTF-8 is
well formed and restricted; detecting it properly (even compared to
Latin-n) can be done well enough that failures rarely happen in
practice.
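
As a rough illustration of why strict validation makes UTF-8 easy to
tell apart from legacy 8-bit encodings, here is a minimal sketch in
Python.  It is not how the Emacs detection routines work, and
`looks_like_utf8` is a hypothetical helper invented for this example.
It assumes that "no non-ASCII bytes" should count as "no evidence"
rather than as a match.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 and has at least one non-ASCII byte.

    Pure ASCII is also valid UTF-8, but it gives no evidence either way,
    so this sketch (by assumption) reports False for it.
    """
    try:
        # Strict decoding rejects stray lead bytes, missing continuation
        # bytes, overlong forms and surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return any(byte >= 0x80 for byte in data)


if __name__ == "__main__":
    word = "Привет"
    print(looks_like_utf8(word.encode("utf-8")))    # True
    print(looks_like_utf8(word.encode("koi8-r")))   # False
    print(looks_like_utf8("naïve café".encode("latin-1")))  # False
```

Text in KOI8-R or Latin-1 almost never happens to form valid UTF-8
multi-byte sequences, which is why a check like this rarely misfires
on real files.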

Can't we move binary down below UTF-8 in CVS?  IMHO we should move
UTF-8 earlier still, since determining whether data is UTF-8 or not
can be done with good probability.  Preferring binary over UTF-8 seems
just wrong.

There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
say, but it has been removed both in 21.3 and in CVS.  I thought that
meant UTF-8 was better supported now, but this doesn't seem to be the
case.
