From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de
Subject: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Date: Mon, 02 Mar 2009 20:43:58 +0900
Message-ID: <E1Le6Yw-0006th-Om@etlken>
In-Reply-To: <uab86q1ih.fsf@gnu.org> (message from Eli Zaretskii on Sat, 28 Feb 2009 12:49:58 +0200)
In article <uab86q1ih.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> M-: (coding-system-priority-list) RET
>>> (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided)
> So UTF-8 is indeed ``pretty high'', but lower than the locale's
> default.
> > So this still looks like a real bug.
> Perhaps it is, but I didn't know Emacs 23 can reliably distinguish
> between Latin-1 and UTF-8, even when UTF-8 sequences are present in
> the text. Can we do that reliably? Perhaps Handa-san can shed some
> light on this.
The coding system iso-latin-1 is for the character set
iso-8859-1, and the code-space of iso-8859-1 is 0x00..0xFF
(without gaps, i.e. including 0x80..0x9F) (see
/usr/share/i18n/charmaps/ISO-8859-1.gz). So, if we follow
it strictly, any byte sequence is a valid iso-8859-1
stream, which means that when iso-latin-1 has the highest
priority, all files are detected as iso-latin-1.
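
To make the point above concrete, here is a minimal sketch, not taken
from the Emacs sources, that decodes the same two bytes under both
coding systems. The helper name `my-decode-both-ways' is hypothetical
and used only for illustration.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-decode-both-ways (bytes)
  "Decode unibyte string BYTES as utf-8 and as iso-latin-1.
Return a list (UTF-8-RESULT LATIN-1-RESULT)."
  (list (decode-coding-string bytes 'utf-8)
        (decode-coding-string bytes 'iso-latin-1)))

;; C3 A9 is the UTF-8 encoding of U+00E9 (e with acute accent).
;; Read as iso-8859-1, the same bytes are the two characters
;; U+00C3 and U+00A9, so both decodings succeed.
(my-decode-both-ways (unibyte-string #xC3 #xA9))
```

Neither decoding signals an error, so byte validity alone cannot rule
out iso-latin-1.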
So, as long as we strictly follow the definition of
iso-8859-1...
In article <jwv7i3az0fc.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> That seems to be the source of the problem. utf-8 should always come
> before latin-1 in that list, since utf-8 streams that are valid latin-1
> streams are not uncommon, whereas latin-1 streams that are valid utf-8
> streams are extremely rare.
I think that is the only solution.
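
As a user-level illustration of that ordering (a sketch of a possible
workaround in an init file, not necessarily the change that Emacs
itself would make), the detection priority can be adjusted like this:

```elisp
;; Give utf-8 a higher detection priority than iso-latin-1.
;; `set-coding-system-priority' only reorders the priority list.
(set-coding-system-priority 'utf-8 'iso-latin-1)

;; Inspect the result; utf-8 should now come first.
(coding-system-priority-list)
```

`prefer-coding-system' would also move utf-8 to the front, but it
additionally changes the default coding systems used for new buffers
and for other I/O, which may or may not be wanted.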
In article <87ab86ah9z.fsf@tum.de>, Uwe Siart <uwe.siart@tum.de> writes:
> Assumed this is not possible right now we should distinguish between
> »high reliability« and »poor reliability«. From my perception it has
> been much more reliable earlier so (as a user with limited viewpoint)
> I vote for reverting the change.
In Emacs 22, the coding system iso-latin-1 was defined as a
variant of an iso-2022-based coding system, and thus 0x80..0x9F
were not valid bytes (except for 0x91 etc. in
latin-extra-code-table). So some UTF-8 texts were not
detected as iso-latin-1.
To recover that behaviour, we can define iso-latin-1 as
before by doing this:
(define-coding-system 'iso-latin-1
  "Emacs 22 iso-latin-1."
  :mnemonic ?1
  :coding-type 'iso-2022
  :charset-list '(ascii latin-iso8859-1)
  :ascii-compatible-p t
  :mime-charset 'iso-8859-1
  :designation [ascii latin-iso8859-1 nil nil])
But even with that, some valid UTF-8 texts will still be
detected as iso-latin-1. So I don't think this is a
solution that gives "high reliability".
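
A rough sketch of why the iso-2022-style definition helps only
sometimes: it rejects a text only if the text contains a byte in
0x80..0x9F, and many UTF-8 sequences contain none. The helper name
`my-has-c1-bytes-p' is hypothetical, and the sketch ignores the
latin-extra-code-table exceptions mentioned above.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-has-c1-bytes-p (bytes)
  "Return t if unibyte string BYTES contains a byte in 0x80..0x9F."
  (let ((found nil))
    (dotimes (i (length bytes) found)
      (when (<= #x80 (aref bytes i) #x9F)
        (setq found t)))))

;; U+00E9 encodes as C3 A9: no byte in 0x80..0x9F, so this text
;; would still be acceptable as iso-latin-1.
(my-has-c1-bytes-p (encode-coding-string "\u00e9" 'utf-8))

;; U+00C4 encodes as C3 84: 0x84 is in 0x80..0x9F, so this text
;; would be rejected as iso-latin-1.
(my-has-c1-bytes-p (encode-coding-string "\u00c4" 'utf-8))
```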
---
Kenichi Handa
handa@m17n.org