From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.bugs
Subject: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
Date: Mon, 02 Mar 2009 20:43:58 +0900
Message-ID: <E1Le6Yw-0006th-Om@etlken>
References: <877i3c55tg.fsf@tum.de> <usklzq33v.fsf@gnu.org>
	<87ljrromgg.fsf@tum.de> <uocwnpwtj.fsf@gnu.org>
	<jwv4oyf18le.fsf-monnier+emacsbugreports@gnu.org>
	<uab86q1ih.fsf@gnu.org>
Reply-To: Kenichi Handa <handa@m17n.org>, 2497@emacsbugs.donarmstrong.com
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1236003863 26307 80.91.229.12 (2 Mar 2009 14:24:23 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 2 Mar 2009 14:24:23 +0000 (UTC)
Cc: 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Mon Mar 02 15:25:39 2009
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1Le95G-00082q-RJ
	for geb-bug-gnu-emacs@m.gmane.org; Mon, 02 Mar 2009 15:25:31 +0100
Original-Received: from localhost ([127.0.0.1]:44036 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Le93v-0000FL-Lh
	for geb-bug-gnu-emacs@m.gmane.org; Mon, 02 Mar 2009 09:24:07 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Le6s3-0000Ow-Ko
	for bug-gnu-emacs@gnu.org; Mon, 02 Mar 2009 07:03:43 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Le6s2-0000OY-L7
	for bug-gnu-emacs@gnu.org; Mon, 02 Mar 2009 07:03:43 -0500
Original-Received: from [199.232.76.173] (port=37070 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Le6s2-0000OV-F7
	for bug-gnu-emacs@gnu.org; Mon, 02 Mar 2009 07:03:42 -0500
Original-Received: from rzlab.ucr.edu ([138.23.92.77]:35517)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <debbugs@rzlab.ucr.edu>) id 1Le6s1-0005ys-JX
	for bug-gnu-emacs@gnu.org; Mon, 02 Mar 2009 07:03:42 -0500
Original-Received: from rzlab.ucr.edu (rzlab.ucr.edu [127.0.0.1])
	by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n22C3cWs007395; 
	Mon, 2 Mar 2009 04:03:39 -0800
Original-Received: (from debbugs@localhost)
	by rzlab.ucr.edu (8.13.8/8.13.8/Submit) id n22Bo2Nl003817;
	Mon, 2 Mar 2009 03:50:02 -0800
X-Loop: owner@emacsbugs.donarmstrong.com
Resent-From: Kenichi Handa <handa@m17n.org>
Resent-To: bug-submit-list@donarmstrong.com
Resent-CC: Emacs Bugs <bug-gnu-emacs@gnu.org>
Resent-Date: Mon, 02 Mar 2009 11:50:02 +0000
Resent-Message-ID: <handler.2497.B2497.12359942082487@emacsbugs.donarmstrong.com>
Resent-Sender: owner@emacsbugs.donarmstrong.com
X-Emacs-PR-Message: followup 2497
X-Emacs-PR-Package: emacs
X-Emacs-PR-Keywords: 
Original-Received: via spool by 2497-submit@emacsbugs.donarmstrong.com
	id=B2497.12359942082487
	(code B ref 2497); Mon, 02 Mar 2009 11:50:02 +0000
Original-Received: (at 2497) by emacsbugs.donarmstrong.com; 2 Mar 2009 11:43:28 +0000
X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available.
	hammytokens:Tokens not available.
Original-Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133])
	by rzlab.ucr.edu (8.13.8/8.13.8/Debian-3) with ESMTP id n22BhO18002481
	for <2497@emacsbugs.donarmstrong.com>; Mon, 2 Mar 2009 03:43:25 -0800
Original-Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123])
	by mx1.aist.go.jp  with ESMTP id n22BhMFJ005580;
	Mon, 2 Mar 2009 20:43:22 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from smtp2.aist.go.jp
	by rqsmtp2.aist.go.jp  with ESMTP id n22BhMKR017824;
	Mon, 2 Mar 2009 20:43:22 +0900 (JST) env-from (handa@m17n.org)
Original-Received: by smtp2.aist.go.jp  with ESMTP id n22BhKWb006691;
	Mon, 2 Mar 2009 20:43:20 +0900 (JST) env-from (handa@m17n.org)
Original-Received: from handa by etlken with local (Exim 4.69)
	(envelope-from <handa@m17n.org>)
	id 1Le6Yw-0006th-Om; Mon, 02 Mar 2009 20:43:58 +0900
In-reply-to: <uab86q1ih.fsf@gnu.org> (message from Eli Zaretskii on Sat, 28
	Feb 2009 12:49:58 +0200)
X-MIME-Autoconverted: from 8bit to quoted-printable by rzlab.ucr.edu id
	n22C3cWs007395
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3)
Resent-Date: Mon, 02 Mar 2009 07:03:43 -0500
X-Mailman-Approved-At: Mon, 02 Mar 2009 09:21:39 -0500
X-BeenThere: bug-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/bug-gnu-emacs>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bugs:25914
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/25914>

In article <uab86q1ih.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

>   M-: (coding-system-priority-list) RET
>>> (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2
>>>  emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit
>>>  utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature
>>>  utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided)

> So UTF-8 is indeed ``pretty high'', but lower than the locale's
> default.

> > So this still looks like a real bug.

> Perhaps it is, but I didn't know Emacs 23 can reliably distinguish
> between Latin-1 and UTF-8, even when UTF-8 sequences are present in
> the text.  Can we do that reliably?  Perhaps Handa-san can shed some
> light on this.

The coding system iso-latin-1 is for the character set
iso-8859-1, and the code space of iso-8859-1 is 0x00..0xFF
without gaps, i.e. including 0x80..0x9F (see
/usr/share/i18n/charmaps/ISO-8859-1.gz).  So, if we follow
it strictly, any byte sequence is a valid iso-8859-1
stream, which means that when iso-latin-1 has the highest
priority, all files are detected as iso-latin-1.
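
As a minimal sketch of that point (the helper name
my-decoded-lengths is only for illustration, and this assumes
Emacs 23 or later, where unibyte-string is available): the two
bytes C3 A9 are the UTF-8 encoding of U+00E9, but they also
decode without error as iso-latin-1, giving two characters
instead of one.

```elisp
(defun my-decoded-lengths (bytes)
  "Return (UTF8-LENGTH LATIN1-LENGTH) for the unibyte string BYTES.
Hypothetical helper, for illustration only."
  (list
   ;; Decoded as UTF-8.
   (length (decode-coding-string bytes 'utf-8))
   ;; Decoded as iso-latin-1: every byte maps to one character,
   ;; so this never fails, whatever the bytes are.
   (length (decode-coding-string bytes 'iso-latin-1))))

;; C3 A9 is one character as UTF-8, two as iso-latin-1.
(my-decoded-lengths (unibyte-string #xC3 #xA9))
;; => (1 2)
```

Both decodings succeed, so the detector has nothing but the
priority order to choose between them.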

So, as long as we strictly follow the definition of
iso-8859-1...

In article <jwv7i3az0fc.fsf-monnier+emacsbugreports@gnu.org>,
Stefan Monnier <monnier@iro.umontreal.ca> writes:

> That seems to be the source of the problem.  utf-8 should always come
> before latin-1 in that list, since utf-8 streams that are valid latin-1
> streams are not uncommon, whereas latin-1 streams that are valid utf-8
> streams are extremely rare.

I think that is the only solution.
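
Until the default order is changed, a user can probably get the
same effect from an init file.  This is only a sketch of a
user-side workaround, not the change that would go into Emacs
itself:

```elisp
;; Put utf-8 ahead of iso-latin-1 in the detection priority list.
;; The remaining coding systems keep their relative order.
(set-coding-system-priority 'utf-8 'iso-latin-1)

;; Check the result: utf-8 should now be the first element.
(car (coding-system-priority-list))
```

prefer-coding-system also moves utf-8 to the front, but it
changes the default coding systems as well, which may not be
wanted on a system whose locale is Latin-1.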

In article <87ab86ah9z.fsf@tum.de>, Uwe Siart <uwe.siart@tum.de> writes:

> Assumed this is not possible right now we should distinguish between
> »high reliability« and »poor reliability«. From my perception it has
> been much more reliable earlier so (as a user with limited viewpoint)
> I vote for reverting the change.

In Emacs 22, the coding system iso-latin-1 was defined as a
variant of the iso-2022-based coding systems, and thus
0x80..0x9F were not valid bytes (except for 0x91 etc. in
latin-extra-code-table).  So, some UTF-8 texts were not
detected as iso-latin-1.

To recover that behaviour, we can define iso-latin-1 as
before by doing this:

(define-coding-system 'iso-latin-1
  "Emacs 22 iso-latin-1."
  :mnemonic ?1
  :coding-type 'iso-2022
  :charset-list '(ascii latin-iso8859-1)
  :ascii-compatible-p t
  :mime-charset 'iso-8859-1
  :designation [ascii latin-iso8859-1 nil nil])

But even with that, some valid UTF-8 texts will still be
detected as iso-latin-1.  So I don't think this is a
solution that gives "high reliability".
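
To illustrate which texts slip through, here is a rough sketch.
The helper my-has-c1-byte-p is hypothetical, and it ignores the
latin-extra-code-table exceptions.  It only asks whether the
UTF-8 encoding of a string contains a byte in 0x80..0x9F, which
is what the iso-2022-based definition would reject.

```elisp
(defun my-has-c1-byte-p (string)
  "Return t if the UTF-8 encoding of STRING has a byte in 0x80..0x9F.
Hypothetical helper, for illustration only."
  (let ((found nil))
    ;; (append UNIBYTE-STRING nil) gives the list of byte values.
    (dolist (byte (append (encode-coding-string string 'utf-8) nil) found)
      (when (and (>= byte #x80) (<= byte #x9F))
        (setq found t)))))

;; U+00C4 encodes as C3 84: contains 0x84, so it cannot be
;; mistaken for iso-latin-1 under the Emacs 22 definition.
(my-has-c1-byte-p (string #xC4))   ; => t

;; U+00E4 encodes as C3 A4: no byte in 0x80..0x9F, so it is
;; still a valid iso-latin-1 stream.
(my-has-c1-byte-p (string #xE4))   ; => nil
```

So a UTF-8 file whose only non-ASCII characters are in the
range U+00E0..U+00FF would still be detected as iso-latin-1.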

---
Kenichi Handa
handa@m17n.org