From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Dan Jacobson <jidanni@jidanni.org>
Newsgroups: gmane.emacs.bugs
Subject: detect big5, utf8, gb2312 as good as firefox
Date: Sat, 25 Feb 2006 06:43:25 +0800
Message-ID: <87lkw04bua.fsf@jidanni.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1141197478 5833 80.91.229.2 (1 Mar 2006 07:17:58 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 1 Mar 2006 07:17:58 +0000 (UTC)
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Mar 01 08:17:58 2006
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1FELaq-00068F-HC
	for geb-bug-gnu-emacs@m.gmane.org; Wed, 01 Mar 2006 08:17:52 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1FELap-0003lW-Rx
	for geb-bug-gnu-emacs@m.gmane.org; Wed, 01 Mar 2006 02:17:51 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1FEEVR-0007wV-UF
	for bug-gnu-emacs@gnu.org; Tue, 28 Feb 2006 18:43:50 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1FClvN-0001FZ-Bk
	for bug-gnu-emacs@gnu.org; Fri, 24 Feb 2006 18:00:34 -0500
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1FClvM-0001F4-Tn
	for bug-gnu-emacs@gnu.org; Fri, 24 Feb 2006 18:00:33 -0500
Original-Received: from [204.74.68.40] (helo=frodo.hserus.net)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA:32) (Exim 4.52) id 1FClvZ-0004pp-KA
	for bug-gnu-emacs@gnu.org; Fri, 24 Feb 2006 18:00:45 -0500
Original-Received: from tc218-187-20-146.dialup.dynamic.apol.com.tw
	([218.187.20.146]:4529 helo=jidanni1)
	by frodo.hserus.net with esmtpsa 
	(Cipher TLSv1:AES256-SHA:256) (Exim 4.60 #0)
	id 1FClvD-0005fp-5c by authid <jidanni> with plain
	for <bug-gnu-emacs@gnu.org>; Sat, 25 Feb 2006 04:30:24 +0530
Original-To: bug-gnu-emacs@gnu.org
X-BeenThere: bug-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/bug-gnu-emacs>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bugs:14876
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/14876>

Gentlemen, with three plain text files, one in each of the coding
systems I most commonly encounter, I did this test:
$ for c in zh_CN.gb2312 zh_TW.utf8 zh_TW.big5;
do LC_CTYPE=$c LANG=$c LC_ALL=$c emacs -q gb2312 utf8 big5; done
Well, with zh_CN.gb2312, emacs guessed right each time!
With zh_TW.utf8, emacs guessed the big5 and gb2312 files were latin-1,
as seen by the 1 in the modeline and the jumble on the screen.
With zh_TW.big5, the gb2312 file was misdetected as big5 and shown as
a jumble.
In .emacs I can do
(require 'cl) ;delete-duplicates comes from cl, if not already loaded
(set-language-environment "UTF-8")
(prefer-coding-system 'utf-8-unix)
;; Put utf-8 first and big5 second in the detection priority, so that
;; big5 is still guessed right after utf-8.  No lisp pro me, but I
;; found these lisp thingies and it works.
(set-coding-priority
 (reverse
  (delete-duplicates
   (reverse
    (append (list 'coding-category-utf-8
                  'coding-category-big5)
            coding-category-list)))))
to detect everything but gb2312 correctly. What should I do? Make my
whole environment CN, even though I only visit those kinds of files
once a week and plan to live in UTF-8 / big5 land? BTW, firefox
guessed right each time, even though they were plain text files with
no charset= hints.
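
For anyone wanting to reproduce the test, here is a rough sketch that
builds three small files with the same names. The sample text and the
use of iconv are assumptions for illustration only; they are not taken
from the files used in the test above.

```shell
# Sketch: create three tiny test files named utf8, big5 and gb2312
# in the current directory.
# The sample text is two characters, U+4E2D U+6587 ("Chinese writing"),
# chosen because both exist in Big5 and in GB2312. It is written as
# octal UTF-8 bytes so that this script itself stays pure ASCII.
printf '\344\270\255\346\226\207\n' > utf8

# Convert the UTF-8 file into the two legacy encodings.
iconv -f UTF-8 -t BIG5   utf8 > big5
iconv -f UTF-8 -t GB2312 utf8 > gb2312

# Show the byte counts of the three files.
wc -c utf8 big5 gb2312
```

The UTF-8 file should be 7 bytes and the other two 5 bytes each. With
text this short any detector has very little to go on, so a longer
sample is probably needed for a fair comparison.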

By the way, the Info node "Language Environments"
only mentions
     The supported language environments include:
     Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO,
     Cyrillic-KOI8, Czech, Devanagari, Dutch, English, Ethiopic, German,
     Greek, Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2,
     Latin-3, Latin-4, Latin-5, Latin-8 (Celtic), Latin-9 (updated
     Latin-1, with the Euro sign), Polish, Romanian, Slovak, Slovenian,
     Spanish, Thai, Tibetan, Turkish, and Vietnamese.
but not utf-8 (or UTF-8, or Utf-8?). Maybe that's intentional?