From: Dan Jacobson
Newsgroups: gmane.emacs.bugs
Subject: detect big5, utf8, gb2312 as good as firefox
Date: Sat, 25 Feb 2006 06:43:25 +0800
Message-ID: <87lkw04bua.fsf@jidanni.org>
Original-To: bug-gnu-emacs@gnu.org

Gentlemen,

With three plain text files encoded in the coding systems I encounter
most often, I ran this test:

    $ for c in zh_CN.gb2312 zh_TW.utf8 zh_TW.big5; do
        LC_CTYPE=$c LANG=$c LC_ALL=$c emacs -q gb2312 utf8 big5
      done

With zh_CN.gb2312, emacs guessed right each time! With zh_TW.utf8,
emacs guessed the big5 and gb2312 files were latin-1, as seen by the 1
in the modeline and the jumble on the screen. With zh_TW.big5, the
gb2312 file was seen jumbled as type big5.

In .emacs I can do

    (require 'cl) ; delete-duplicates comes from the cl library
    (set-language-environment "UTF-8")
    (prefer-coding-system 'utf-8-unix)
    ;; Put utf-8 and big5 at the front of the priority list, so that
    ;; big5 is still guessed right after utf-8.
    (set-coding-priority
     (reverse                 ; Found these lisp thingies and it works.
      (delete-duplicates
       (reverse               ; no lisp pro me
        (append (list 'coding-category-utf-8 'coding-category-big5)
                coding-category-list)))))

to detect all but gb2312 OK. What should I do: make my whole
environment CN, even though I only visit those kinds of files once a
week and plan to live in UTF-8 / big5 land?

BTW, firefox guessed right each time, even though they were plain text
files with no charset= hints.
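A rough way to see why the guess depends on priority order is to check which encodings each file is even legal in. The sketch below is only an illustration, not how emacs or firefox detect anything: it assumes an iconv command is installed and accepts the names UTF-8, BIG5 and GB2312 (names vary between iconv implementations), and valid_encodings is a made-up helper name.

```shell
#!/bin/sh
# Print every encoding from a fixed candidate list under which FILE
# decodes without error. valid_encodings is a hypothetical helper,
# not an emacs or iconv feature.
valid_encodings() {
    file=$1
    for enc in UTF-8 BIG5 GB2312; do
        # iconv exits non-zero when it meets a byte sequence that is
        # illegal in $enc, so success means "could be this encoding".
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$enc"
        fi
    done
}

# Report the candidates for the three test files, if they are present
# in the current directory.
for f in gb2312 utf8 big5; do
    if [ -f "$f" ]; then
        printf '%s: %s\n' "$f" "$(valid_encodings "$f" | tr '\n' ' ')"
    fi
done
```

Big5 and GB2312 are both double-byte encodings with overlapping byte ranges, so a file will often pass as both. Byte validity alone cannot settle the question, which leaves it to the priority list (or to frequency statistics of some kind) to pick between them.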
By the way, the Info node "Language Environments" only mentions:

    The supported language environments include: Chinese-BIG5,
    Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO, Cyrillic-KOI8,
    Czech, Devanagari, Dutch, English, Ethiopic, German, Greek, Hebrew,
    IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4,
    Latin-5, Latin-8 (Celtic), Latin-9 (updated Latin-1, with the Euro
    sign), Polish, Romanian, Slovak, Slovenian, Spanish, Thai, Tibetan,
    Turkish, and Vietnamese.

but not utf-8 (or UTF-8, Utf-8?). Maybe that's intentional?