From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Rodgers
Newsgroups: gmane.emacs.bugs,gmane.spam.detected
Subject: Re: detect big5, utf8, gb2312 as good as firefox
Date: Wed, 01 Mar 2006 09:16:31 -0700
References: <87lkw04bua.fsf@jidanni.org>
In-Reply-To: <87lkw04bua.fsf@jidanni.org>
Original-To: bug-gnu-emacs@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla Thunderbird 0.9 (X11/20041105)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:14926 gmane.spam.detected:1395403

Dan Jacobson wrote:
> Gentlemen, with three plain text files of coded in my most commonly
> encountered coding systems, I did this test:
> $ for c in zh_CN.gb2312 zh_TW.utf8 zh_TW.big5;
> do LC_CTYPE=$c LANG=$c LC_ALL=$c emacs -q gb2312 utf8 big5; done
> Well, zh_CN.gb2312 guessed right each time!
> With zh_TW.utf8, emacs guessed the big5 and gb files were latin-1, as
> seen by the 1 in the modeline and the jumble on the screen.
> With zh_TW.big5, the gb2312 file was seen jumbled as type big5.
> In .emacs I can do
> (set-language-environment "UTF-8")
> (prefer-coding-system 'utf-8-unix)
> (set-coding-priority ;So that big5 is still guessed right after utf-8.
>  (reverse ;Found these lisp thingies and it works.
>   (delete-duplicates
>    (reverse;no lisp pro me
>     (append(list 'coding-category-utf-8
>                  'coding-category-big5)coding-category-list)))))
> to detect all but gb2312 OK. What should I do, make my whole
> environment CN even though I only visit those kind of files once a
> week, and plan to live in UTF-8 / big5 land ... BTW, firefox guessed
> right each time even though they were plain text files with no
> charset= hints.

I would instead try this first:

;; From least- to most-preferred:
(prefer-coding-system 'gb2312)
(prefer-coding-system 'big5)
(prefer-coding-system 'utf-8)

If you insist on frobbing coding-category-list:

(set-coding-priority
 ;; Insert big5 after utf-8:
 (apply 'nconc
        (mapcar (lambda (coding-category)
                  (if (eq coding-category 'coding-category-utf-8)
                      (list 'coding-category-utf-8
                            'coding-category-big5)
                    (list coding-category)))
                coding-category-list)))

Note that you can replace (apply 'nconc (mapcar ...)) with
(require 'cl)(mapcan ...)

Do you get better or worse results with those approaches?

-- 
Kevin Rodgers
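
For reference, the mapcan variant mentioned above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name my-insert-after is hypothetical and not from the post, it uses cl-mapcan from cl-lib (the newer spelling of the cl package's mapcan), and it drops any existing copy of the new element first, since the list passed to set-coding-priority presumably should not contain duplicates.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of Emacs or of the original message.
(defun my-insert-after (anchor new items)
  "Return a fresh list based on ITEMS with NEW placed right after ANCHOR.
Any existing occurrence of NEW is removed first, so NEW does not
appear twice.  ITEMS itself is not modified."
  (cl-mapcan (lambda (item)
               (if (eq item anchor)
                   (list item new)      ; splice NEW in after ANCHOR
                 (list item)))          ; keep every other element as-is
             (remq new items)))

;; Example with a small stand-in list rather than the real
;; coding-category-list:
(my-insert-after 'coding-category-utf-8
                 'coding-category-big5
                 '(coding-category-iso-8-1
                   coding-category-utf-8
                   coding-category-sjis
                   coding-category-big5))
;; => (coding-category-iso-8-1 coding-category-utf-8
;;     coding-category-big5 coding-category-sjis)
```

With such a helper, the set-coding-priority call would take (my-insert-after 'coding-category-utf-8 'coding-category-big5 coding-category-list) as its argument. Because each lambda call returns a freshly consed list, the destructive concatenation done by cl-mapcan does not alter the original list.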