From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.help
Subject: Re: Questions on charset encoding detection and keyboard layout
Date: Fri, 11 Dec 2009 22:54:04 +0200
Message-ID: <833a3h8c5v.fsf@gnu.org>
References: <4B2108C2.8000506@gmail.com> <83k4wu8xdy.fsf@gnu.org>
	<4B21DBCF.1020607@gmail.com> <83fx7h9a0z.fsf@gnu.org>
	<4B22A2D6.9080302@gmail.com>
NNTP-Posting-Host: lo.gmane.org
X-Trace: ger.gmane.org 1260564911 20468 80.91.229.12 (11 Dec 2009 20:55:11 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 11 Dec 2009 20:55:11 +0000 (UTC)
To: help-gnu-emacs@gnu.org
In-reply-to: <4B22A2D6.9080302@gmail.com>
Xref: news.gmane.org gmane.emacs.help:70607
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/70607>

> Date: Sat, 12 Dec 2009 03:51:50 +0800
> From: "Hou, Ruoyu" <phoenixhou@gmail.com>
> 
> Before switching to Emacs I used EmEditor, a proprietary editor for
> Windows. It could auto-detect files in different encodings and
> present a list of candidate encodings, ordered by statistical
> confidence, for me to pick the most likely one. So I guess it
> implements some statistical algorithm to detect the proper encoding.

This feature still awaits a volunteer to be added to Emacs.  It
shouldn't be too hard, I think.

> A friend of mine, a Vim user, showed me handling those different 
> encodings by ":set fencs=(a list of possible encodings, the point is to 
> put euc-jp before gbk)".

The customization I suggested, i.e.

  (prefer-coding-system 'euc-jp)

was supposed to make euc-jp of higher priority than GBK (and
everything else).  However, I understand it did you more harm than
good.

For finer-grained control, try calling set-coding-system-priority
for every encoding you need to deal with, and in such an order that
the resulting list returned by coding-system-priority-list would show
the encodings in the order you want them.  (These two functions are
documented in the ELisp manual.)  I'm not sure this will have the same
effect as ":set fencs" in vim, though.
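
A minimal sketch of that approach, assuming the encodings of interest
are UTF-8, EUC-JP and GBK (the choice and the order are illustrative
only, not taken from your setup).  set-coding-system-priority accepts
several coding systems in one call and puts them at the front of the
priority list in the order given:

  ;; Illustrative order: utf-8 first, then euc-jp ahead of gbk.
  (set-coding-system-priority 'utf-8 'euc-jp 'gbk)

  ;; Evaluate this (e.g. with M-:) to inspect the resulting order.
  (coding-system-priority-list)

Emacs keeps these priorities per category of coding system, so the
result may not match a plain ordered list exactly.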

> Classifying documents for storage is a good idea and a good habit, if
> only I had had the foresight. It's a bit unrealistic when facing a
> large and constantly growing quantity of unsorted documents in
> different encodings already on the disk (as I always complain, why
> can't those guys just use UTF-8?). Is it possible, for example, to
> write a script to distinguish and sort those documents?

I would try to find a program that could print a file's
encoding. `file' does not do that, but maybe there's something else
out there.
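
Failing that, here is a rough sketch of asking Emacs itself for its
guess.  It is not a statistical detector: it relies on Emacs's own
detection, so it is subject to the same priority order discussed
above.  The function name and the file path are hypothetical:

  ;; Hypothetical helper, not part of Emacs.
  (defun my-guess-file-coding (file)
    "Return Emacs's best guess at the coding system of FILE."
    (with-temp-buffer
      ;; Work on raw bytes so that nothing is decoded on insertion.
      (set-buffer-multibyte nil)
      (insert-file-contents-literally file)
      ;; A non-nil third argument returns only the top candidate.
      (detect-coding-region (point-min) (point-max) t)))

  (my-guess-file-coding "~/docs/some-file.txt")  ; hypothetical path

The returned symbol may carry an end-of-line suffix such as -unix or
-dos.  A script could map that symbol to a target directory for each
file, but how reliable the guesses are for your documents would need
checking first.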