From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Questions on charset encoding detection and keyboard layout
Date: Fri, 11 Dec 2009 22:54:04 +0200
Message-ID: <833a3h8c5v.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

> Date: Sat, 12 Dec 2009 03:51:50 +0800
> From: "Hou, Ruoyu"
>
> Before switching to Emacs I've been using EmEditor, a proprietary editor
> under Windows. It could auto-detect those files with different encodings
> and prompt a coding list in statistical confidence order for me to
> determine the most likely file encoding. So I guess it may implement
> a certain statistical algorithm to detect the proper encoding.

This feature is still waiting for a volunteer to add it to Emacs.  It
shouldn't be too hard, I think.

> A friend of mine, a Vim user, showed me handling those different
> encodings by ":set fencs=(a list of possible encodings, the point is to
> put euc-jp before gbk)".

The customization I suggested, i.e.

  (prefer-coding-system 'euc-jp)

was supposed to give euc-jp higher priority than GBK (and everything
else).  However, I understand it did you more harm than good.

For finer-grained control, try calling set-coding-system-priority for
every encoding you need to deal with, in such an order that the list
returned by coding-system-priority-list shows the encodings in the
order you want.  (Both functions are documented in the ELisp manual.)
I'm not sure this will have the same effect as ":set fencs" in Vim,
though.

> The classification for document
> storage is a good idea and habit, if only I had had the foresight. It's a
> bit unrealistic when facing a large quantity of unsorted documents in
> different encodings already on the disk and constantly increasing (as I
> always complain, why can't those guys just use UTF-8?). Is it possible,
> for example, to write a script to distinguish and sort those documents?

I would try to find a program that can print a file's encoding.
`file' does not do that, but maybe there's something else out there.
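
To make the priority suggestion above concrete, here is a minimal sketch
for an init file.  The particular order (utf-8, then euc-jp, then gbk) is
only an assumption based on the Vim setting quoted earlier, and
my-guess-file-coding is a hypothetical helper, not something that ships
with Emacs.  I have not tried it on your files, so treat it as a starting
point rather than a recipe.

```elisp
;; Assumed order: adjust to the encodings you actually meet.
;; A single call ranks its arguments from highest to lowest priority.
(set-coding-system-priority 'utf-8 'euc-jp 'gbk)

;; Evaluate this to inspect the result; the three coding systems
;; named above should now come first in the returned list.
(coding-system-priority-list)

;; Hypothetical helper (not part of Emacs): report the coding system
;; Emacs considers most likely for FILE.  The file is read as raw
;; bytes, so no decoding happens before detection.
(defun my-guess-file-coding (file)
  "Return the coding system Emacs detects as most likely for FILE."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (detect-coding-region (point-min) (point-max) t)))
```

Calling (my-guess-file-coding "some-file.txt") returns a coding-system
symbol, and a loop over a directory under "emacs --batch" could use that
to sort files.  The guess depends on the priority order set above, so
short or ambiguous files may still be misidentified.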