From: "Hou, Ruoyu"
Newsgroups: gmane.emacs.help
Subject: Re: Questions on charset encoding detection and keyboard layout
Date: Sat, 12 Dec 2009 03:51:50 +0800
Message-ID: <4B22A2D6.9080302@gmail.com>
References: <4B2108C2.8000506@gmail.com> <83k4wu8xdy.fsf@gnu.org> <4B21DBCF.1020607@gmail.com> <83fx7h9a0z.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

Dear Zaretskii,

Before switching to Emacs I've been using EmEditor, a proprietary editor under Windows.
It could detect files in different encodings and present a list of
candidate encodings, ordered by statistical confidence, so that I could
pick the most likely one. So I guess it implements some statistical
algorithm to detect the proper encoding.

I also tried MadEdit, an open source cross-platform editor. So far it
has decoded every file I gave it automatically, without my having to
choose an encoding. I am not skilled enough to read its source code, so
I can't tell how it is done. I also don't know how MULE handles
encoding detection.

A friend of mine, a Vim user, showed me how he handles these encodings
with ":set fencs=(a list of possible encodings; the point is to put
euc-jp before gbk)". It seems to be done by calling libiconv and
libintl (or gettext, I'm not sure).

I just thought that my Emacs should perform better than, or at least as
well as, these programs.

Thanks for your help. I am already using the commands you mentioned to
set encodings for viewing and saving. Sorting documents into
directories by encoding is a good idea and a good habit, if only I had
had the foresight. It's a bit unrealistic when facing a large and
constantly growing number of unsorted documents in different encodings
already on the disk (as I always complain, why can't those guys just
use UTF-8?). Is it possible, for example, to write a script to
distinguish and sort those documents?

Regards,

Eli Zaretskii wrote:
>> Date: Fri, 11 Dec 2009 13:42:39 +0800
>> From: "Hou, Ruoyu"
>>
>> I tried the tip you gave me, but now I've got my GBK-encoded files
>> unreadable. How you would solve the problem?
>>
>> Moreover, as I mentioned in the previous post, how could I set a
>> prefer-coding-system without beforehand knowledge about the encoding I
>> am supposed to encounter?
>
> If you have many documents in different encodings that Emacs cannot
> distinguish by itself, then I'm afraid there's no good solution except
> "C-x RET c", which requires that you know the encoding in advance.
> At least I'm not aware of any better way. What do other applications do?
>
> Of course, if you inadvertently visit a file without knowing the
> encoding, and want to re-visit it with the correct encoding, after you
> notice that Emacs didn't properly decode it, then typing "C-x RET c
> CORRECT-ENCODING RET M-x revert-buffer RET" will fix the problem.
> Here CORRECT-ENCODING is the correct encoding of the file.
>
> Also, if you could somehow manage to have documents in different
> encodings to reside in different directories, then perhaps you could
> set up the directory-local variables to cause Emacs decode the files
> in each directory correctly. See the node "Directory Variables" in
> the Emacs user manual for details about this feature.

-- 
Hou, Ruoyu
Laboratory of Reproductive & Stem Cell Biology,
College of Life Science & Biotech.,
Shanghai Jiao Tong University,
Shanghai 200240, P.R.China.
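
P.S. On the script question above: a rough sketch in Python of what such a sorter might look like, assuming that strict trial decoding in a fixed order is good enough to tell the files apart. This is not the statistical ranking EmEditor seems to do, and it is not how Emacs or Vim work internally. The candidate list, the function names detect_encoding and sort_by_encoding, and the one-directory-per-codec layout are all hypothetical choices made for illustration. The order follows the Vim tip: strict UTF-8 first, then euc_jp before gbk.

```python
from pathlib import Path
import shutil

# Candidate codecs, tried in order (an assumption, adjust to your files).
# euc_jp goes before gbk because gbk accepts almost every byte sequence
# that euc_jp accepts, so gbk listed first would shadow euc_jp.
CANDIDATES = ("utf-8", "euc_jp", "gbk")


def detect_encoding(data, candidates=CANDIDATES):
    """Return the first codec that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode: any invalid byte raises
        except UnicodeDecodeError:
            continue
        return name
    return None


def sort_by_encoding(src_dir, dest_dir):
    """Move each regular file in src_dir into dest_dir/<codec>/."""
    # sorted() materializes the listing before any file is moved.
    for path in sorted(Path(src_dir).iterdir()):
        if not path.is_file():
            continue
        codec = detect_encoding(path.read_bytes()) or "unknown"
        target = Path(dest_dir) / codec
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
```

Calling sort_by_encoding("inbox", "sorted") would leave directories such as sorted/utf-8, sorted/euc_jp, sorted/gbk and sorted/unknown, which could then each get their own directory-local coding setting as suggested in the quoted reply. The guess is only a first match: pure ASCII files land in utf-8, and some GBK text also happens to be valid EUC-JP, so the result should be spot-checked before trusting it.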