From: Sergio
Newsgroups: gmane.emacs.help
Subject: Re: Coding system prefer
Date: Thu, 5 Mar 2009 00:06:14 -0800 (PST)
Message-ID: <04ac2466-1f6f-4911-99db-37ed5754341e@s38g2000prg.googlegroups.com>
To: help-gnu-emacs@gnu.org

On Mar 5, 10:19 am, Miles Bader wrote:
> Sergio writes:
>> The FAR file manager, http://en.wikipedia.org/wiki/FAR_Manager does it
>> quite reliably using statistics about the character frequency
>> distribution.
>
> Does that work for anything except text files containing prose?

Yes, it does. Of course it does not work for a binary file, but it
works fine for a text file in a formal language, such as a C program
with Russian strings or a text with HTML markup.

I never explored the internals, but I guess that normally one can
ignore the ASCII part; only codes greater than 127 really matter. Of
these, one can easily detect UTF-8 or another Unicode encoding (at
least for the alphabetic planes; I never needed the CJK part). Then
there are the 8-bit codes, in which the upper half (the bytes above
127) is characteristic of each encoding. And the noise (markup or
formal-language statements) is usually in ASCII.

I never needed EBCDIC or any other encoding which is not a superset of
ASCII.

-- 
Sergei
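
The guess above can be illustrated with a rough Python sketch. It is not FAR's actual algorithm, whose internals are unknown here. It only shows the idea: drop the ASCII bytes, accept UTF-8 if the data decodes cleanly, and otherwise pick the 8-bit encoding whose decoding of the high bytes contains the most frequent Russian letters. The function name `guess_encoding`, the candidate list, and the set of "common" letters are all assumptions made for illustration.

```python
# Hypothetical sketch of frequency-based encoding detection for Russian text.
# Not FAR's real algorithm; the candidates and letter set are assumptions.

CANDIDATES = ("cp1251", "koi8_r", "cp866")  # assumed 8-bit Cyrillic encodings

# Roughly the most frequent lowercase letters in Russian prose (assumption).
COMMON = set("оеаинтсрвл")


def guess_encoding(data: bytes) -> str:
    # Only bytes above 127 carry information; ASCII is treated as noise.
    high = bytes(b for b in data if b > 127)
    if not high:
        return "ascii"

    # Text in a legacy 8-bit encoding is very unlikely to be valid UTF-8,
    # so a clean UTF-8 decode is taken as decisive.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    def score(encoding: str) -> float:
        # Share of high bytes that decode to a common lowercase letter.
        text = high.decode(encoding, errors="replace")
        return sum(ch in COMMON for ch in text) / len(text)

    return max(CANDIDATES, key=score)


if __name__ == "__main__":
    # A C statement with a Russian string: the ASCII markup is ignored.
    source = 'printf("Привет, мир! Это строка на русском языке.");'
    print(guess_encoding(source.encode("koi8_r")))
```

Counting only lowercase letters is what separates the candidates in this sketch: a wrong 8-bit decoding tends to turn ordinary lowercase text into uppercase letters, box-drawing characters, or rare symbols, so its score collapses. Very short inputs or text made mostly of capitals would need a better model.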