From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: Displaying bytes (was: Inadequate documentation of silly
Date: Mon, 30 Nov 2009 00:01:29 +0200
Organization: JURTA
Message-ID: <87d431f2uy.fsf@mail.jurta.org>
References: <20091119155848.GB1314@muc.de>
	<87aayiihe9.fsf@lola.goethe.zz>
	<20091119180848.GE1314@muc.de>
	<47325.130.55.118.19.1258658705.squirrel@webmail.lanl.gov>
	<20091119212550.GG1314@muc.de>
	<87my2ign8u.fsf@lola.goethe.zz>
	<912155b0911231334s2b52e8eq864251c9aed386b3@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: Stefan Monnier
Cc: per.starback@gmail.com, dak@gnu.org, rms@gnu.org, Kenichi Handa, emacs-devel@gnu.org
In-Reply-To: (Stefan Monnier's message of "Sun, 29 Nov 2009 11:31:55 -0500")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (x86_64-pc-linux-gnu)
List-Id: "Emacs development discussions."

>> Here's another idea.  We could employ some heuristics to see if the
>> distribution of those characters seems typical for the way those
>> characters are used.  For instance, some of the punctuation characters
>
> Using such heuristics might be a good idea in general to automatically
> detect which encoding is used, or which language is used.

Unicad (http://www.emacswiki.org/emacs/Unicad) uses statistical models
to auto-detect windows-1252 and many other coding systems.  (Detection
of windows-1252 is not advertised on the main page, but it can be seen
in the source code.)  The theory is described at
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

I hope this will be added to Emacs sometime.

-- 
Juri Linkov
http://www.jurta.org/emacs/