From: Dominic Cronin
Newsgroups: gmane.emacs.help
Subject: Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
Date: Tue, 24 Sep 2002 20:57:01 +0200

On 23 Sep 2002 18:39:19 +0200, Gerald Wildgruber wrote:

>Hello,
>
>I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
>the right encoding when visiting files with utf-8 encoding. The emacs info
>help entry says on the topic:
>
>"Some coding systems can be recognized or distinguished by which byte
>sequences appear in the data. However, there are coding systems that cannot
>be distinguished, not even potentially."
>
>Does this also apply to utf-8 encoded files? Is it impossible for emacs to
>auto-recognize them (as for example the `file' command on the shell does)?

The RFC for UTF-8 (see http://www.ietf.org/rfc/rfc2279.txt) states:

    UTF-8 strings can be fairly reliably recognized as such by a simple
    algorithm, i.e. the probability that a string of characters in any
    other encoding appears as valid UTF-8 is low, diminishing with
    increasing string length.

BTW - the RFC is quite an interesting read: an elegant solution to a
problem.

-- 
Dominic Cronin
Amsterdam
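
The recognition idea in the RFC passage quoted above can be illustrated with a small sketch. This is not how Emacs detects coding systems; it only assumes that "recognizing" UTF-8 means checking whether the bytes decode as strict UTF-8. The function name `looks_like_utf8` and the sample strings are made up for the illustration.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` is a valid UTF-8 byte sequence.

    Hypothetical helper for illustration: it relies on Python's
    strict UTF-8 decoder rather than a hand-written state machine.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same text in two encodings: only the UTF-8 bytes pass the check.
utf8_sample = "café".encode("utf-8")      # ends with the two bytes C3 A9
latin1_sample = "café".encode("latin-1")  # ends with the single byte E9

print(looks_like_utf8(utf8_sample))
print(looks_like_utf8(latin1_sample))
```

Note that pure ASCII input also passes, since ASCII is a subset of UTF-8; the check only becomes informative once non-ASCII bytes are present, which matches the RFC's remark that confidence grows with string length.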