From: Werner LEMBERG
Newsgroups: gmane.emacs.devel
Subject: coding tags and utf-16
Date: Wed, 21 Dec 2005 09:00:33 +0100 (CET)
Message-ID: <20051221.090033.182620434.wl@gnu.org>
Original-To: emacs-devel@gnu.org

There is a serious problem with coding tags and utf-16 encodings of
any flavour: Emacs simply can't recognize the tag.  This is a
non-trivial problem.

Right now I'm working on a groff preprocessor which tries to handle
this.  I'm doing the following to find the tag in an
encoding-independent way:

  . Check whether the file starts with the BOM (Byte Order Mark) --
    this is one of the following byte sequences:

      UTF-8:  0xEFBBBF
      UTF-16: 0xFEFF or 0xFFFE

    Skip it.

  . Ignore zero bytes while looking for the

      -*- coding: ... -*-

    stuff.

This heuristic algorithm might not give correct results in all cases
but it should be sufficiently reliable for normal use.


    Werner
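
The two steps described above can be sketched in Python as follows.  This is only an illustration of the heuristic, not the code of the groff preprocessor: the function name `find_coding_tag`, the `limit` parameter (how many leading bytes to inspect), and the exact regular expression are assumptions made for this example.

```python
import re
from typing import Optional

# BOM byte sequences listed in the message: UTF-8, UTF-16 BE, UTF-16 LE.
BOMS = (b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")

# Matches "-*- ... coding: NAME ... -*-" on a single line.
# The pattern is an assumption of this sketch, not Emacs's own parser.
TAG_RE = re.compile(
    rb"-\*-[^\r\n]*?coding:[ \t]*([A-Za-z0-9._-]+)[^\r\n]*?-\*-"
)


def find_coding_tag(data: bytes, limit: int = 1024) -> Optional[str]:
    """Return the coding name from a '-*- coding: ... -*-' tag, or None."""
    head = data[:limit]  # hypothetical cut-off; only the start is inspected

    # Step 1: skip a leading BOM, if there is one.
    for bom in BOMS:
        if head.startswith(bom):
            head = head[len(bom):]
            break

    # Step 2: ignore zero bytes, so ASCII text stored as UTF-16
    # collapses to plain ASCII regardless of byte order.
    head = head.replace(b"\x00", b"")

    match = TAG_RE.search(head)
    return match.group(1).decode("ascii") if match else None
```

For example, `find_coding_tag(b"\xff\xfe" + "-*- coding: utf-16 -*-\n".encode("utf-16-le"))` finds the tag even though every ASCII character is followed by a zero byte.  As the message says, this is a heuristic: it works for tags made of ASCII characters and may give wrong results in unusual cases.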