From: Werner LEMBERG
Newsgroups: gmane.emacs.devel
Subject: Re: coding tags and utf-16
Date: Sat, 24 Dec 2005 00:43:29 +0100 (CET)
Message-ID: <20051224.004329.71165760.wl@gnu.org>
References: <20051221.090033.182620434.wl@gnu.org>

> There is a serious problem with coding tags and utf-16 encodings of
> any flavour: Emacs simply can't recognize the tag. [...]

Surprisingly, I saw no response on the list, which means either that
my mail hasn't come through, that nobody is interested in this
problem, or that it is a non-issue.

In case it won't get fixed, I suggest adding it to the TODO list,
together with a note in the Emacs manual that coding tags don't work
with UTF-16 encoding flavours.


    Werner


> This is a non-trivial problem.  Right now I'm working on a groff
> preprocessor which tries to handle this.  I'm doing the following to
> find the tag in an encoding-independent way:
>
>   . Check whether the file starts with the BOM (Byte Order Mark) --
>     this is one of the following byte sequences:
>
>       UTF-8:  0xEFBBBF
>       UTF-16: 0xFEFF or 0xFFFE
>
>     Skip it.
>
>   . Ignore zero bytes while looking for the -*- coding: ... -*-
>     stuff.
>
> This heuristic algorithm might not give correct results in all cases
> but it should be sufficiently reliable for normal use.
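
For illustration, here is a minimal sketch of the two-step heuristic quoted above, written in Python.  It is not the groff preprocessor's code, which is not shown in this thread.  The names find_coding_tag, BOMS and CODING_TAG, the 1024-byte scan limit, and the exact regular expression are all assumptions made for the example.

```python
import re

# Byte Order Marks listed in the message: UTF-8, UTF-16 BE, UTF-16 LE.
BOMS = (b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")

# Hypothetical pattern for "-*- ... coding: NAME ... -*-" on one line.
CODING_TAG = re.compile(
    rb"-\*-[^\n]*?coding:[ \t]*([A-Za-z0-9._-]+)[^\n]*?-\*-"
)


def find_coding_tag(data: bytes, max_bytes: int = 1024):
    """Return the coding name from a -*- coding: ... -*- tag, or None.

    Only the first max_bytes bytes are scanned (an assumed limit).
    """
    head = data[:max_bytes]

    # Step 1: skip a leading BOM, if there is one.
    for bom in BOMS:
        if head.startswith(bom):
            head = head[len(bom):]
            break

    # Step 2: ignore zero bytes, so that ASCII text stored as UTF-16
    # collapses to plain ASCII.  Non-ASCII UTF-16 characters get
    # mangled here, which is why this is only a heuristic.
    head = head.replace(b"\x00", b"")

    match = CODING_TAG.search(head)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    sample = "-*- coding: utf-16-le -*-\nSome text\n"
    print(find_coding_tag(b"\xff\xfe" + sample.encode("utf-16-le")))
```

Dropping the zero bytes is what makes the search independent of byte order: for ASCII-only tag lines, both UTF-16 flavours reduce to the same byte string as the UTF-8 or plain ASCII case.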