From: Kenichi Handa
Original-To: Werner LEMBERG
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: coding tags and utf-16
Date: Wed, 04 Jan 2006 15:42:23 +0900
References: <20051221.090033.182620434.wl@gnu.org>
In-reply-to: <20051221.090033.182620434.wl@gnu.org> (message from Werner LEMBERG on Wed, 21 Dec 2005 09:00:33 +0100 (CET))
Xref: news.gmane.org gmane.emacs.devel:48686

In article <20051221.090033.182620434.wl@gnu.org>, Werner LEMBERG writes:

> There is a serious problem with coding tags and utf-16 encodings of
> any flavour: Emacs simply can't recognize the tag.  This is a
> non-trivial problem.

Sorry for the late reply, but I think a coding tag is useless for a
file encoded in any of the utf-16 variants.

If a file has a BOM at the head, the BOM should determine the exact
encoding, whatever is specified in the coding tag.

If a file is encoded without a BOM, we must use less reliable
heuristics to guess utf-16be or utf-16le.  If you find a coding-tag
spec by ignoring all zero bytes at even byte indexes, it means that
the file is, with high probability, utf-16be whatever the tag value
is.  If you find a coding-tag spec by ignoring all zero bytes at odd
byte indexes, it means that the file is utf-16le whatever the tag
value is.  So, in any case, the tag value itself is useless.

Then how can we detect utf-16 more reliably?  In the current Emacs
(i.e. Ver.22), I think we can use auto-coding-regexp-alist or
auto-coding-alist.  In the former case, we can register BOM patterns
and also something like "\\`\\(\0[\0-\177]\\)+" for utf-16be.  In the
latter case, you can use more complicated heuristics in a registered
function.
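
To illustrate the zero-byte heuristic described above, here is a
minimal sketch in Python.  It is not Emacs code: the function name
guess_utf16 and the 0.9 threshold are assumptions made only for this
example.  It lets a BOM decide first, and otherwise checks whether the
bytes at even or at odd indexes are nearly all zero, which is what
mostly-ASCII text looks like in utf-16be or utf-16le respectively.

```python
import codecs


def guess_utf16(data: bytes, threshold: float = 0.9):
    """Guess "utf-16be" or "utf-16le" for data, or return None.

    A BOM, when present, decides the answer.  Otherwise count zero
    bytes at even and odd indexes.  This only works for text that is
    mostly ASCII; it is a heuristic, not a proof.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"

    pairs = len(data) // 2  # number of complete 2-byte units
    if pairs == 0:
        return None

    even_zero = sum(1 for i in range(0, pairs * 2, 2) if data[i] == 0)
    odd_zero = sum(1 for i in range(1, pairs * 2, 2) if data[i] == 0)

    # High bytes come first in big-endian, so ASCII text has its zero
    # bytes at even indexes; little-endian has them at odd indexes.
    if even_zero / pairs >= threshold and odd_zero / pairs < threshold:
        return "utf-16be"
    if odd_zero / pairs >= threshold and even_zero / pairs < threshold:
        return "utf-16le"
    return None


# The tag claims latin-1, but the bytes themselves say utf-16be.
sample = "-*- coding: latin-1 -*-\n".encode("utf-16-be")
print(guess_utf16(sample))  # prints utf-16be
```

Note that the result does not depend on the value written in the tag,
which is the point made above: once the tag can be read at all, the
byte layout has already revealed the encoding.
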
But those are just heuristics anyway, not 100% reliable.  So I think
we need a user option to turn them on and off, or perhaps a user
option to select which kind of heuristics to use.

---
Kenichi Handa
handa@m17n.org