From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: coding tags and utf-16
Date: Mon, 06 Mar 2006 22:04:32 +0900
References: <20051221.090033.182620434.wl@gnu.org> <85vewxodk2.fsf@lola.goethe.zz>
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Cc: ihs_4664@yahoo.com, emacs-devel@gnu.org
Original-To: Benjamin Riefenstahl
In-reply-to: (message from Benjamin Riefenstahl on Sat, 04 Mar 2006 21:34:37 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:51294

In article , Benjamin Riefenstahl writes:

> Kenichi Handa writes:
>>> ("\\`\xEF\xBB\xBF" . utf-8)
>>
>> As far as I know, UTF-8 should not start with this sequence unless
>> the text really starts with ZWNBSP (very unlikely).

> UTF-8 can start with a BOM.  See
> .

That's why I wrote the "unless ..." part.  When decoding UTF-8, we
should not delete that BOM but treat it as part of the content of the
text.  For UTF-16, Unicode explicitly says that "The BOM is not
considered part of the content of the text", but for UTF-8 it says no
such thing.

Anyway, since Unicode neither recommends nor prohibits a BOM in UTF-8,
if people agree, I'll add it too.

>>> ("\\`\xFE\xFF" . utf-16-be)
>>> ("\\`\xFF\xFE" . utf-16-le)
>>
>> Although it's not clear how safe they are, if no one objects,
>> I'll add them in auto-coding-regexp-alist.

> Shouldn't those be utf-16-[bl]e-with-signature?  Or has the naming
> convention changed?

Actually, utf-16-be is an alias of utf-16be-with-signature (more
precisely, an alias of mule-utf-16be-with-signature) and is different
from utf-16be (and we don't have utf-16-be-with-signature).

I am responsible for this confusing naming.  Long ago I mistakenly
accepted and committed those names (utf-16-[bl]e), and we are now
keeping them for backward compatibility.

Anyway, I agree that using utf-16[bl]e-with-signature here is better.

---
Kenichi Handa
handa@m17n.org
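
The BOM checks discussed in this message can be sketched roughly as
follows.  This is a minimal illustration in Python, not Emacs code:
the table BOM_PATTERNS and the functions sniff_bom and decode are
hypothetical names, the labels simply reuse the coding-system names
from the thread, and the UTF-8 fallback for data without a BOM is an
assumption.  It tries each start-anchored pattern in order, keeps a
UTF-8 BOM as content, and drops a UTF-16 BOM.

```python
import re

# Hypothetical table modelled on the alist entries quoted in the thread:
# (pattern anchored at the very start of the data, coding-system label).
# \A plays the role of \\` in the Emacs regexps.
BOM_PATTERNS = [
    (re.compile(rb"\A\xEF\xBB\xBF"), "utf-8"),
    (re.compile(rb"\A\xFE\xFF"), "utf-16be-with-signature"),
    (re.compile(rb"\A\xFF\xFE"), "utf-16le-with-signature"),
]


def sniff_bom(data: bytes):
    """Return the label of the first pattern matching the start of data, else None."""
    for pattern, label in BOM_PATTERNS:
        if pattern.match(data):
            return label
    return None


def decode(data: bytes) -> str:
    """Decode data according to its BOM, if any."""
    label = sniff_bom(data)
    if label == "utf-8":
        # Python's "utf-8" codec keeps the BOM as U+FEFF in the result,
        # i.e. the BOM is treated as part of the text.
        return data.decode("utf-8")
    if label is not None:
        # Python's "utf-16" codec uses the BOM to pick the byte order
        # and does not include it in the decoded text.
        return data.decode("utf-16")
    # Assumption for this sketch only: data without a BOM is UTF-8.
    return data.decode("utf-8")
```

One reason such patterns may not be entirely safe: a UTF-32
little-endian BOM (FF FE 00 00) also begins with FF FE, so a table
like this one would report it as UTF-16.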