From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: coding tags and utf-16
Date: Tue, 07 Mar 2006 10:02:05 +0900
References: <20051221.090033.182620434.wl@gnu.org> <85vewxodk2.fsf@lola.goethe.zz>
Original-To: Benjamin Riefenstahl
Cc: ihs_4664@yahoo.com, emacs-devel@gnu.org
In-reply-to: (message from Benjamin Riefenstahl on Mon, 06 Mar 2006 20:35:15 +0100)

In article , Benjamin Riefenstahl writes:

> Kenichi Handa writes:
>> For decoding UTF-8, we should not delete that BOM but treat it as
>> the content of the text.  For UTF-16, Unicode explicitly says that
>> "The BOM is not considered part of the content of the text", but for
>> UTF-8, it doesn't say such a thing.

> NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
> UTF-8 files.  When I saw that and tried to discuss it on their
> newsgroups, I learned that it seems to be Microsoft's POV that this is
> a good thing.

> Which means files like that exist.  Treating the BOM as content means
> that U+FEFF creeps into the regular content of documents through
> cut-and-paste and through components of template systems.  I have
> already seen that happening in real life and of course it leads to
> stupid bugs.  I think Emacs should do better.

But, it's simply a bug to delete the leading U+FEFF from the content
while decoding utf-8.
Perhaps we should add some customizable flag to control that
behavior after the release.

>> utf-16-be [==] utf-16be-with-signature [!=] utf-16be

> ;-)

^.^;;;

---
Kenichi Handa
handa@m17n.org
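
To illustrate the two behaviours argued over in this message (keeping a leading U+FEFF as content versus dropping it as a signature), here is a minimal Python sketch. It uses Python's standard codecs only as an analogy and says nothing about how Emacs's coding systems are implemented. The function `decode_utf8` and its `strip_bom` parameter are hypothetical, standing in for the kind of customizable flag suggested in the message.

```python
import codecs


def decode_utf8(data: bytes, strip_bom: bool = False) -> str:
    """Decode UTF-8 bytes, optionally dropping one leading U+FEFF.

    Hypothetical helper: strip_bom plays the role of a user-settable flag.
    """
    text = data.decode("utf-8")  # the plain utf-8 codec keeps U+FEFF as content
    if strip_bom and text.startswith("\ufeff"):
        text = text[1:]
    return text


# A UTF-8 byte stream that starts with the signature bytes EF BB BF.
utf8_with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")

kept = decode_utf8(utf8_with_bom)                     # U+FEFF stays in the text
dropped = decode_utf8(utf8_with_bom, strip_bom=True)  # U+FEFF is removed
via_sig_codec = utf8_with_bom.decode("utf-8-sig")     # Python's signature-aware codec

# UTF-16: the generic codec consumes the BOM to choose the byte order,
# while the explicit big-endian codec treats it as an ordinary character.
utf16_with_bom = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
utf16_generic = utf16_with_bom.decode("utf-16")
utf16_explicit = utf16_with_bom.decode("utf-16-be")

print(repr(kept), repr(dropped), repr(via_sig_codec))
print(repr(utf16_generic), repr(utf16_explicit))
```

Keeping the default at `strip_bom=False` mirrors the position that deleting the character during plain utf-8 decoding is a bug, while the flag lets files written by tools such as NOTEPAD.EXE be read without U+FEFF leaking into the content.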