From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Riefenstahl
Newsgroups: gmane.emacs.devel
Subject: Re: coding tags and utf-16
Date: Mon, 06 Mar 2006 20:35:15 +0100
References: <20051221.090033.182620434.wl@gnu.org>
	<85vewxodk2.fsf@lola.goethe.zz>
Original-To: Kenichi Handa
Cc: ihs_4664@yahoo.com, emacs-devel@gnu.org
In-Reply-To: (Kenichi Handa's message of "Mon, 06 Mar 2006 22:04:32 +0900")
User-Agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:51285

Hi,

Kenichi Handa writes:
> For decoding UTF-8, we should not delete that BOM but treat it as
> the content of the text.  For UTF-16, Unicode explicitly says that
> "The BOM is not considered part of the content of the text", but for
> UTF-8, it doesn't say such a thing.

NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
UTF-8 files.  When I saw that and tried to discuss it on their
newsgroups, I learned that it seems to be Microsoft's POV that this is
a good thing.  Which means files like that exist.

Treating the BOM as content means that U+FEFF creeps into the regular
content of documents through cut-and-paste and through components of
template systems.  I have already seen that happening in real life and
of course it leads to stupid bugs.  I think Emacs should do better.

> utf-16-be [==] utf-16be-with-signature [!=] utf-16be

;-)

benny
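
As an illustration of the template problem described above, here is a
minimal sketch in Python (not Emacs code; the fragment contents and the
join_fragments helper are made up for the example).  It assumes two
template fragments that were each saved with a leading UTF-8 BOM, the
way Notepad writes them, and compares a decoder that keeps the BOM as
content with one that drops it.

```python
import codecs

# Hypothetical template fragments, each saved by an editor that
# prepends a UTF-8 BOM (the bytes EF BB BF) to the file.
header_bytes = codecs.BOM_UTF8 + "Hello, ".encode("utf-8")
body_bytes = codecs.BOM_UTF8 + "world".encode("utf-8")


def join_fragments(fragments, encoding):
    """Decode each fragment with the given codec and concatenate the results."""
    return "".join(fragment.decode(encoding) for fragment in fragments)


# Plain "utf-8" keeps the BOM as the character U+FEFF, so every
# fragment contributes one to the assembled document.
as_content = join_fragments([header_bytes, body_bytes], "utf-8")

# "utf-8-sig" drops one leading BOM per fragment while decoding.
stripped = join_fragments([header_bytes, body_bytes], "utf-8-sig")

print(as_content.count("\ufeff"))  # number of U+FEFF characters left in the text
print(stripped.count("\ufeff"))
```

With the BOM treated as content, the second U+FEFF ends up in the
middle of the assembled text, where it is invisible and breaks string
comparison and searching.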