From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Miles Bader Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs Subject: Re: 23.0.60; [nxml] BOM and utf-8 Date: Sun, 18 May 2008 11:30:05 +0900 Message-ID: <878wy8ny36.fsf@catnip.gol.com> References: <87od75kt78.fsf@pdrechsler.de> <87mymofip6.fsf@uwakimon.sk.tsukuba.ac.jp> Reply-To: Miles Bader NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1211077837 2465 80.91.229.12 (18 May 2008 02:30:37 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sun, 18 May 2008 02:30:37 +0000 (UTC) Cc: emacs-pretest-bug@gnu.org, Patrick Drechsler To: "Stephen J. Turnbull" Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun May 18 04:31:13 2008 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1JxYg4-0001UV-9z for ged-emacs-devel@m.gmane.org; Sun, 18 May 2008 04:31:12 +0200 Original-Received: from localhost ([127.0.0.1]:42864 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1JxYfK-0000Qs-HB for ged-emacs-devel@m.gmane.org; Sat, 17 May 2008 22:30:26 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1JxYfG-0000QV-Hp for emacs-devel@gnu.org; Sat, 17 May 2008 22:30:22 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1JxYfF-0000Q7-2j for emacs-devel@gnu.org; Sat, 17 May 2008 22:30:22 -0400 Original-Received: from [199.232.76.173] (port=33485 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1JxYfE-0000Py-Q2 for emacs-devel@gnu.org; Sat, 17 May 2008 22:30:20 -0400 Original-Received: from fencepost.gnu.org ([140.186.70.10]:36487) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1JxYfE-0005Cq-IK for emacs-devel@gnu.org; Sat, 17 May 2008 22:30:20 -0400 Original-Received: from 
mail.gnu.org ([199.232.76.166]:60671 helo=mx10.gnu.org) by fencepost.gnu.org with esmtp (Exim 4.67) (envelope-from ) id 1JxYe6-0006Iw-7E for emacs-pretest-bug@gnu.org; Sat, 17 May 2008 22:29:10 -0400 Original-Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1JxYfA-0005CA-30 for emacs-pretest-bug@gnu.org; Sat, 17 May 2008 22:30:20 -0400 Original-Received: from smtp12.dentaku.gol.com ([203.216.5.74]:43952) by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1JxYf9-0005Bw-My; Sat, 17 May 2008 22:30:15 -0400 Original-Received: from 203-216-99-243.dsl.gol.ne.jp ([203.216.99.243] helo=catnip.gol.com) by smtp12.dentaku.gol.com with esmtpa (Dentaku) id 1JxYf2-0004Tv-QL; Sun, 18 May 2008 11:30:08 +0900 Original-Received: by catnip.gol.com (Postfix, from userid 1000) id E10AB2F46; Sun, 18 May 2008 11:30:05 +0900 (JST) System-Type: i686-pc-linux-gnu In-Reply-To: <87mymofip6.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Sun, 18 May 2008 11:29:41 +0900") Original-Lines: 26 X-Virus-Scanned: ClamAV GOL (outbound) X-Abuse-Complaints: abuse@gol.com X-detected-kernel: by monty-python.gnu.org: Linux 2.6 (newer, 3) X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:97341 gmane.emacs.pretest.bugs:22365 Archived-At:

"Stephen J. Turnbull" writes:
> > is the attached xml file (simple.xml) really invalid (as indicated by
> > nxhtml) or is this a bug in nxhtml?
>
> Neither. Emacs is (arguably) reading it incorrectly.

By "arguably" I presume you're referring to the "Microsoft does
[... thing], therefore everybody who doesn't do [... thing] is
incorrect" tactic.

I think it would be a lot _more_ arguable that Microsoft apps which
randomly add a BOM to the beginning of files where it is invalid are
broken.

In general, other apps that read such files are not expecting the BOM,
and won't be able to deal with it. So Emacs wouldn't be doing the user
any favors by hiding the BOM from him.

BOM is not part of UTF-8. UTF-8 files that contain a "BOM" are simply
UTF-8 files with a random weird character at the beginning.

-Miles

-- 
In New York, most people don't have cars, so if you want to kill a person, you
have to take the subway to their house. And sometimes on the way, the train
is delayed and you get impatient, so you have to kill someone on the subway.
[George Carlin]