From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: 23.0.60; [nxml] BOM and utf-8
Date: Sun, 18 May 2008 11:29:41 +0900
Message-ID: <87mymofip6.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87od75kt78.fsf@pdrechsler.de>
In-Reply-To: <87od75kt78.fsf@pdrechsler.de>
To: Patrick Drechsler
Cc: emacs-pretest-bug@gnu.org

Patrick Drechsler writes:

> is the attached xml file (simple.xml) really invalid (as indicated by
> nxhtml) or is this a bug in nxhtml?

Neither. Emacs is (arguably) reading it incorrectly.

> describe-char on the first symbol gives (I replaced the BOM part with
> XXX):

The signature is *not* part of the text according to the Unicode standard, and if recognized as a signature should be removed by the I/O system (here, Emacs) before passing it to the XML processor.

> | file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)

There should be an Emacs coding system that removes the BOM.

The XML standard requires that the XML declaration, if present, be the first thing in the file. XML does not recognize the BOM as part of the prolog, optional or otherwise. The BOM signals the encoding of the document, but in XML the atomic constituents are characters; there is no encoding, and thus no place for a BOM. (The standard recognizes that encoding varies from context to context, and provides means for specifying it, but that's a different issue.)

See Mark Hershberger's reply for more detail on the syntax of an XML file.
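
To make the point about the I/O layer concrete, here is a minimal sketch in Python (not Emacs Lisp, and not how Emacs implements its coding systems). It shows a reader dropping the UTF-8 signature (#xEF #xBB #xBF) so that the XML declaration is the first thing the XML processor sees. The helper name strip_utf8_signature and the sample bytes standing in for simple.xml are assumptions made up for illustration.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Assumed stand-in for the contents of simple.xml: signature, then declaration.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>\n<doc/>\n'

# Option 1: strip the signature at the byte level before decoding.
stripped = strip_utf8_signature(raw)

# Option 2: let the decoder do it. Python's plain "utf-8" codec keeps the
# signature as U+FEFF; the "utf-8-sig" codec removes it while decoding.
text_plain = raw.decode("utf-8")
text_sig = raw.decode("utf-8-sig")
```

With the plain decoder the first character of the text is U+FEFF, so the declaration is no longer first. With either stripping approach the text begins with the XML declaration, which is what the XML processor expects.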