From: Patrick Drechsler
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Thu, 22 May 2008 00:37:11 +0200
Message-ID: <87mymjs2qw.fsf@pdrechsler.de>
References: <87od75kt78.fsf@pdrechsler.de> <87d4nk8y3q.fsf@everybody.org> <87r6bvs3jj.fsf@pdrechsler.de>
To: emacs-devel@gnu.org
Cc: emacs-pretest-bug@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

Patrick Drechsler writes:

> mah@everybody.org (Mark A. Hershberger) writes:
>
>> Patrick Drechsler writes:
>>
>>> is the attached xml file (simple.xml) really invalid (as indicated by
>>> nxhtml) or is this a bug in nxhtml?
>
> s/nxhtml/nxml/
>
>> The file simple.xml is really invalid.
>>
>> http://www.w3.org/TR/2006/REC-xml-20060816/#sec-prolog-dtd
>>
>> The XML spec gives the following syntax description for the prolog of an
>> XML file (I've only copied the relevant parts):
>>
>>   [3]  S       ::= (#x20 | #x9 | #xD | #xA)+
>>   [22] prolog  ::= XMLDecl? Misc* (doctypedecl Misc*)?
>>   [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
>>
>> Note that there is no S before the literal “<?xml” [...] optional.
>>
>> So, yes, a file that contains whitespace before "<?xml" [...]
>
> Finally having some spare time I read the specs from above and I have a
> followup question concerning your last sentence:
>
> The BOM in my example file is not whitespace, it is xEF xBB xBF (it is
> only displayed as whitespace by Emacs). According to the W3C site this
> is valid:
>
> ,----[ http://www.w3.org/TR/2006/REC-xml-20060816/#charencoding ]
> | Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY
> | begin with the Byte Order Mark described by Annex H of [ISO/IEC
> | 10646:2000], section 2.4 of [Unicode], and section 2.7 of [Unicode3]
> | (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding
> | signature, not part of either the markup or the character data of the
> | XML document. XML processors MUST be able to use this character to
> | differentiate between UTF-8 and UTF-16 encoded documents.
> `----
>
> Also http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info

sorry, wrong link:
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing

> and
>
> ,----[ http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-with-ext-info ]
> | If an XML entity is in a file, the Byte-Order Mark and encoding
> | declaration are used (if present) to determine the character encoding.
> `----
>
> sound like a BOM is a legal (although optional) part of an XML file
> encoded in UTF-8.
>
> But I am not an expert, so please correct my potentially incorrect
> interpretation.
>
> In case my interpretation is correct, this is a bug in Emacs' nxml mode.
>
> Cheers,
>
> Patrick

-- 
       ._q0p_.
       '=(_)='
        / V \
       (_/^\_)
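
P.S. To make the distinction between "BOM" and "whitespace" concrete, here
is a minimal Python sketch. It is only an illustration of my reading of the
spec passages quoted above, not how nxml (or any real parser) does it; the
function name classify_prolog_start and its return labels are made up for
this example.

```python
import codecs

# Production [3] S from the XML spec: #x20 | #x9 | #xD | #xA
XML_WHITESPACE = b" \t\r\n"


def classify_prolog_start(data: bytes) -> str:
    """Describe what comes before '<?xml' at the start of a UTF-8 entity.

    Hypothetical helper for illustration only.
    """
    # The UTF-8 encoding signature is the three bytes EF BB BF.
    has_bom = data.startswith(codecs.BOM_UTF8)
    rest = data[len(codecs.BOM_UTF8):] if has_bom else data

    if rest.startswith(b"<?xml"):
        return "bom+xmldecl" if has_bom else "xmldecl"

    # Real whitespace (production S) in front of the declaration:
    # the case described as invalid earlier in the thread.
    stripped = rest.lstrip(XML_WHITESPACE)
    if len(stripped) != len(rest) and stripped.startswith(b"<?xml"):
        return "whitespace-before-xmldecl"

    # The XML declaration is optional, so this is not an error by itself.
    return "no-xmldecl"


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><a/>'
    print(classify_prolog_start(sample))
```

If my interpretation is right, a file like simple.xml should fall into the
"bom+xmldecl" case and not into "whitespace-before-xmldecl".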