From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Sat, 24 May 2008 06:23:18 +0900
Message-ID: <87fxs8n29l.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87od75kt78.fsf@pdrechsler.de> <87d4nk8y3q.fsf@everybody.org>
 <87r6bvs3jj.fsf@pdrechsler.de> <87mymjs2qw.fsf@pdrechsler.de>
 <20080522041745.GA29437@tomas> <87mymixmx5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20080523090511.GA12796@tomas>
In-Reply-To: <20080523090511.GA12796@tomas>
To: tomas@tuxteam.de
Cc: emacs-pretest-bug@gnu.org, Patrick Drechsler, emacs-devel@gnu.org

tomas@tuxteam.de writes:

 > As for whether Emacs or nxml has the burden of skipping the BOM -- that
 > would correspond to whether nxml "within" Emacs is "seeing" a piece of
 > XML or a whole XML document, right?

No, I don't think so.  First, as I tried to explain, I don't think
that Emacs can reliably "know" that the BOM needs to be skipped at
decoding time.  Second, if the "piece" is what XML calls a "parsed
external entity" (analogous to an include file), it must be subjected
to BOM processing according to section 4.3.3 of the XML standard.

On the other hand, if the fragment is generated internally to Emacs,
then there should be no BOM, because the BOM is not part of the text
of an XML document:

    "This is an encoding signature, not part of either the markup or
    the character data of the XML document."

Meanwhile, modern (since Unicode 3.2) Unicode processes will not
produce U+FEFF with character semantics (as ZWNBSP).  So I think there
is almost never going to be harm in nxml stripping the BOM, whereas
Emacs has to be much more careful.
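
To make the "nxml strips it" side concrete, here is a minimal sketch
of what stripping the signature from already-decoded text amounts to.
It is written in Python rather than Emacs Lisp purely for
illustration; the name strip_leading_bom is hypothetical and is not
taken from nxml or Emacs.  It assumes the decoder left U+FEFF in the
text, which is what Python's plain "utf-8" codec does.

```python
BOM = "\ufeff"  # U+FEFF, the character used as the encoding signature


def strip_leading_bom(text: str) -> str:
    """Drop one leading U+FEFF from already-decoded text.

    Hypothetical helper: only the first character is examined, so a
    U+FEFF anywhere else in the text is left alone.
    """
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


# A parsed external entity whose bytes begin with the UTF-8 signature.
# Python's "utf-8" codec keeps the signature as a leading U+FEFF.
decoded = b"\xef\xbb\xbf<doc/>".decode("utf-8")
print(repr(strip_leading_bom(decoded)))
```

Text generated internally, with no signature, passes through
unchanged, which is why stripping at this level should rarely do harm.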