From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; Defaut encoding for XML files should be undefined (instead of utf-8)
Date: Wed, 20 Feb 2008 07:02:21 +0900
Message-ID: <87y79gwqoi.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87odaifv16.fsf@mundaneum.com>
	<87r6fd5q12.fsf@uwakimon.sk.tsukuba.ac.jp>
	<47B6B3CC.1080101@gnu.org>
	<87wsp5xhz4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<47B6D1ED.6020601@gmail.com>
	<87r6fcxms7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<47B8452F.6090504@gmail.com>
	<87odafxluc.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87bq6fnrpt.fsf@catnip.gol.com>
	<87hcg7xh2h.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87ablyye32.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8763wlxvn3.fsf@uwakimon.sk.tsukuba.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: Stefan Monnier
Cc: emacs-pretest-bug@gnu.org, Miles Bader, "Lennart Borgman \(gmail\)", Edward O'Connor, emacs-devel@gnu.org

Stefan Monnier writes:

 > My understanding of the OP's situation is that his files are not XML
 > files, but plaintext files that happen to contain XML fragments.

As I read the XML 1.0 standard, if those XML fragments are intended to
be parsed by the XML processor as part of the document, they are
(conceptually) "external entities".  How that affects XML processing
will depend on exactly what you mean by "text-concatenation".  ISTM
there are two possibilities.

First, use the XML facilities (i.e., an entity reference).  That looks
like this (there's also a "PUBLIC" entity version):

    Blah blah blah &open-hatch; foo bar baz.

An entity reference has the advantage that the processor can use XML
catalogs and the like to find the entity (similar to the way C's
#include allows cpp to use an include path).  The XML specification
requires external entities to declare their own encoding using a text
declaration, unless the encoding is UTF-8 or is UTF-16 and can be
detected from the Byte Order Mark.  IMO this is the obvious way to do
things if your XML processor supports external entity references.

Second, use some kind of preprocessor for concatenation, such as cat or
cpp.  In this case the fragments can't carry text declarations, because
a text declaration must appear as the first thing in an entity, and the
XML processor will see only a single entity, the whole document.  In
that case the XML specification says nothing about the fragments.
However, because the XML specification mandates a fatal error[1] when a
processor detects any encoding inconsistency or ambiguity, the risks to
users of guessing about fragment encodings are potentially high (at
least in annoyance).
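
For concreteness, here is a minimal sketch of the first approach using
Python's xml.parsers.expat.  The system identifier "open-hatch.xml", the
in-memory ENTITY_STORE, and the helper parse_with_entities are all made
up for illustration.  The fragment is stored in ISO-8859-1 and says so
in its own text declaration, while the master document is UTF-8, so the
processor never has to guess.

```python
import xml.parsers.expat

# Hypothetical stand-in for a file system or catalog:
# system identifier -> raw bytes of the external entity.
ENTITY_STORE = {
    "open-hatch.xml": (
        # Text declaration: must be the very first thing in the entity.
        b'<?xml version="1.0" encoding="ISO-8859-1"?>'
        b"caf\xe9 au lait"  # 0xE9 is e-acute in ISO-8859-1
    ),
}

# Master document (UTF-8) that declares and references the entity.
MASTER = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<!DOCTYPE doc [\n"
    b'  <!ENTITY open-hatch SYSTEM "open-hatch.xml">\n'
    b"]>\n"
    b"<doc>Blah blah blah &open-hatch; foo bar baz.</doc>"
)


def parse_with_entities(master, store):
    """Parse master, resolving external entities from store.

    Returns the concatenated character data of the document.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def resolve(context, base, system_id, public_id):
        # The child parser handles the entity as its own unit, so the
        # entity's text declaration decides how its bytes are decoded.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(store[system_id], True)
        return 1  # nonzero tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(master, True)
    return "".join(chunks)


text = parse_with_entities(MASTER, ENTITY_STORE)
```

Each entity is decoded according to its own declaration, which is the
property that plain byte concatenation gives up.
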

So I advocate using a multientity framework (for this purpose among
others) where some sort of master document is available to check
consistency, rather than Mule guesswork on a file-by-file basis.

 > I don't know much about XML:

The XML specification is rather short (especially compared to the SGML
specification), yet self-contained.


Footnotes:
[1]  Not necessarily termination of the process, but normal processing
must terminate, and the XML processor permanently enters an error mode.
Very annoying at best.
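
To illustrate the kind of failure the footnote describes, here is a
hedged sketch of the preprocessor approach going wrong, again with
Python's xml.parsers.expat.  HEAD, TAIL, and try_parse are invented for
the example.  A Latin-1 fragment is glued bytewise (as cat would do it)
into a document that declares UTF-8, and the processor rejects the
result instead of guessing.

```python
import xml.parsers.expat

# Hypothetical wrapper pieces; the document as a whole claims UTF-8.
HEAD = b'<?xml version="1.0" encoding="UTF-8"?>\n<doc>Blah blah blah '
TAIL = b" foo bar baz.</doc>"

# The same fragment text in two encodings.
FRAGMENT = "caf\u00e9 au lait"
FRAGMENT_LATIN1 = FRAGMENT.encode("iso-8859-1")
FRAGMENT_UTF8 = FRAGMENT.encode("utf-8")


def try_parse(data):
    """Return None if data parses, else the expat error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


# Bytewise concatenation: the lone 0xE9 byte is not valid UTF-8, so the
# parser reports an error rather than silently repairing the text.
bad = try_parse(HEAD + FRAGMENT_LATIN1 + TAIL)

# The same concatenation is fine when the fragment matches the
# encoding the document declares.
good = try_parse(HEAD + FRAGMENT_UTF8 + TAIL)
```

Here bad holds an error message and good is None.  Nothing in the
concatenated bytes records which encoding the fragment was written in,
which is why a consistency check has to happen before concatenation.
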