unofficial mirror of emacs-devel@gnu.org 
From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-pretest-bug@gnu.org, Miles Bader <miles@gnu.org>,
	"Lennart Borgman \(gmail\)" <lennart.borgman@gmail.com>,
	Edward O'Connor <hober0@gmail.com>,
	emacs-devel@gnu.org
Subject: Re: 23.0.60; Defaut encoding for XML files should be undefined (instead of utf-8)
Date: Wed, 20 Feb 2008 07:02:21 +0900
Message-ID: <87y79gwqoi.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <jwvejb9ne38.fsf-monnier+emacs@gnu.org>

Stefan Monnier writes:

 > My understanding of the OP's situation is that his files are not XML
 > files, but plaintext files that happen to contain XML fragments.

Interpreting the XML 1.0 standard, if those XML fragments are intended
to be parsed by the XML processor as part of the document, they are
(conceptually) "external entities".  How that affects XML processing
will depend on exactly what you mean by "text-concatenation".

ISTM there are two possibilities.  First, use the XML facilities
(i.e., an entity reference).  That looks like this (there's also a
"PUBLIC" entity version):

<!ENTITY open-hatch
         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">

Blah blah blah
&open-hatch;
foo bar baz.

An entity reference has the advantage of using XML catalogs and the
like to find the entity (similar to the way C's #include allows cpp to
use an include path).  The XML specification requires entities to
declare their own encoding using a text declaration, unless that
encoding is UTF-8 or can be detected using the Byte Order Mark.  IMO
this is the obvious way to do things if your XML processor supports
external entity references.
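
As an illustration of the rule just described, here is a minimal
sketch in Python of how the encoding of an external entity might be
determined: Byte Order Mark first, then a leading text declaration,
then the UTF-8 default.  The function name sniff_entity_encoding is
hypothetical (it is not part of any XML library), and the sketch only
handles UTF-8/UTF-16 BOMs and ASCII-compatible declarations; a real
processor does considerably more.

```python
import codecs
import re

# Byte Order Marks this sketch recognizes (an assumption: UTF-32 and
# other encodings are deliberately left out to keep it short).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# A text declaration must be the very first thing in the entity:
#   <?xml version="1.0" encoding="ISO-8859-1"?>
# The version part is optional in a text declaration.
_TEXT_DECL = re.compile(
    rb'^<\?xml(?:\s+version\s*=\s*["\'][^"\']*["\'])?'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']\s*\?>'
)


def sniff_entity_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding an entity declares."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _TEXT_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no text declaration: the entity is taken to be UTF-8.
    return "utf-8"
```

For example, sniff_entity_encoding applied to an entity beginning with
<?xml version="1.0" encoding="ISO-8859-1"?> reports "iso-8859-1",
while an entity with neither a BOM nor a declaration falls back to
"utf-8".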

Second, use some kind of preprocessor for concatenation, such as cat
or cpp.  In this case, a text declaration can't be used because it
must appear as the first thing in the entity, but the XML processor
will see only a single entity, the whole document.  In that case the
XML specification says nothing about the fragments.
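
The concatenation problem can be reproduced with a short Python
sketch, assuming the standard library's xml.etree.ElementTree (which
uses expat) as the XML processor.  The fragment contents and the
helper name parses are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A Latin-1 fragment that declares its own encoding at the very start.
fragment = b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Parsed on its own, the declaration is the first thing the parser sees.
standalone_ok = parses(fragment)

# After cat-style concatenation the declaration sits in the middle of
# what the parser sees as one single entity, which is not allowed.
concatenated_ok = parses(b"<doc>" + fragment + b"</doc>")
```

Here the fragment parses on its own, but the concatenated document is
rejected, because the declaration is no longer at the start of the
entity.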

However, because the XML specification mandates a fatal error[1] when
a processor detects any encoding inconsistency or ambiguity, the risks
to users of guessing about fragment encodings are potentially high (at
least in annoyance).  So I advocate using a multi-entity framework
(for this purpose among others) where some sort of master document is
available to check consistency, rather than Mule guesswork on a
file-by-file basis.
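
One way to picture such a consistency check is the following Python
sketch.  It assumes the master document supplies a single encoding
and that fragments are available as raw bytes; the function name
inconsistent_fragments and the file names are hypothetical.  Note that
this only catches mismatches for strict encodings such as UTF-8: every
byte sequence decodes as Latin-1, so a Latin-1 master would need a
different kind of check.

```python
def inconsistent_fragments(master_encoding, fragments):
    """Hypothetical check: names of fragments that fail to decode
    under the encoding given by the master document."""
    bad = []
    for name, data in fragments.items():
        try:
            data.decode(master_encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return sorted(bad)


# Made-up fragments: one really is UTF-8, the other is Latin-1.
fragments = {
    "intro.xml": "<p>caf\u00e9</p>".encode("utf-8"),
    "hatch.xml": "<p>caf\u00e9</p>".encode("iso-8859-1"),
}
mismatched = inconsistent_fragments("utf-8", fragments)
```

With a UTF-8 master, only the Latin-1 fragment ("hatch.xml") is
reported, before any XML processor has a chance to hit the fatal
error.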

 > I don't know much about XML:

The XML specification is rather short (especially compared to the
SGML specification), yet self-contained.


Footnotes: 
[1]  Not necessarily termination of the process, but normal processing
must terminate, and the XML processor permanently enters an error mode.
Very annoying at best.




