* &nbsp; and nXML mode
From: Jean-Christophe Helary @ 2021-08-09 2:22 UTC (permalink / raw)
To: help-gnu-emacs
Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
--
Jean-Christophe Helary @brandelune
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/
* Re: &nbsp; and nXML mode
From: Yuri Khan @ 2021-08-09 5:57 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: help-gnu-emacs
On Mon, 9 Aug 2021 at 09:22, Jean-Christophe Helary
<lists@traduction-libre.org> wrote:
> Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:
>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml">
If you view that as an XML document (which is what nXML deals with),
without any preconceived knowledge of HTML5, there is nothing to
suggest that &nbsp; is legit.
In XML, an entity can be defined inline within the doctype declaration:
<!DOCTYPE html [
<!ENTITY nbsp "&#xa0;">
]>
or by reference to an external entity definition:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
SYSTEM "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
(In the HTML5 spec, this is referred to as “obsolete permitted DOCTYPE
string”, and the obsoletion is from the HTML5 point of view. I.e. if
you use an HTML5-aware parser, <!DOCTYPE html> is sufficient to
declare an HTML5 document.)
If you fetch that url, you will see that it references a number of
modules, and if you chase references far enough, you will get to
http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent which contains this as its
first significant line:
<!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking
space, U+00A0 ISOnum -->
and that’s what makes &nbsp; a valid entity reference in an XHTML document.
(XML processors normally have some shortcuts, such as DTD pre-cached
in the so-called XML catalog, so that they don’t have to fetch them
from the network each time. XML catalog is keyed by the PUBLIC and/or
SYSTEM identifiers but not by the doctype root element name.)
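The effect of the inline declaration can be checked with any generic XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree (backed by expat) as a stand-in; it is not nXML's own parser, and the helper name parse_paragraph and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# No entity declarations: a plain XML parser has no definition for &nbsp;.
bare = (
    '<!DOCTYPE html>'
    f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>'
)

# The internal subset declares nbsp as U+00A0, so the reference resolves.
declared = (
    '<!DOCTYPE html [<!ENTITY nbsp "&#xa0;">]>'
    f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>'
)

def parse_paragraph(doc):
    """Return the text of the first XHTML <p>, or None if parsing fails."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return None
    return root.find(f".//{{{XHTML_NS}}}p").text

bare_text = parse_paragraph(bare)          # parsing fails: undefined entity
declared_text = parse_paragraph(declared)  # "a", U+00A0, "b"
```

The only difference between the two documents is the internal subset, which is enough for the parser to accept the reference.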
* Re: &nbsp; and nXML mode
From: Jean-Christophe Helary @ 2021-08-11 0:41 UTC (permalink / raw)
To: help-gnu-emacs
> On Aug 9, 2021, at 14:57, Yuri Khan <yuri.v.khan@gmail.com> wrote:
>
> On Mon, 9 Aug 2021 at 09:22, Jean-Christophe Helary
> <lists@traduction-libre.org> wrote:
>
>> Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:
>>
>> <!DOCTYPE html>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>
> If you view that as an XML document (which is what nXML deals with),
> without any preconceived knowledge of HTML5, there is nothing to
> suggest that &nbsp; is legit.
>
> In XML, an entity can be defined inline within the doctype declaration:
>
> <!DOCTYPE html [
> <!ENTITY nbsp "&#xa0;">
> ]>
>
> or by reference to an external entity definition:
>
> <!DOCTYPE html
> PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> SYSTEM "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
When I put that at the top of my file, nxml says "unexpected token".
> (In the HTML5 spec, this is referred to as “obsolete permitted DOCTYPE
> string”, and the obsoletion is from the HTML5 point of view. I.e. if
> you use an HTML5-aware parser, <!DOCTYPE html> is sufficient to
> declare an HTML5 document.)
>
> If you fetch that url, you will see that it references a number of
> modules, and if you chase references far enough, you will get to
> http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent which contains this as its
> first significant line:
>
> <!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking
> space, U+00A0 ISOnum -->
>
> and that’s what makes &nbsp; a valid entity reference in an XHTML document.
>
> (XML processors normally have some shortcuts, such as DTD pre-cached
> in the so-called XML catalog, so that they don’t have to fetch them
> from the network each time. XML catalog is keyed by the PUBLIC and/or
> SYSTEM identifiers but not by the doctype root element name.)
Thank you for explaining the process. I was not aware of how processors handled the thing.
But I guess trying to make nxml aware of all this goes well beyond the scope of my work, so I'll just use html-mode.
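A possible explanation for the "unexpected token" report: in the XML grammar an external identifier is either SYSTEM followed by a system identifier, or PUBLIC followed by a public identifier and a system identifier, with no SYSTEM keyword in between. The sketch below checks both spellings with Python's xml.etree.ElementTree as a stand-in parser. That nXML rejects the declaration for the same reason is an assumption, and the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"
SYSTEM_ID = "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
BODY = '<html xmlns="http://www.w3.org/1999/xhtml"/>'

# Form quoted in the thread: a SYSTEM keyword after the public identifier.
with_keyword = f'<!DOCTYPE html PUBLIC "{PUBLIC_ID}" SYSTEM "{SYSTEM_ID}">{BODY}'

# Form allowed by the XML grammar: PUBLIC, public identifier, system identifier.
without_keyword = f'<!DOCTYPE html PUBLIC "{PUBLIC_ID}" "{SYSTEM_ID}">{BODY}'

def is_well_formed(doc):
    """Return True if the stand-in parser accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True

keyword_ok = is_well_formed(with_keyword)   # rejected by the parser
plain_ok = is_well_formed(without_keyword)  # accepted; the DTD is not fetched
```

Note that ElementTree does not read the external DTD, so the second form is well-formed here but &nbsp; would still be undefined for this particular parser.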
--
Jean-Christophe Helary @brandelune
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/
* Re: &nbsp; and nXML mode
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2021-08-11 5:45 UTC (permalink / raw)
To: help-gnu-emacs
>> Is there a reason why nXML mode refuses to consider &nbsp; entities as
>> legit in a document that starts with:
>>
>> <!DOCTYPE html>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>
> If you view that as an XML document (which is what nXML deals with),
> without any preconceived knowledge of HTML5, there is nothing to
> suggest that &nbsp; is legit.
>
> In XML, an entity can be defined inline within the doctype declaration:
>
> <!DOCTYPE html [
> <!ENTITY nbsp "&#xa0;">
> ]>
My understanding is that XML wants to be "parsable" without knowing
anything about the schema being used, and that this notion of parsing
includes conversion of `&<foo>;` entities, so basically XML only allows
the 4 or 5 predefined/builtin entities and that's it.
> <!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking
> space, U+00A0 ISOnum -->
I thought the recommended way to "do &nbsp;" in XML is to use an
actual NBSP character (because XML can use utf-8).
Stefan
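The options that need no declaration at all can be illustrated the same way. This is a small sketch with Python's xml.etree.ElementTree as a stand-in parser, showing the five predefined entities, a literal U+00A0 character, and a numeric character reference. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The five predefined entities resolve without any DOCTYPE.
predefined = ET.fromstring("<p>&lt;&gt;&amp;&apos;&quot;</p>").text

# A literal U+00A0 character needs no declaration.
literal = ET.fromstring("<p>a\u00a0b</p>").text

# A numeric character reference is also always available.
numeric = ET.fromstring("<p>a&#xA0;b</p>").text
```

The last two forms produce the same text; only the spelling in the source file differs.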
* Re: &nbsp; and nXML mode
From: Yuri Khan @ 2021-08-11 6:19 UTC (permalink / raw)
To: Stefan Monnier; +Cc: help-gnu-emacs
On Wed, 11 Aug 2021 at 12:46, Stefan Monnier via Users list for the
GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:
> I thought the recommended way to "do &nbsp;" in XML is to use an
> actual NBSP character (because XML can use utf-8).
It can, but it might be desirable to have visible and typeable[*]
representation for unusual space characters, especially if there’s not
much nbsp awareness among those who are going to collaborate on a
document.
[*]: I mean visibility and typeability by muggles, i.e. without
relying on Emacs fontification, C-x 8 RET, or Xkb options. Otherwise:
Co-worker: (sends you a patch)
You: This space here should be non-breaking, we don’t want
prepositions hanging on the previous line when the next word gets
wrapped.
Co-worker: What's a non-breaking space?
You: (explain)
Co-worker: But how do I type one? BTW I'm on Windows.
You: (try to explain but don’t remember how to configure Windows
keyboard layouts) Here, just copy one of those already in the text.
Co-worker: But how do I tell which are which? They look exactly the same to me.
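One way to get a visible representation without declaring any entity is to spell each non-breaking space as the numeric character reference &#xA0;, which every XML parser accepts. A rough sketch follows; the function names show_nbsp and count_nbsp are hypothetical, and the plain string replacement assumes the text contains no CDATA sections or comments, where a character reference would not be interpreted.

```python
NBSP = "\u00a0"

def show_nbsp(text):
    """Replace each literal U+00A0 with the character reference &#xA0;."""
    return text.replace(NBSP, "&#xA0;")

def count_nbsp(text):
    """Count the literal U+00A0 characters in the text."""
    return text.count(NBSP)

sample = "in\u00a0the text"
visible = show_nbsp(sample)
```

The result is typeable on any keyboard and distinguishable from an ordinary space in a patch.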