* &nbsp; and nXML mode
From: Jean-Christophe Helary @ 2021-08-09 2:22 UTC
To: help-gnu-emacs

Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">

-- 
Jean-Christophe Helary @brandelune
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/

* Re: &nbsp; and nXML mode
From: Yuri Khan @ 2021-08-09 5:57 UTC
To: Jean-Christophe Helary; +Cc: help-gnu-emacs

On Mon, 9 Aug 2021 at 09:22, Jean-Christophe Helary <lists@traduction-libre.org> wrote:

> Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:
>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml">

If you view that as an XML document (which is what nXML deals with), without any preconceived knowledge of HTML5, there is nothing to suggest that &nbsp; is legit.

In XML, an entity can be defined inline within the doctype declaration:

    <!DOCTYPE html [
    <!ENTITY nbsp "&#xa0;">
    ]>

or by reference to an external entity definition:

    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    SYSTEM "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

(In the HTML5 spec, this is referred to as “obsolete permitted DOCTYPE string”, and the obsoletion is from the HTML5 point of view. I.e. if you use an HTML5-aware parser, <!DOCTYPE html> is sufficient to declare an HTML5 document.)

If you fetch that url, you will see that it references a number of modules, and if you chase references far enough, you will get to http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent which contains this as its first significant line:

    <!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking space, U+00A0 ISOnum -->

and that’s what makes &nbsp; a valid entity reference in an XHTML document.

(XML processors normally have some shortcuts, such as DTD pre-cached in the so-called XML catalog, so that they don’t have to fetch them from the network each time. XML catalog is keyed by the PUBLIC and/or SYSTEM identifiers but not by the doctype root element name.)
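
The point about declaring the entity in the doctype can be checked outside Emacs. What follows is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (an expat-based parser, not nXML's own, so it illustrates the XML rule rather than nXML's behaviour). The helper names `text_of` and `is_well_formed` and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Bare HTML5-style doctype: nothing in the document declares &nbsp;.
BARE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><p>a&nbsp;b</p></html>"""

# Same document, with &nbsp; declared in the internal DTD subset.
DECLARED = """<!DOCTYPE html [
  <!ENTITY nbsp "&#xa0;">
]>
<html xmlns="http://www.w3.org/1999/xhtml"><p>a&nbsp;b</p></html>"""


def text_of(document: str) -> str:
    """Parse an XML string and return all of its character data."""
    return "".join(ET.fromstring(document).itertext())


def is_well_formed(document: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(BARE))       # the undeclared entity is rejected
    print(repr(text_of(DECLARED)))    # the declared entity expands to U+00A0
```

With the declaration in place the parser replaces `&nbsp;` by the U+00A0 character; without it, a generic XML parser has no way to know what the name refers to.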

* Re: &nbsp; and nXML mode
From: Jean-Christophe Helary @ 2021-08-11 0:41 UTC
To: help-gnu-emacs

> On Aug 9, 2021, at 14:57, Yuri Khan <yuri.v.khan@gmail.com> wrote:
>
> On Mon, 9 Aug 2021 at 09:22, Jean-Christophe Helary
> <lists@traduction-libre.org> wrote:
>
>> Is there a reason why nXML mode refuses to consider &nbsp; entities as legit in a document that starts with:
>>
>> <!DOCTYPE html>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>
> If you view that as an XML document (which is what nXML deals with),
> without any preconceived knowledge of HTML5, there is nothing to
> suggest that &nbsp; is legit.
>
> In XML, an entity can be defined inline within the doctype declaration:
>
> <!DOCTYPE html [
> <!ENTITY nbsp "&#xa0;">
> ]>
>
> or by reference to an external entity definition:
>
> <!DOCTYPE html
> PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> SYSTEM "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

When I put that at the top of my file, nxml says "unexpected token".

> (In the HTML5 spec, this is referred to as “obsolete permitted DOCTYPE
> string”, and the obsoletion is from the HTML5 point of view. I.e. if
> you use an HTML5-aware parser, <!DOCTYPE html> is sufficient to
> declare an HTML5 document.)
>
> If you fetch that url, you will see that it references a number of
> modules, and if you chase references far enough, you will get to
> http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent which contains this as its
> first significant line:
>
> <!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking
> space, U+00A0 ISOnum -->
>
> and that’s what makes &nbsp; a valid entity reference in an XHTML document.
>
> (XML processors normally have some shortcuts, such as DTD pre-cached
> in the so-called XML catalog, so that they don’t have to fetch them
> from the network each time. XML catalog is keyed by the PUBLIC and/or
> SYSTEM identifiers but not by the doctype root element name.)

Thank you for explaining the process. I was not aware of how processors handled the thing. But I guess trying to make nxml aware of all this goes well beyond the scope of my work, so I'll just use html-mode.

-- 
Jean-Christophe Helary @brandelune
https://mac4translators.blogspot.com
https://sr.ht/~brandelune/omegat-as-a-book/
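
One likely reason for the "unexpected token" report, offered as an assumption rather than a confirmed diagnosis of nXML: in XML 1.0 doctype syntax, the `PUBLIC` keyword is followed by the public identifier and then directly by the system identifier. The `SYSTEM` keyword is used only when there is no public identifier. The sketch below checks both spellings with Python's standard `xml.etree.ElementTree` (a different parser from nXML's, and one that does not fetch the external DTD). The helper `is_well_formed` and the constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN"
SYSTEM_ID = "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
ROOT = '<html xmlns="http://www.w3.org/1999/xhtml"/>'

# As quoted in the thread: PUBLIC "public-id" SYSTEM "system-id"
AS_QUOTED = f'<!DOCTYPE html PUBLIC "{PUBLIC_ID}" SYSTEM "{SYSTEM_ID}">{ROOT}'

# XML 1.0 form: PUBLIC "public-id" "system-id" (no SYSTEM keyword)
XML_FORM = f'<!DOCTYPE html PUBLIC "{PUBLIC_ID}" "{SYSTEM_ID}">{ROOT}'


def is_well_formed(document: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(AS_QUOTED))  # rejected: stray SYSTEM keyword
    print(is_well_formed(XML_FORM))   # accepted
```

Getting the doctype accepted is only the first step. Whether `&nbsp;` then resolves depends on the processor actually reading the external DTD or a catalog copy of it, which this thread does not settle for nXML.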

* Re: &nbsp; and nXML mode
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2021-08-11 5:45 UTC
To: help-gnu-emacs

>> Is there a reason why nXML mode refuses to consider &nbsp; entities as
>> legit in a document that starts with:
>>
>> <!DOCTYPE html>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>
> If you view that as an XML document (which is what nXML deals with),
> without any preconceived knowledge of HTML5, there is nothing to
> suggest that &nbsp; is legit.
>
> In XML, an entity can be defined inline within the doctype declaration:
>
> <!DOCTYPE html [
> <!ENTITY nbsp "&#xa0;">
> ]>

My understanding is that XML wants to be "parsable" without knowing anything about the schema being used, and that this notion of parsing includes conversion of `&<foo>;` entities, so basically XML only allows the 4 or 5 predefined/builtin entities and that's it.

> <!ENTITY nbsp "&#160;" ><!-- no-break space = non-breaking
> space, U+00A0 ISOnum -->

I thought the recommended way to "do &nbsp;" in XML is to use an actual NBSP character (because XML can use utf-8).

        Stefan
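
To illustrate the "actual NBSP character" route: a literal U+00A0 and a numeric character reference (`&#160;` or `&#xA0;`) need no declaration at all, because character references are not named entities. A minimal sketch, again assuming Python's standard `xml.etree.ElementTree` rather than nXML; the sample strings and the helper `text_of` are invented for the example.

```python
import xml.etree.ElementTree as ET

LITERAL = "<p>a\u00a0b</p>"   # an actual U+00A0 character in the source
DECIMAL = "<p>a&#160;b</p>"   # decimal character reference
HEXADECIMAL = "<p>a&#xA0;b</p>"  # hexadecimal character reference


def text_of(document: str) -> str:
    """Parse an XML string and return all of its character data."""
    return "".join(ET.fromstring(document).itertext())


if __name__ == "__main__":
    # All three spellings yield the same text, with no DOCTYPE involved.
    for sample in (LITERAL, DECIMAL, HEXADECIMAL):
        print(repr(text_of(sample)))
```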

* Re: &nbsp; and nXML mode
From: Yuri Khan @ 2021-08-11 6:19 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs

On Wed, 11 Aug 2021 at 12:46, Stefan Monnier via Users list for the GNU Emacs text editor <help-gnu-emacs@gnu.org> wrote:

> I thought the recommended way to "do &nbsp;" in XML is to use an
> actual NBSP character (because XML can use utf-8).

It can, but it might be desirable to have a visible and typeable[*] representation for unusual space characters, especially if there’s not much nbsp awareness among those who are going to collaborate on a document.

[*]: I mean visibility and typeability by muggles, i.e. without relying on Emacs fontification, C-x 8 RET, or Xkb options.

Otherwise:

Co-worker: (sends you a patch)
You: This space here should be non-breaking, we don’t want prepositions hanging on the previous line when the next word gets wrapped.
Co-worker: What's a non-breaking space?
You: (explain)
Co-worker: But how do I type one? BTW I'm on Windows.
You: (try to explain but don’t remember how to configure Windows keyboard layouts) Here, just copy one of those already in the text.
Co-worker: But how do I tell which are which? They look exactly the same to me.
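
One way to get a visible representation without any DTD is to serialize the document as ASCII, so that every non-ASCII character, U+00A0 included, is written as a numeric character reference. This is only a sketch under assumptions: it uses Python's standard `xml.etree.ElementTree`, the function name `to_visible` is hypothetical, and round-tripping through this parser drops the doctype and comments, so it is not a drop-in tool for real documents.

```python
import xml.etree.ElementTree as ET


def to_visible(document: str) -> str:
    """Re-serialize XML as ASCII; non-ASCII characters become &#NNN; references."""
    root = ET.fromstring(document)
    # With an ASCII target encoding, ElementTree writes characters it
    # cannot encode as decimal character references.
    return ET.tostring(root, encoding="us-ascii").decode("ascii")


if __name__ == "__main__":
    # The invisible U+00A0 between "a" and "b" shows up as &#160;.
    print(to_visible("<p>a\u00a0b c</p>"))
```

The ordinary space in the sample stays a plain space, so the two kinds of space can be told apart at a glance.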