* bug#38269: SSAX incorrect handling of &gt; in CDATA
@ 2019-11-19 13:41 Andrew Gierth
From: Andrew Gierth @ 2019-11-19 13:41 UTC
To: 38269
The bug:
> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))
The expected result is (*TOP* (e "&gt;")).
In upstream/SSAX.scm:
; procedure+: ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[...]
; &gt; is treated as an embedded #\> character
This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except line endings
and the "]]>" closing delimiter. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)
I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).
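To illustrate the spec-compliant behaviour, here is a minimal sketch of a CDATA body reader in Scheme. The procedure name read-cdata-content is hypothetical and is not part of SSAX. Unlike ssax:read-cdata-body, it returns the whole body as a string instead of calling STR-HANDLER with a SEED. It assumes PORT is positioned just after "<![CDATA[". It treats only the "]]>" terminator and line endings (CR and CRLF become a newline) specially. Because it has no (#\&) case, "&gt;" comes back verbatim.

```scheme
;; Hypothetical helper, NOT the SSAX API: read a CDATA body from PORT
;; (positioned just after "<![CDATA[") and return it as a string.
;; Only "]]>" and CR/CRLF line endings are special; "&gt;" is kept as-is.
(define (read-cdata-content port)
  (let ((out (open-output-string)))
    (define (emit-brackets n)          ; flush N pending #\] characters
      (do ((i 0 (+ i 1))) ((= i n)) (write-char #\] out)))
    (let loop ((brackets 0))           ; count of consecutive #\] just read
      (let ((c (read-char port)))
        (cond
         ((eof-object? c)
          (error "unterminated CDATA section"))
         ((char=? c #\])
          (loop (+ brackets 1)))
         ((and (char=? c #\>) (>= brackets 2))
          ;; "]]>" ends the section; any extra #\] before it is content.
          (emit-brackets (- brackets 2))
          (get-output-string out))
         ((char=? c #\return)
          ;; CR and CRLF are both normalized to a single #\newline.
          (emit-brackets brackets)
          (when (and (char? (peek-char port))
                     (char=? (peek-char port) #\newline))
            (read-char port))
          (write-char #\newline out)
          (loop 0))
         (else
          ;; Every other character, including #\&, is taken at face value.
          (emit-brackets brackets)
          (write-char c out)
          (loop 0)))))))
```

With this sketch, (read-cdata-content (open-input-string "&gt;]]></e>")) returns "&gt;", and the port is left just after the "]]>".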
This bug seems to exist in all versions of SSAX.
--
Andrew.