unofficial mirror of bug-guile@gnu.org 
* bug#38269: SSAX incorrect handling of &gt; in CDATA
From: Andrew Gierth @ 2019-11-19 13:41 UTC (permalink / raw)
  To: 38269

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+: 	ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[..]
;	&gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline and
the "]]>" closing delimiter. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)
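
For anyone who wants to repeat the cross-check, here is a minimal sketch
using the two XML parsers in Python's standard library. These are my
choice for illustration, not necessarily the parsers checked above. Both
should return the CDATA content verbatim, while the same reference
outside CDATA is expanded as usual.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# The document from the report: "&gt;" inside a CDATA section.
CDATA_DOC = "<e><![CDATA[&gt;]]></e>"
# For contrast: the same reference in ordinary character data.
PLAIN_DOC = "<e>&gt;</e>"

# ElementTree folds CDATA content into the element's text.
et_cdata = ET.fromstring(CDATA_DOC).text
et_plain = ET.fromstring(PLAIN_DOC).text

# minidom keeps the CDATA section as its own node; .data is its content.
dom_cdata = minidom.parseString(CDATA_DOC).documentElement.firstChild.data

print(repr(et_cdata))   # CDATA content, kept as the five characters
print(repr(et_plain))   # ordinary content, reference expanded
print(repr(dom_cdata))
```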

I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).
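
To make the intended behaviour concrete, here is a rough model of a
CDATA body reader with no "&" case at all. It is a Python sketch, not
the SSAX code; read_cdata_body and its (content, next position) return
value are hypothetical names chosen for the illustration. Only line
ends and the first "]]>" are treated specially.

```python
def read_cdata_body(text, pos=0):
    """Return (content, next_pos) for a CDATA body starting at text[pos].

    Hypothetical helper: it models the behaviour described in the XML
    specification, not the SSAX implementation.
    """
    # The body ends at the first occurrence of "]]>".
    end = text.find("]]>", pos)
    if end < 0:
        raise ValueError("unterminated CDATA section")
    body = text[pos:end]
    # Line-end normalization: CR LF and a lone CR both become LF.
    body = body.replace("\r\n", "\n").replace("\r", "\n")
    # No entity handling: "&" and "&gt;" are kept at face value.
    return body, end + len("]]>")


content, next_pos = read_cdata_body("&gt;]]></e>")
print(repr(content), next_pos)
```

With this model the input from the report yields the literal text
"&gt;", which matches the expected result given above.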

This bug seems to exist in all versions of SSAX.

-- 
Andrew.




