From: Andrew Gierth
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#38269: SSAX incorrect handling of &gt; in CDATA
Date: Tue, 19 Nov 2019 13:41:54 +0000
Message-ID: <87zhgsyost.fsf@news-spur.riddles.org.uk>
To: 38269@debbugs.gnu.org

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+: ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[..]
; &gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline
and the "]]>" closing tag. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)

I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).

This bug seems to exist in all versions of SSAX.

-- 
Andrew.
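
The check against another XML parser mentioned above can be reproduced roughly as follows. This is only a sketch using Python's standard xml.etree.ElementTree module (backed by expat), which is not necessarily one of the parsers the report was checked against; the helper name element_text is made up for illustration. It parses the same one-element document and shows that the four characters &gt; inside CDATA come back unchanged, while the same sequence in ordinary character data is decoded to >.

```python
import xml.etree.ElementTree as ET


def element_text(document: str) -> str:
    """Parse a small XML document and return the text of its root element.

    Hypothetical helper for this illustration only.
    """
    root = ET.fromstring(document)
    return root.text or ""


# Inside a CDATA section the sequence &gt; is literal character data.
in_cdata = element_text("<e><![CDATA[&gt;]]></e>")

# Outside CDATA the same sequence is an entity reference for ">".
in_pcdata = element_text("<e>&gt;</e>")

print(repr(in_cdata))   # '&gt;'
print(repr(in_pcdata))  # '>'
```

The first result corresponds to the expected (*TOP* (e "&gt;")) from the report; the second is what SSAX currently returns for the CDATA case as well.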