From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Ricardo Wurmus Newsgroups: gmane.lisp.guile.bugs Subject: bug#20339: sxml simple: sxml->xml mishandles namespaces? Date: Tue, 05 Feb 2019 13:57:11 +0100 Message-ID: <87r2cmgzq0.fsf@elephly.net> References: <20150415194714.GA30295@tuxteam.de> <87y45vln0f.fsf@pobox.com> <20160713132403.GA2349@tuxteam.de> <87furc1qeu.fsf@pobox.com> <87a7jbi8rx.fsf@elephly.net> <874l9iiopl.fsf@elephly.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="256088"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: mu4e 1.0; emacs 26.1 Cc: 20339@debbugs.gnu.org To: John Cowan Original-X-From: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Wed Feb 06 05:45:11 2019 Return-path: Envelope-to: guile-bugs@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1grF5B-0014MK-ES for guile-bugs@m.gmane.org; Wed, 06 Feb 2019 05:45:10 +0100 Original-Received: from localhost ([127.0.0.1]:44959 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1grF5A-0003uR-BB for guile-bugs@m.gmane.org; Tue, 05 Feb 2019 23:45:08 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:44997) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1grF55-0003uJ-79 for bug-guile@gnu.org; Tue, 05 Feb 2019 23:45:04 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1grF54-0006HE-8I for bug-guile@gnu.org; Tue, 05 Feb 2019 23:45:03 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]:34643) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1grF53-0006H4-R2 for bug-guile@gnu.org; Tue, 05 Feb 2019 23:45:02 -0500 Original-Received: from Debian-debbugs 
by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1grF53-0000bw-OC for bug-guile@gnu.org; Tue, 05 Feb 2019 23:45:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Ricardo Wurmus Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Wed, 06 Feb 2019 04:45:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 20339 X-GNU-PR-Package: guile Original-Received: via spool by 20339-submit@debbugs.gnu.org id=B20339.15494282682283 (code B ref 20339); Wed, 06 Feb 2019 04:45:01 +0000 Original-Received: (at 20339) by debbugs.gnu.org; 6 Feb 2019 04:44:28 +0000 Original-Received: from localhost ([127.0.0.1]:33924 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1grF4V-0000al-CD for submit@debbugs.gnu.org; Tue, 05 Feb 2019 23:44:27 -0500 Original-Received: from sender-of-o51.zoho.com ([135.84.80.216]:21058) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1grF4S-0000aZ-Ip for 20339@debbugs.gnu.org; Tue, 05 Feb 2019 23:44:25 -0500 ARC-Seal: i=1; a=rsa-sha256; t=1549371437; cv=none; d=zoho.com; s=zohoarc; b=QvC9tolm152kf1f/XXQR9an58hT30JiReaKmWp2o4BPfkHyCmuIvPrUnQkkpgWooRwiXQrtRsetxYrr+885dpo6h41PiR4wNgEc+IXa/MZ3vdEOP26N58O5j1uE+UIU8szZMhF++vxuEnhx8+aZVwvZI6YxoHk63COnwjSoPK+4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1549371437; h=Content-Type:Cc:Date:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:To:ARC-Authentication-Results; bh=FvyclqO+OIQmmvjA2uzVA79hbzeOuP1ZDkiOv4qVpys=; b=fvlDG+x9OxlpANi6g0bHefRJ0UT/esKc5t6YPPF/JNtGvLrWkQdJ0ZJVUsatP5NKgUrcGQYfdtZANQ7OLNKVyh5DaEHfAyXUNJqdseRmBNePwmt98dt2hHx7rg6IELHE1/2/YHTBiVPag1Ms+yF69n9mRihZvjE8LScBsUkWD+U= ARC-Authentication-Results: i=1; mx.zoho.com; dkim=pass header.i=elephly.net; spf=pass smtp.mailfrom=rekado@elephly.net; dmarc=pass header.from= header.from= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1549371437; s=zoho; d=elephly.net; 
i=rekado@elephly.net; h=References:From:To:Cc:Subject:In-reply-to:Date:Message-ID:MIME-Version:Content-Type; l=3449; bh=FvyclqO+OIQmmvjA2uzVA79hbzeOuP1ZDkiOv4qVpys=; b=bBtnED2NwAp8U1v48HfmCcB47tdCJriQ1bmFcc9tBCIzi/lvBLrKF4irUA6iz+Bq 3ZEVWYRNhHcng6MF0kWBJRJQu2/ni4Ph7qUG+FQSRFC2TBOkyHT9/7VeQFYM+w6V0wX IKq+NKmIqpZSJKvbfqc36sXnrfW9YI9CO0QSjbC8= Original-Received: from localhost (141.80.247.165 [141.80.247.165]) by mx.zohomail.com with SMTPS id 1549371435332230.93305749332058; Tue, 5 Feb 2019 04:57:15 -0800 (PST) In-reply-to: <874l9iiopl.fsf@elephly.net> X-URL: https://elephly.net X-PGP-Key: https://elephly.net/rekado.pubkey X-PGP-Fingerprint: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC X-ZohoMailClient: External X-Zoho-Virus-Status: 1 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.51.188.43 X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Original-Sender: "bug-guile" Xref: news.gmane.org gmane.lisp.guile.bugs:9301 Archived-At:

--=-=-=
Content-Type: text/plain

Ricardo Wurmus writes:

> In that case we could have FINISH-ELEMENT add all namespace declarations
> that are in scope to the current node that is about to be returned. It
> would be a little verbose, but more correct.

Like this:

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0001-sxml-xml-sxml-Record-and-use-namespace-abbreviations.patch

>From d44c702718baea4c4557d12ca8dd7dab724c7fb6 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus
Date: Mon, 4 Feb 2019 21:39:06 +0100
Subject: [PATCH] sxml: xml->sxml: Record and use namespace abbreviations.

* module/sxml/simple.scm (xml->sxml) [name->sxml]: Accept namespaces argument to look up abbreviation.
Return name with abbreviation prefix. [parser]: Let FINISH-ELEMENT procedure return namespaces in addition to the SXML tree's attributes. --- module/sxml/simple.scm | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm index 703ad9137..2bb332c83 100644 --- a/module/sxml/simple.scm +++ b/module/sxml/simple.scm @@ -1,7 +1,8 @@ ;;;; (sxml simple) -- a simple interface to the SSAX parser ;;;; -;;;; Copyright (C) 2009, 2010, 2013 Free Software Foundation, Inc. +;;;; Copyright (C) 2009, 2010, 2013, 2019 Free Software Foundation, Inc. ;;;; Modified 2004 by Andy Wingo . +;;;; Modified 2019 by Ricardo Wurmus . ;;;; Originally written by Oleg Kiselyov as SXML-to-HTML.scm. ;;;; ;;;; This library is free software; you can redistribute it and/or @@ -30,6 +31,7 @@ #:use-module (sxml ssax) #:use-module (sxml transform) #:use-module (ice-9 match) + #:use-module (srfi srfi-1) #:use-module (srfi srfi-13) #:export (xml->sxml sxml->xml sxml->string)) @@ -123,10 +125,15 @@ port." (acons '*DEFAULT* default-entity-handler entities) entities)) - (define (name->sxml name) + (define (name->sxml name namespaces) (match name ((prefix . local-part) - (symbol-append prefix (string->symbol ":") local-part)) + (let ((abbrev (and=> (find (match-lambda + ((abbrev uri . rest) + (and (eq? uri prefix) abbrev))) + namespaces) + first))) + (symbol-append abbrev (string->symbol ":") local-part))) (_ name))) (define (doctype-continuation seed) @@ -150,12 +157,21 @@ port." (let ((seed (if trim-whitespace? (ssax:reverse-collect-str-drop-ws seed) (ssax:reverse-collect-str seed))) - (attrs (attlist-fold - (lambda (attr accum) - (cons (list (name->sxml (car attr)) (cdr attr)) - accum)) - '() attributes))) - (acons (name->sxml elem-gi) + (attrs (append + ;; Namespace declarations + (filter-map (match-lambda + (('*DEFAULT* . _) #f) + ((abbrev uri . 
_) + (list (symbol-append 'xmlns: abbrev) + (symbol->string uri)))) + namespaces) + (attlist-fold + (lambda (attr accum) + (cons (list (name->sxml (car attr) namespaces) + (cdr attr)) + accum)) + '() attributes)))) + (acons (name->sxml elem-gi namespaces) (if (null? attrs) seed (cons (cons '@ attrs) seed)) -- 2.20.1

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

It’s quite verbose because it doesn’t check whether a parent element
already carries the same namespace declaration.

-- 
Ricardo

--=-=-=--
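For readers who want to try the idea outside the parser, here is a minimal standalone sketch of the two steps the patch adds: mapping a namespace URI back to its abbreviation, and turning the namespaces in scope into xmlns: attributes. The names uri->abbrev, qualified-name, namespace-declarations and example-namespaces are hypothetical and not part of (sxml simple). The entry shape (abbrev uri-symbol . rest) is assumed from the patterns in the patch, and the fallback to the full URI when no abbreviation is found is an assumption of this sketch rather than something the patch does.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical sample data; each entry is assumed to have the shape
;; (abbrev uri-symbol . rest), as matched by the patterns in the patch.
(define svg-uri (string->symbol "http://www.w3.org/2000/svg"))
(define default-uri (string->symbol "http://example.org/default"))

(define example-namespaces
  (list (cons* 'svg svg-uri svg-uri)
        (cons* '*DEFAULT* default-uri default-uri)))

;; Return the abbreviation recorded for URI, or #f if none is in scope.
(define (uri->abbrev uri namespaces)
  (and=> (find (match-lambda
                 ((abbrev ns-uri . _) (eq? ns-uri uri)))
               namespaces)
         car))

;; Turn a (uri . local-part) name into abbrev:local-part.
;; Assumption of this sketch: fall back to the full URI as the prefix
;; when no abbreviation is known, instead of signalling an error.
(define (qualified-name name namespaces)
  (match name
    ((uri . local-part)
     (symbol-append (or (uri->abbrev uri namespaces) uri)
                    (string->symbol ":")
                    local-part))
    (_ name)))

;; Build one (xmlns:abbrev "uri") attribute per namespace in scope,
;; skipping the *DEFAULT* entry as the patch does.
(define (namespace-declarations namespaces)
  (filter-map (match-lambda
                (('*DEFAULT* . _) #f)
                ((abbrev uri . _)
                 (list (symbol-append 'xmlns: abbrev)
                       (symbol->string uri))))
              namespaces))
```

With the sample data, (qualified-name (cons svg-uri 'rect) example-namespaces) gives svg:rect, and (namespace-declarations example-namespaces) gives a one-element list holding the xmlns:svg attribute. Because the declarations are computed from everything in scope, every element would carry them, which is the verbosity mentioned above.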