From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Newsgroups: gmane.lisp.guile.bugs Subject: bug#20339: [PATCH] sxml->xml and namespaces: updated patch Date: Mon, 20 Apr 2015 09:45:17 +0200 Message-ID: <20150420074517.GA31087@tuxteam.de> References: <20150415194714.GA30295@tuxteam.de> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="da4uJneut+ArUgXk" X-Trace: ger.gmane.org 1429515988 14421 80.91.229.3 (20 Apr 2015 07:46:28 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 20 Apr 2015 07:46:28 +0000 (UTC) To: 20339@debbugs.gnu.org Original-X-From: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Mon Apr 20 09:46:15 2015 Return-path: Envelope-to: guile-bugs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Yk6PG-0001pK-HA for guile-bugs@m.gmane.org; Mon, 20 Apr 2015 09:46:14 +0200 Original-Received: from localhost ([::1]:52153 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yk6PF-0000Ae-Ud for guile-bugs@m.gmane.org; Mon, 20 Apr 2015 03:46:13 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:53661) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yk6P9-00006O-Un for bug-guile@gnu.org; Mon, 20 Apr 2015 03:46:09 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yk6P4-0008SC-R6 for bug-guile@gnu.org; Mon, 20 Apr 2015 03:46:07 -0400 Original-Received: from debbugs.gnu.org ([140.186.70.43]:43158) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yk6P4-0008S8-OR for bug-guile@gnu.org; Mon, 20 Apr 2015 03:46:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1Yk6P4-0007An-Ch for bug-guile@gnu.org; Mon, 20 Apr 2015 03:46:02 -0400 X-Loop: help-debbugs@gnu.org In-Reply-To: <20150415194714.GA30295@tuxteam.de> Resent-From: Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Mon, 20 Apr 2015 07:46:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 20339 X-GNU-PR-Package: guile X-GNU-PR-Keywords: Original-Received: via spool by 20339-submit@debbugs.gnu.org id=B20339.142951592327523 (code B ref 20339); Mon, 20 Apr 2015 07:46:02 +0000 Original-Received: (at 20339) by debbugs.gnu.org; 20 Apr 2015 07:45:23 +0000 Original-Received: from localhost ([127.0.0.1]:32934 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yk6OQ-00079q-5X for submit@debbugs.gnu.org; Mon, 20 Apr 2015 03:45:23 -0400 Original-Received: from mail.tuxteam.de ([5.199.139.25]:51574 helo=tomasium.tuxteam.de) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yk6ON-00079g-KW for 20339@debbugs.gnu.org; Mon, 20 Apr 2015 03:45:21 -0400 Original-Received: from tomas by tomasium.tuxteam.de with local (Exim 4.80) (envelope-from ) id 1Yk6OL-0008LR-7s for 20339@debbugs.gnu.org; Mon, 20 Apr 2015 09:45:17 +0200 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Original-Sender: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.bugs:7771 Archived-At: --da4uJneut+ArUgXk Content-Type: multipart/mixed; boundary="l76fUT7nc3MelDdI" Content-Disposition: inline --l76fUT7nc3MelDdI Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, I've embellished my proposed patch a bit: - use values resp. call-with-values instead of passing around lists. This was one thing I didn't like about my first patch candidate: the namespace --> ns abbreviation lookup had two things to return, for noe the abbreviation, and whether this abbreviation was "new" (for convenience in the form of a (namespace . abbreviation) pair). Instead of returning a list, now it returns multiple values. - patch is now against current stable instead of against "whatever Debian stable packages", i.e. against d680713 2015-04-03 16:35:54 +0200 Ludovic Court=E8s (stable-2.0) doc: Up= date libgc URL. I'm still not sure whether this is the way to go (i.e. mixing the abbreviation stuff into the serialization), or whether a pre-pass (replacing namespaces by abbreviations and generating the namespace declaration "attributes") would be the way to go. Besides, I'd like to have some input on whether it'd be worth to follow the usual convention and to put the namespace declarations before regular attributes (forcing us to do two passes on a tag node's attribute list). The generated XML looks pretty weird as is now. What I'd still like to introduce is a "mapping preference" as an optional argument by the user, possibly per-node (like "I'd like 'http://www.w3.org/1999/xlink' to be abbreviated as 'xlink' or something like that). Other XML serializers offer that. I envision this as a function, the library would fall back to generate the abbreviation whenever the function returns #f. The question on whether this patch (or whatever it evolves into) has a chance of getting into Guile is still open: I'd have to get my papers from the FSF in this case. Inputs? --l76fUT7nc3MelDdI Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="abbreviate-and-declare-namespaces.patch" Content-Transfer-Encoding: quoted-printable diff --git a/module/sxml/simple.scm b/module/sxml/simple.scm index 703ad91..86b0784 100644 --- a/module/sxml/simple.scm +++ b/module/sxml/simple.scm @@ -215,29 +215,38 @@ port." (elements (reverse (parser port '())))) `(*TOP* ,@elements))) =20 -(define check-name - (let ((*good-cache* (make-hash-table))) - (lambda (name) - (if (not (hashq-ref *good-cache* name)) - (let* ((str (symbol->string name)) - (i (string-index str #\:)) - (head (or (and i (substring str 0 i)) str)) - (tail (and i (substring str (1+ i))))) - (and i (string-index (substring str (1+ i)) #\:) - (error "Invalid QName: more than one colon" name)) - (for-each - (lambda (s) - (and s - (or (char-alphabetic? (string-ref s 0)) - (eq? (string-ref s 0) #\_) - (error "Invalid name starting character" s name)) - (string-for-each - (lambda (c) - (or (char-alphabetic? c) (string-index "0123456789.= -_" c) - (error "Invalid name character" c s name))) - s))) - (list head tail)) - (hashq-set! *good-cache* name #t)))))) +(define (ns-lookup ns nsmap) + "Look up namespace ns in nsmap. Return its abbreviation or #f" + (assoc-ref nsmap ns)) + +(define ns-abbr-new + (let ((*nscounter* 0)) + (lambda () + (set! *nscounter* (1+ *nscounter*)) + (string-append "ns" (number->string *nscounter*))))) + +(define (ns-abbr name nsmap) + "Takes a QName, SXML style (i.e a symbol whose string value is either a +clean local name or a colon-concatenated pair of namespace:name, and retur= ns +two values: the string : and either a pair ( . +nsabbrev) whenever wasn't in nsmap, or #f when it was" + ;; FIXME check for empty ns (e.g ":foo") + ;; check (worse!) for empty locname (e.g. "foo:") + (let* ((str (symbol->string name)) + (i (string-rindex str #\:)) + (ns (and i (substring str 0 i))) + (locname (or (and i (substring str (1+ i))) str))) + (if ns + (let ((nsabbr (ns-lookup ns nsmap))) + (if nsabbr + ;; known namespace: + (values (string-append nsabbr ":" locname) #f) + ;; unknown namespace + (let ((nsabbr (ns-abbr-new))) + (values (string-append nsabbr ":" locname) + (cons ns nsabbr))))) + ;; empty namespace: clean local-name: + (values locname #f)))) =20 ;; The following two functions serialize tags and attributes. They are ;; being used in the node handlers for the post-order function, see @@ -260,42 +269,58 @@ port." port)))) =20 (define (attribute->xml attr value port) - (check-name attr) (display attr port) (display "=3D\"" port) (attribute-value->xml value port) (display #\" port)) =20 -(define (element->xml tag attrs body port) - (check-name tag) - (display #\< port) - (display tag port) - (if attrs - (let lp ((attrs attrs)) - (if (pair? attrs) - (let ((attr (car attrs))) +(define (element->xml tag attrs body port nsmap) + (let ((new-namespaces '())) + (call-with-values (lambda () (ns-abbr tag nsmap)) + (lambda (abname new-ns) + (when new-ns + (set! new-namespaces (cons new-ns new-namespaces))) + (display #\< port) + (display abname port) + (if attrs + (let lp ((attrs attrs)) + (if (pair? attrs) + (let ((attr (car attrs))) + (display #\space port) + (if (pair? attr) + (call-with-values (lambda () (ns-abbr (car attr) n= smap)) + (lambda (abname new-ns) + (when new-ns + (set! new-namespaces (cons new-ns new-namesp= aces))) + (attribute->xml abname (cdr attr) port))) + (error "bad attribute" tag attr)) + (lp (cdr attrs))) + (if (not (null? attrs)) + (error "bad attributes" tag attrs))))) + ;; Output namespace declarations + (let lp ((new-namespaces new-namespaces)) + (unless (null? new-namespaces) + ;; remember: car is namespace, cdr is abbrev + (let ((ns (caar new-namespaces)) + (nsabbr (cdar new-namespaces))) (display #\space port) - (if (pair? attr) - (attribute->xml (car attr) (cdr attr) port) - (error "bad attribute" tag attr)) - (lp (cdr attrs))) - (if (not (null? attrs)) - (error "bad attributes" tag attrs))))) - (if (pair? body) - (begin - (display #\> port) - (let lp ((body body)) - (cond - ((pair? body) - (sxml->xml (car body) port) - (lp (cdr body))) - ((null? body) - (display "" port)) - (else - (error "bad element body" tag body))))) - (display " />" port))) + (attribute->xml (string-append "xmlns:" nsabbr) ns port)) + (lp (cdr new-namespaces)))) + (if (pair? body) + (begin + (display #\> port) + (let lp ((body body)) + (cond + ((pair? body) + (sxml->xml (car body) port (append new-namespaces nsmap)) + (lp (cdr body))) + ((null? body) + (display "" port)) + (else + (error "bad element body" tag body))))) + (display " />" port)))))) =20 ;; FIXME: ensure name is valid (define (entity->xml name port) @@ -311,7 +336,8 @@ port." (display str port) (display "?>" port)) =20 -(define* (sxml->xml tree #:optional (port (current-output-port))) +(define* (sxml->xml tree #:optional (port (current-output-port)) + (nsmap '())) "Serialize the sxml tree @var{tree} as XML. The output will be written to the current output port, unless the optional argument @var{port} is present." @@ -322,7 +348,7 @@ present." (let ((tag (car tree))) (case tag ((*TOP*) - (sxml->xml (cdr tree) port)) + (sxml->xml (cdr tree) port nsmap)) ((*ENTITY*) (if (and (list? (cdr tree)) (=3D (length (cdr tree)) 1)) (entity->xml (cadr tree) port) @@ -336,9 +362,9 @@ present." (attrs (and (pair? elems) (pair? (car elems)) (eq? '@ (caar elems)) (cdar elems)))) - (element->xml tag attrs (if attrs (cdr elems) elems) port))= ))) + (element->xml tag attrs (if attrs (cdr elems) elems) port n= smap))))) ;; A nodelist. - (for-each (lambda (x) (sxml->xml x port)) tree))) + (for-each (lambda (x) (sxml->xml x port nsmap)) tree))) ((string? tree) (string->escaped-xml tree port)) ((null? tree) *unspecified*) --l76fUT7nc3MelDdI-- --da4uJneut+ArUgXk Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iEYEARECAAYFAlU0rowACgkQBcgs9XrR2kZRwACffTrZx5cCTIr7pMETu2kLbqvZ H8kAnAq9DYpMgKjL7sRpox496i/QN7Dl =Yxx8 -----END PGP SIGNATURE----- --da4uJneut+ArUgXk--