Gentle Guile folks,
I've been playing around with (sxml simple) and stumbled upon something
I think might be a bug. Consider the following snippet:
#!/usr/bin/guile -s
!#
(use-modules (sxml simple))
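;; An XML document with two namespaces (one of them the default)
(define the-svg
  "<svg xmlns='http://www.w3.org/2000/svg'
        xmlns:xlink='http://www.w3.org/1999/xlink'>
     <image xlink:href='picture.png'/>
   </svg>")
;; Note how SXML handles QNames (it just concatenates the namespace
;; and the local name with a colon):
(define the-sxml
  (with-input-from-string the-svg xml->sxml))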
(format #t "~A\n" the-sxml)
;; If we try to serialize this: kaboom!
(sxml->xml the-sxml)
Parsing into SXML goes well, and the (format ...) prints what
I'd expect. But (sxml->xml ...) dies with:
ERROR: In procedure scm-error:
ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg
I had a look at (sxml simple), and I think the problem is that
check-name (the procedure throwing the error) expects the name to
be a QName, i.e. either a plain Name or a namespace abbreviation
(prefix), a colon, and a Name.
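The following sketch makes the failure concrete. It approximates the kind of test check-name applies. `qname-ok?` is a hypothetical stand-in written for illustration, not the code in (sxml simple).

- It accepts a name with at most one colon, provided that colon is not at either end.
- An expanded SXML name is rejected, because its namespace URI already contains a colon of its own.

```scheme
;; Hypothetical approximation of a QName check; the real check-name in
;; (sxml simple) may differ in detail.
(define (qname-ok? name)
  (let* ((s (if (symbol? name) (symbol->string name) name))
         (len (string-length s))
         (colons (string-count s #\:)))
    (cond
     ((= colons 0) (> len 0))            ; plain name, e.g. svg
     ((= colons 1)                       ; prefix:local, e.g. xlink:href
      (let ((i (string-index s #\:)))
        (and (> i 0) (< i (- len 1)))))
     (else #f))))                        ; e.g. an expanded SXML name

(qname-ok? 'svg)                                              ; => #t
(qname-ok? 'xlink:href)                                       ; => #t
(qname-ok? (string->symbol "http://www.w3.org/2000/svg:svg")) ; => #f
```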
But SXML attaches the whole namespace to names (i.e. the whole
"http://www.w3.org/1999/xlink", for example, not just "xlink").
When serializing to XML, we have to go back the other way: find
abbreviations for the namespaces used, prefix the names with those
abbreviations, and emit namespace declarations for them (those funny
xmlns:foo attributes).
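Going back the other way starts with splitting an expanded name into its namespace and local part. The sketch below assumes that names are either fully expanded or unqualified, never already prefixed like xlink:href. `split-expanded-name` and `expanded->qname` are hypothetical helpers, not part of (sxml simple). An XML local name cannot contain a colon, so the split point is the last colon.

```scheme
(use-modules (ice-9 receive))

;; Hypothetical helper: split an expanded SXML name such as
;; http://www.w3.org/1999/xlink:href at its *last* colon.  Returns two
;; values: the namespace URI (or #f if unqualified) and the local name.
(define (split-expanded-name name)
  (let* ((s (symbol->string name))
         (i (string-rindex s #\:)))
    (if i
        (values (substring s 0 i) (substring s (+ i 1)))
        (values #f s))))

;; Hypothetical helper: rebuild a QName, asking PREFIX-FOR which
;; abbreviation to use for the namespace.
(define (expanded->qname name prefix-for)
  (receive (ns local) (split-expanded-name name)
    (if ns
        (string-append (prefix-for ns) ":" local)
        local)))

(expanded->qname (string->symbol "http://www.w3.org/1999/xlink:href")
                 (lambda (ns) "xlink"))          ; => "xlink:href"
```

Choosing which abbreviation `prefix-for` returns, and making sure it is declared, is the part that needs state threaded through the tree.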
I've tried my hand at a patch which "works for me". Basically, it
threads an extra parameter "nsmap" through the serializer, representing
a mapping (namespace -> ns-abbreviation) valid at "this" position and
below in the tree. When new, unseen namespaces come up, new
abbreviations are "invented" (ns-abbrev-new) and collected, and the
corresponding declarations are printed. When recursing into
sub-elements, the new mappings are added to the nsmap passed down.
The result after the patch for the above example (a bit embellished)
looks like this:
Pretty clumsy, but basically correct.
The attached patch is against "GNU Guile 2.0.5-deb+1-3". The relevant
code hasn't changed up to the current development version.
I'm not very happy with the patch as-is. Among other things:
- I had a hard time doing what I wanted in a non-clumsy way.
  In particular, ns-abbr is a strange function and not very clear,
  because it tries to do several things at once: replace the
  namespace by its abbreviation, and signal a new mapping item
  whenever that abbreviation is new. But how can this be done
  elegantly without several look-ups?
- The namespace declarations are tacked onto the end of the attribute
  list. This is plain opportunism: the tag may carry a namespace,
  and so may each of the attribute names. Thus it's very handy to
  collect all the unseen mappings (new-namespaces in element->xml)
  and output them at the end of the attribute list.
  But in XML it is customary to put the namespace declarations before
  the other attributes (the "canonical" XML order even prescribes that).
- The sxml code is pretty careful not to munge strings around too
  much, but to write things out to the port as soon as possible. I
  could be a bit more careful in that department.
- In other XML libraries the user gets a say in the preferred
  namespace mappings (e.g. I'd like http://www.w3.org/2000/svg
  to be the default namespace, or http://www.w3.org/1999/xlink
  to be abbreviated as 'xlink'). This could be achieved by
  passing a function as an optional parameter, which gets a try
  at a new namespace before ns-abbr-new gets at it.
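The optional-preference idea from the last point could look roughly like this. It is a sketch under stated assumptions:

- `#:prefer` is a hypothetical keyword argument taking a procedure that returns a preferred prefix string or #f.
- That preference is ignored when the prefix is already bound in the current nsmap.
- Otherwise a fresh ns<N> prefix is invented, the role ns-abbr-new plays in the patch.

Choosing the default namespace (no prefix at all) is not covered here.

```scheme
(use-modules (srfi srfi-1)       ; find
             (ice-9 receive))    ; receive, used in the example below

;; Is PREFIX already bound (to any namespace) in NSMAP?
(define (prefix-bound? prefix nsmap)
  (find (lambda (entry) (string=? (cdr entry) prefix)) nsmap))

;; Invent ns<N>, skipping any N whose prefix is already taken.
(define (fresh-prefix nsmap)
  (let loop ((k (length nsmap)))
    (let ((p (string-append "ns" (number->string k))))
      (if (prefix-bound? p nsmap) (loop (+ k 1)) p))))

;; Lookup-or-invent where PREFER (hypothetical hook) gets the first try
;; at a namespace not yet in NSMAP.  Returns the prefix and the new map.
(define* (ns-prefix ns nsmap #:key (prefer (lambda (ns) #f)))
  (cond
   ((assoc ns nsmap) => (lambda (entry) (values (cdr entry) nsmap)))
   (else
    (let* ((wanted (prefer ns))
           (prefix (if (and wanted (not (prefix-bound? wanted nsmap)))
                       wanted
                       (fresh-prefix nsmap))))
      (values prefix (acons ns prefix nsmap))))))

;; Example preference table supplied by the user.
(define (my-prefs ns)
  (assoc-ref '(("http://www.w3.org/2000/svg"   . "svg")
               ("http://www.w3.org/1999/xlink" . "xlink"))
             ns))

(receive (prefix nsmap)
    (ns-prefix "http://www.w3.org/1999/xlink" '() #:prefer my-prefs)
  (display prefix)                        ; prints xlink
  (newline))
```

Falling back silently when the preferred prefix is taken keeps the output well-formed. An alternative design would be to signal an error instead, so the user notices the clash.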
I'd be happy to prepare a patch against whatever version makes
sense once we get some consensus on how to do it right.
Thanks & regards
-- tomás