unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* sxml simple, sxml->xml and namespaces
@ 2015-04-08 20:55 tomas
  2016-06-20  8:56 ` Andy Wingo
  0 siblings, 1 reply; 8+ messages in thread
From: tomas @ 2015-04-08 20:55 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 4496 bytes --]

Gentle guile folks,

I'm playing around with (sxml simple) and stumbled upon something
I think might be a bug. Consider the following snippet:

  #!/usr/bin/guile -s
  !#
  (use-modules (sxml simple))
  
  ;; An XML with two namespaces (one default)
  (define the-svg "<svg xmlns='http://www.w3.org/2000/svg'
       xmlns:xlink='http://www.w3.org/1999/xlink'>
    <rect x='5' y='5' width='20' height='20'
          stroke-width='2' stroke='purple' fill='yellow'
          id='rect1' />
    <rect x='30' y='5' width='20' height='20'
          ry='5' rx='8' stroke-width='2' stroke='purple' fill='blue'
          xlink:href='#rect1' />
  </svg>")
  
  ;; Note how SXML handles QNames (just concatenating NS and
  ;; local-name with a colon):
  (define the-sxml
    (with-input-from-string the-svg xml->sxml))
  (format #t "~A\n" the-sxml)
  
  ;; If we try to serialize this: kaboom!
  (sxml->xml the-sxml)
  
The parsing into SXML goes well, the (format ...) outputs what
I'd expect. But the (sxml->xml ...) dies with:

  ERROR: In procedure scm-error:
  ERROR: Invalid QName: more than one colon http://www.w3.org/2000/svg:svg

I had a look at sxml simple and think the problem is that the
function check-name (which is the one throwing the error) expects
the name to be a QName (i.e. either a Name or a namespace abbreviation
plus a colon plus a Name).

But SXML tacks the whole namespaces to names (i.e. the whole
"http://www.w3.org/1999/xlink", for example -- not the "xlink").

When serializing to XML, we should go the way back, finding abbreviations
for the namespaces used, prefixing the names with those abbreviations
and issuing namespace declarations for those abbreviations (those funny
xmlns:foo attributes).

I've tried my hand at a patch which "works for me". Basically, what it
does is to thread an extra parameter "nsmap", representing a mapping
(namespace -> ns-abbreviation) valid at "this" position and below in
the tree. When new, unseen namespaces come up, new abbreviations are
"invented" (ns-abbrev-new), collected and the corresponding declarations
printed. When recursing to sub-elements, the new mappings are added to
the nsmap passed down.

The result after the patch for the above example (a bit embellished)
looks like this:

  <ns1:svg xmlns:ns1="http://www.w3.org/2000/svg">
    <ns1:rect y="5" x="5" width="20" stroke-width="2"
              stroke="purple" id="rect1" height="20" fill="yellow" />
    <ns1:rect ns2:href="#rect1" y="5" x="30" width="20" stroke-width="2"
              stroke="purple" ry="5" rx="8" height="20" fill="blue"
              xmlns:ns2="http://www.w3.org/1999/xlink" />
  </ns1:svg>
  
Pretty clumsy, but basically correct.

The attached patch is against "GNU Guile 2.0.5-deb+1-3". The relevant
code hasn't changed up to the current development version.

I'm not very happy with the patch as-is. Among other things,

 - I had a hard time doing what I wanted in a non-clumsy way.
   Especially, ns-abbr is a strange function and not very clear
   because it tries to do several things at once: replace the
   namespace by its abbreviation, signal a new mapping item
   whenever this abbreviation was new. But how to achieve this
   elegantly without doing several look-ups?

 - The namespace declarations are tacked at the end of the attribute
   list. This is plain opportunism: the tag may carry a namespace,
   and each of the attribute names too. Thus, it's very handy to
   collect all the unseen mappings (new-namespaces in element->xml)
   and output them at the end of the attribute list.

   But in XML it is usual to put the namespace declarations before
   the attributes (the "canonical" XML order even prescribes that).

 - The sxml code is pretty careful to not munge around too much
   with strings, but to output things ASAP to the port. I think
   I might be a bit more careful in that department.

 - In other XML libraries the user gets a choice on preferred
   namespace mappings (e.g. I'd like http://www.w3.org/2000/svg
   to be the default namespace -- or http://www.w3.org/1999/xlink
   to be abbreviated as 'xlink'). This could be achieved by
   passing a function as an optional parameter which gets a try
   at a new namespace before ns-abbr-new gets at it.

I'd be happy to prepare a patch against whatever version makes
sense once we get some consensus on how to do it right.

Thanks & regards
-- tomás

[-- Attachment #1.2: simple.diff --]
[-- Type: text/x-diff, Size: 7416 bytes --]

--- /usr/share/guile/2.0/sxml/simple.scm	2012-03-18 20:16:21.000000000 +0100
+++ /home/tomas/lib/guile/sxml/simple.scm	2015-04-08 22:29:30.049277842 +0200
@@ -37,29 +37,38 @@
 argument, @var{port}, which defaults to the current input port."
   (ssax:xml->sxml port '()))
 
-(define check-name
-  (let ((*good-cache* (make-hash-table)))
-    (lambda (name)
-      (if (not (hashq-ref *good-cache* name))
-          (let* ((str (symbol->string name))
-                 (i (string-index str #\:))
-                 (head (or (and i (substring str 0 i)) str))
-                 (tail (and i (substring str (1+ i)))))
-            (and i (string-index (substring str (1+ i)) #\:)
-                 (error "Invalid QName: more than one colon" name))
-            (for-each
-             (lambda (s)
-               (and s
-                    (or (char-alphabetic? (string-ref s 0))
-                        (eq? (string-ref s 0) #\_)
-                        (error "Invalid name starting character" s name))
-                    (string-for-each
-                     (lambda (c)
-                       (or (char-alphabetic? c) (string-index "0123456789.-_" c)
-                           (error "Invalid name character" c s name)))
-                     s)))
-             (list head tail))
-            (hashq-set! *good-cache* name #t))))))
+(define (ns-lookup ns nsmap)
+  "Look up namespace ns in nsmap. Return its abbreviation or #f"
+  (assoc-ref nsmap ns))
+
+(define ns-abbr-new
+  (let ((*nscounter* 0))
+    (lambda ()
+      (set! *nscounter* (1+ *nscounter*))
+      (string-append "ns" (number->string *nscounter*)))))
+
+(define (ns-abbr name nsmap)
+  "Takes a QName, SXML style (i.e a symbol whose string value is either a
+clean local name or a colon-concatenated pair of namespace:name, and returns
+a list whose car is a string <nsabbrev>:<local-name> and which has as cdr
+a pair (<namespace> . nsabbrev) whenever <namespace> wasn't found in nsmap"
+  ;; FIXME check for empty ns (e.g ":foo")
+  ;; check (worse!) for empty locname (e.g. "foo:")
+  (let* ((str (symbol->string name))
+         (i (string-rindex str #\:))
+         (ns (and i (substring str 0 i)))
+         (locname (or (and i (substring str (1+ i))) str)))
+    (if ns
+        (let ((nsabbr (ns-lookup ns nsmap)))
+          (if nsabbr
+              ;; known namespace:
+              (list (string-append nsabbr ":" locname))
+              ;; unknown namespace
+              (let ((nsabbr (ns-abbr-new)))
+                (list (string-append nsabbr ":" locname)
+                      (cons ns nsabbr)))))
+        ;; empty namespace: clean local-name:
+        (list locname))))
 
 ;; The following two functions serialize tags and attributes. They are
 ;; being used in the node handlers for the post-order function, see
@@ -82,42 +91,58 @@
      port))))
 
 (define (attribute->xml attr value port)
-  (check-name attr)
   (display attr port)
   (display "=\"" port)
   (attribute-value->xml value port)
   (display #\" port))
 
-(define (element->xml tag attrs body port)
-  (check-name tag)
-  (display #\< port)
-  (display tag port)
-  (if attrs
-      (let lp ((attrs attrs))
-        (if (pair? attrs)
-            (let ((attr (car attrs)))
-              (display #\space port)
-              (if (pair? attr)
-                  (attribute->xml (car attr) (cdr attr) port)
-                  (error "bad attribute" tag attr))
-              (lp (cdr attrs)))
-            (if (not (null? attrs))
-                (error "bad attributes" tag attrs)))))
-  (if (pair? body)
-      (begin
-        (display #\> port)
-        (let lp ((body body))
-          (cond
-           ((pair? body)
-            (sxml->xml (car body) port)
-            (lp (cdr body)))
-           ((null? body)
-            (display "</" port)
-            (display tag port)
-            (display ">" port))
-           (else
-            (error "bad element body" tag body)))))
-      (display " />" port)))
+(define (element->xml tag attrs body port nsmap)
+  (let* ((ab (ns-abbr tag  nsmap))
+         (abname (car ab))
+         (new-namespaces (cdr ab)))
+    (display #\< port)
+    (display abname port)
+    (if attrs
+        (let lp ((attrs attrs))
+          (if (pair? attrs)
+              (let ((attr (car attrs)))
+                (display #\space port)
+                (if (pair? attr)
+                    (let* ((ab (ns-abbr (car attr) nsmap))
+                           (abname (car ab))
+                           (nsplus (cdr ab)))
+                      (unless (null? nsplus)
+                        (set! new-namespaces
+                              (cons (car nsplus) new-namespaces)))
+                      (attribute->xml abname (cdr attr) port))
+                    (error "bad attribute" tag attr))
+                (lp (cdr attrs)))
+              (if (not (null? attrs))
+                  (error "bad attributes" tag attrs)))))
+    ;; Output namespace declarations
+    (let lp ((new-namespaces new-namespaces))
+      (unless (null? new-namespaces)
+        ;; remember: car is namespace, cdr is abbrev
+        (let ((ns (caar new-namespaces))
+              (nsabbr (cdar new-namespaces)))
+          (display #\space port)
+          (attribute->xml (string-append "xmlns:" nsabbr) ns port))
+        (lp (cdr new-namespaces))))
+    (if (pair? body)
+        (begin
+          (display #\> port)
+          (let lp ((body body))
+            (cond
+             ((pair? body)
+              (sxml->xml (car body) port (append new-namespaces nsmap))
+              (lp (cdr body)))
+             ((null? body)
+              (display "</" port)
+              (display abname port)
+              (display ">" port))
+             (else
+              (error "bad element body" tag body)))))
+        (display " />" port))))
 
 ;; FIXME: ensure name is valid
 (define (entity->xml name port)
@@ -133,7 +158,8 @@
   (display str port)
   (display "?>" port))
 
-(define* (sxml->xml tree #:optional (port (current-output-port)))
+(define* (sxml->xml tree #:optional (port (current-output-port))
+                    (nsmap '()))
   "Serialize the sxml tree @var{tree} as XML. The output will be written
 to the current output port, unless the optional argument @var{port} is
 present."
@@ -144,7 +170,7 @@
         (let ((tag (car tree)))
           (case tag
             ((*TOP*)
-             (sxml->xml (cdr tree) port))
+             (sxml->xml (cdr tree) port nsmap))
             ((*ENTITY*)
              (if (and (list? (cdr tree)) (= (length (cdr tree)) 1))
                  (entity->xml (cadr tree) port)
@@ -158,9 +184,9 @@
                     (attrs (and (pair? elems) (pair? (car elems))
                                 (eq? '@ (caar elems))
                                 (cdar elems))))
-               (element->xml tag attrs (if attrs (cdr elems) elems) port)))))
+               (element->xml tag attrs (if attrs (cdr elems) elems) port nsmap)))))
         ;; A nodelist.
-        (for-each (lambda (x) (sxml->xml x port)) tree)))
+        (for-each (lambda (x) (sxml->xml x port nsmap)) tree)))
    ((string? tree)
     (string->escaped-xml tree port))
    ((null? tree) *unspecified*)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-21 20:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-08 20:55 sxml simple, sxml->xml and namespaces tomas
2016-06-20  8:56 ` Andy Wingo
2016-06-20 10:52   ` Ricardo Wurmus
2016-06-20 11:20     ` tomas
2016-06-20 12:11     ` Andy Wingo
2016-06-21 20:36       ` Ricardo Wurmus
2016-06-21 20:58         ` Andy Wingo
2016-06-20 11:18   ` tomas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).