unofficial mirror of guile-devel@gnu.org 
* more capable xml->sxml
From: Andy Wingo @ 2013-01-28 11:05 UTC
  To: guile-devel

Hi,

I just pushed some changes to (sxml simple)'s xml->sxml.  Basically it
has keyword arguments now that do most of what people have been
requesting for a while (non-significant whitespace, easier handling of
entities, declaration of namespaces).  We can add handling of some
doctype fragments as well, perhaps (internal or external).  Anyway here
are the docs; comments are welcome.  This should not introduce any
incompatibilities.

Andy

7.22.2 Reading and Writing XML
------------------------------

The `(sxml simple)' module presents a basic interface for parsing XML
from a port into the Scheme SXML format, and for serializing it back to
text.

     (use-modules (sxml simple))

 -- Scheme Procedure: xml->sxml [string-or-port] [#:namespaces='()]
          [#:declare-namespaces?=#t] [#:trim-whitespace?=#f]
          [#:entities='()] [#:default-entity-handler=#f]
     Use SSAX to parse an XML document into SXML.  Takes one optional
     positional argument, STRING-OR-PORT, which defaults to the current
     input port, along with the keyword arguments described below.
     Returns the resulting SXML document.  If STRING-OR-PORT is a port,
     it will be left pointing at the next available character in the
     port.

   As is normal in SXML, XML elements parse as tagged lists.
Attributes, if any, are placed after the tag, within an `@' element.
The root of the resulting SXML will be contained in a special tag,
`*TOP*'.  This tag will contain the root element of the XML, but also
any prior processing instructions.

     (xml->sxml "<foo/>")
     => (*TOP* (foo))
     (xml->sxml "<foo>text</foo>")
     => (*TOP* (foo "text"))
     (xml->sxml "<foo kind=\"bar\">text</foo>")
     => (*TOP* (foo (@ (kind "bar")) "text"))
     (xml->sxml "<?xml version=\"1.0\"?><foo/>")
     => (*TOP* (*PI* xml "version=\"1.0\"") (foo))
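
   Since `*TOP*' may hold processing instructions ahead of the root
element, code that wants only the root has to skip them.  The following
is a minimal sketch, not part of `(sxml simple)': `root-element' is a
hypothetical helper, and using `find' from SRFI-1 is just one way to
write it.  It also parses from a string port rather than from a string.

```scheme
(use-modules (sxml simple)
             (srfi srfi-1))   ; for `find'

;; Hypothetical helper: return the first child of *TOP* that is an
;; element, skipping any processing instructions before it.
(define (root-element top)
  (find (lambda (node)
          (and (pair? node)
               (not (eq? (car node) '*PI*))))
        (cdr top)))

;; Parse from a port instead of a string; xml->sxml is called with
;; the string port as its STRING-OR-PORT argument.
(define doc
  (call-with-input-string
      "<?xml version=\"1.0\"?><foo kind=\"bar\">text</foo>"
    xml->sxml))

(root-element doc)
```

   Going by the examples above, the last expression should evaluate to
`(foo (@ (kind "bar")) "text")'.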

   All namespaces in the XML document must be declared, via `xmlns'
attributes.  SXML elements built from non-default namespaces will have
their tags prefixed with their URI.  Users can specify custom prefixes
for certain namespaces with the `#:namespaces' keyword argument to
`xml->sxml'.

     (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
     => (*TOP* (http://example.org/ns1:foo "text"))
     (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
                #:namespaces '((ns1 . "http://example.org/ns1")))
     => (*TOP* (ns1:foo "text"))
     (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
                #:namespaces '((ns2 . "http://example.org/ns2")))
     => (*TOP* (foo (ns2:baz)))
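
   The prefixed tags in these results are ordinary symbols, so they can
be compared with `eq?' or dispatched on with `case'.  A small sketch,
assuming the output shown above; `ns1-foo?' is a hypothetical name used
only for illustration.

```scheme
(use-modules (sxml simple))

(define doc
  (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
             #:namespaces '((ns1 . "http://example.org/ns1"))))

;; The document has no processing instructions, so the root element is
;; the second item of the *TOP* list.
(define root (cadr doc))

;; Hypothetical predicate: is NODE an element whose tag is ns1:foo?
(define (ns1-foo? node)
  (and (pair? node)
       (eq? (car node) 'ns1:foo)))

(ns1-foo? root)
```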

   By default, the user-given `#:namespaces' are treated as if they
were declared on the root element.  Passing a false
`#:declare-namespaces?' argument disables this, so that every prefix
must be declared in the document before use.

     (xml->sxml "<foo><ns2:baz/></foo>"
                #:namespaces '((ns2 . "http://example.org/ns2")))
     => (*TOP* (foo (ns2:baz)))
     (xml->sxml "<foo><ns2:baz/></foo>"
                #:namespaces '((ns2 . "http://example.org/ns2"))
                #:declare-namespaces? #f)
     => error: undeclared namespace: `ns2'

   By default, all whitespace in XML is significant.  Passing a true
`#:trim-whitespace?' keyword argument to `xml->sxml' will trim
whitespace in front of, behind, and between elements, treating it as
insignificant.  Whitespace in text fragments is left alone.

     (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
     => (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
     (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
                #:trim-whitespace? #t)
     => (*TOP* (foo (bar " Alfie the parrot! ")))

   Parsed entities may be declared with the `#:entities' keyword
argument, or handled with the `#:default-entity-handler'.  By default,
only the standard `&lt;', `&gt;', `&amp;', `&apos;' and `&quot;'
entities are defined, as well as the `&#N;' and `&#xN;' (decimal and
hexadecimal) numeric character entities.

     (xml->sxml "<foo>&amp;</foo>")
     => (*TOP* (foo "&"))
     (xml->sxml "<foo>&nbsp;</foo>")
     => error: undefined entity: nbsp
     (xml->sxml "<foo>&#xA0;</foo>")
     => (*TOP* (foo "\xa0"))
     (xml->sxml "<foo>&nbsp;</foo>"
                #:entities '((nbsp . "\xa0")))
     => (*TOP* (foo "\xa0"))
     (xml->sxml "<foo>&nbsp; &foo;</foo>"
                #:default-entity-handler
                (lambda (port name)
                  (case name
                    ((nbsp) "\xa0")
                    (else
                     (format (current-warning-port)
                             "~a:~a:~a: undefined entity: ~a\n"
                             (or (port-filename port) "<unknown file>")
                             (port-line port) (port-column port)
                             name)
                     (symbol->string name)))))
     -| <unknown file>:0:17: undefined entity: foo
     => (*TOP* (foo "\xa0 foo"))
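
   When a document uses several named entities, the `#:entities' alist
can be kept in one place and reused.  The sketch below assumes a small
hand-written table; `html-entities' is a hypothetical name and the three
entries are only a sample, not a list provided by `(sxml simple)'.

```scheme
(use-modules (sxml simple))

;; Hypothetical table of a few HTML entities, as an alist of
;; (NAME . REPLACEMENT-STRING) pairs.
(define html-entities
  '((nbsp . "\xa0")
    (copy . "\xa9")
    (eacute . "\xe9")))

(define doc
  (xml->sxml "<p>caf&eacute;&nbsp;&copy;</p>"
             #:entities html-entities))

;; Join the string children of the root element into one string,
;; whether the parser returned them as one fragment or several.
(define text
  (apply string-append (filter string? (cdr (cadr doc)))))
```

   Here `text' should hold the three replacement characters after
"caf", in document order.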

-- 
http://wingolog.org/




* Re: more capable xml->sxml
From: Ludovic Courtès @ 2013-01-29 13:13 UTC
  To: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> I just pushed some changes to (sxml simple)'s xml->sxml.  Basically it
> has keyword arguments now that do most of what people have been
> requesting for a while (non-significant whitespace, easier handling of
> entities, declaration of namespaces).  We can add handling of some
> doctype fragments as well, perhaps (internal or external).  Anyway here
> are the docs; comments are welcome.  This should not introduce any
> incompatibilities.

Excellent, thanks for doing this!

>    Parsed entities may be declared with the `#:entities' keyword
> argument, or handled with the `#:default-entity-handler'.  By default,
> only the standard `&lt;', `&gt;', `&amp;', `&apos;' and `&quot;'
> entities are defined, as well as the `&#N;' and `&#xN;' (decimal and
> hexadecimal) numeric character entities.

This one should be of particular interest to ttn’s IXIN (among others).

Ludo’.



