From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: more capable xml->sxml
Date: Mon, 28 Jan 2013 12:05:40 +0100
Message-ID: <87sj5l7ekb.fsf@pobox.com>
To: guile-devel
Hi,

I just pushed some changes to (sxml simple)'s xml->sxml.  Basically it
has keyword arguments now that do most of what people have been
requesting for a while (non-significant whitespace, easier handling of
entities, declaration of namespaces).  We can add handling of some
doctype fragments as well, perhaps (internal or external).

Anyway, here are the docs; comments are welcome.
This should not introduce any incompatibilities.

Andy


7.22.2 Reading and Writing XML
------------------------------

The `(sxml simple)' module presents a basic interface for parsing XML
from a port into the Scheme SXML format, and for serializing it back to
text.

     (use-modules (sxml simple))

 -- Scheme Procedure: xml->sxml [string-or-port] [#:namespaces='()]
          [#:declare-namespaces?=#t] [#:trim-whitespace?=#f]
          [#:entities='()] [#:default-entity-handler=#f]
     Use SSAX to parse an XML document into SXML.  Takes one optional
     positional argument, STRING-OR-PORT, which defaults to the current
     input port, followed by the keyword arguments described below.
     Returns the resulting SXML document.  If STRING-OR-PORT is a port,
     it will be left pointing at the next available character in the
     port.

As is normal in SXML, XML elements parse as tagged lists.  Attributes,
if any, are placed after the tag, within an `@' element.  The root of
the resulting SXML is contained in a special tag, `*TOP*'.  This tag
contains the root element of the XML, as well as any processing
instructions that precede it.

     (xml->sxml "<foo/>")
     => (*TOP* (foo))
     (xml->sxml "<foo>text</foo>")
     => (*TOP* (foo "text"))
     (xml->sxml "<foo kind=\"bar\">text</foo>")
     => (*TOP* (foo (@ (kind "bar")) "text"))
     (xml->sxml "<?xml version=\"1.0\"?><foo/>")
     => (*TOP* (*PI* xml "version=\"1.0\"") (foo))

All namespaces in the XML document must be declared via `xmlns'
attributes.  SXML elements built from elements in a namespace
(including a default namespace declared with a bare `xmlns' attribute)
have their tags prefixed with the namespace URI.  Users can specify
custom prefixes for certain namespaces with the `#:namespaces' keyword
argument to `xml->sxml'.

     (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
     => (*TOP* (http://example.org/ns1:foo "text"))
     (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
                #:namespaces '((ns1 . "http://example.org/ns1")))
     => (*TOP* (ns1:foo "text"))
     (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
                #:namespaces '((ns2 . "http://example.org/ns2")))
     => (*TOP* (foo (ns2:baz)))

Passing a true `#:declare-namespaces?' argument will cause the
user-given `#:namespaces' to be treated as if they were declared on the
root element.

     (xml->sxml "<foo><ns2:baz/></foo>"
                #:namespaces '((ns2 .
                               "http://example.org/ns2")))
     => error: undeclared namespace: `ns2'
     (xml->sxml "<foo><ns2:baz/></foo>"
                #:namespaces '((ns2 . "http://example.org/ns2"))
                #:declare-namespaces? #t)
     => (*TOP* (foo (ns2:baz)))

By default, all whitespace in XML is significant.  Passing the
`#:trim-whitespace?' keyword argument to `xml->sxml' will trim
whitespace in front of, behind and between elements, treating it as
"insignificant".  Whitespace inside text fragments is left alone.

     (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
     => (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
     (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
                #:trim-whitespace? #t)
     => (*TOP* (foo (bar " Alfie the parrot! ")))

Parsed entities may be declared with the `#:entities' keyword argument,
or handled with the `#:default-entity-handler'.  By default, only the
standard `&lt;', `&gt;', `&amp;', `&apos;' and `&quot;' entities are
defined, as well as the `&#N;' and `&#xN;' (decimal and hexadecimal)
numeric character entities.

     (xml->sxml "<foo>&amp;</foo>")
     => (*TOP* (foo "&"))
     (xml->sxml "<foo>&nbsp;</foo>")
     => error: undefined entity: nbsp
     (xml->sxml "<foo>&#xA0;</foo>")
     => (*TOP* (foo "\xa0"))
     (xml->sxml "<foo>&nbsp;</foo>"
                #:entities '((nbsp . "\xa0")))
     => (*TOP* (foo "\xa0"))
     (xml->sxml "<foo>&nbsp; &foo;</foo>"
                #:default-entity-handler
                (lambda (port name)
                  (case name
                    ((nbsp) "\xa0")
                    (else
                     (format (current-warning-port)
                             "~a:~a:~a: undefined entity: ~a\n"
                             (or (port-filename port) "<unknown file>")
                             (port-line port) (port-column port)
                             name)
                     (symbol->string name)))))
     -| <unknown file>:0:17: undefined entity: foo
     => (*TOP* (foo "\xa0 foo"))

-- 
http://wingolog.org/
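As a rough sketch, assuming a Guile version that already includes these keyword arguments, several of the options documented above can be combined in one call that reads from a string port instead of a string.  The input document and the `greeting' entity are hypothetical, made up only for illustration; the expected result follows from the namespace, whitespace and entity rules described in the docs.

```scheme
(use-modules (sxml simple))

;; Hypothetical input document, for illustration only: a default
;; namespace, indentation whitespace between elements, and a
;; custom entity reference, &greeting;.
(define input
  (string-append "<foo xmlns=\"http://example.org/ns1\">\n"
                 "  <bar>&greeting;, Alfie</bar>\n"
                 "</foo>"))

;; Parse from a string port rather than directly from the string.
(define result
  (call-with-input-string input
    (lambda (port)
      (xml->sxml port
                 ;; Abbreviate the namespace URI as `ns1'.
                 #:namespaces '((ns1 . "http://example.org/ns1"))
                 ;; Drop the whitespace-only text between elements.
                 #:trim-whitespace? #t
                 ;; Expand the hypothetical &greeting; entity.
                 #:entities '((greeting . "Hello"))))))

result
;; => (*TOP* (ns1:foo (ns1:bar "Hello, Alfie")))
```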