From: tomas@tuxteam.de
To: 20339@debbugs.gnu.org
Subject: bug#20339: sxml simple: sxml->xml mishandles namespaces?
Date: Wed, 13 Jul 2016 15:24:03 +0200
Message-ID: <20160713132403.GA2349@tuxteam.de>
In-Reply-To: <87y45vln0f.fsf@pobox.com>
References: <20150415194714.GA30295@tuxteam.de> <87y45vln0f.fsf@pobox.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Jun 23, 2016 at 09:32:16PM +0200, Andy Wingo wrote:
> See thread here as well:
> http://thread.gmane.org/gmane.lisp.guile.devel/17709
>
> I like Ricardo's patch but have some comments here:
> http://article.gmane.org/gmane.lisp.guile.devel/18384

(sorry
for cc'ing both of you, but I don't know whether you are subscribed to
the bug. Two copies seemed more polite than none).

Sorry folks for not coming back earlier. Real Life and things.

Since I'm going to be off the 'net for one month starting next Friday,
I thought I'd write a short note. I'll be back on the 15th of August
and am really willing to do whatever it takes to bring this forward.
OTOH, if any of you decides to pick it up, I'm sure the results will
be better :-)

Referring to Oleg Kiselyov's paper [1], there are actually three
things involved:

 - the namespace. This is an XML thing and will typically be a URI
   (I don't quite remember whether it *must* be a URI, but that's
   irrelevant). It may contain nasty characters (to XML: it isn't
   an XML "Name"; and potentially to Scheme: there may be parentheses
   and things in there, so some Schemes won't make a symbol of that;
   Guile doesn't mind).

 - the namespace prefix. Again, an XML thing, basically giving a
   non-nasty abbreviation for the namespace, to stick it to the
   Name, making a "QName". The association prefix -> namespace is
   scoped to a node and its descendants, and can be shadowed at
   some node below.

 - the namespace-id, an SXML thing. In [1], this is typically the
   namespace, but Oleg Kiselyov made provisions in [1] for a similar
   "abbreviation" (the user-ns-shortcut in [1], page 3), whose
   mapping can be attached to any node via the pseudo-attribute
   *NAMESPACES* [2], which can also carry the original (XML)
   namespace prefix. As far as I understand the paper, most of the
   time this namespace-id will be identical to the URI, but it is
   the namespace-id that gets prefixed to the tag name symbols in
   the SXML representation.

What Ricardo's patch does is to conflate namespace prefix and
namespace-id and provide a mapping (namespace-id aka prefix) ->
namespace. This is actually quite elegant, since we don't need the
distinction between (XML) prefix and (SXML) namespace-id.
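
To make the three notions concrete, here is a minimal sketch using
the #:namespaces keyword of xml->sxml from (sxml simple). The sample
document, the URI http://example.org/ns and the names "x" and "ex"
are made up for illustration; the outputs in the comments are what I
would expect, not something taken from the patch under discussion.

```scheme
(use-modules (sxml simple))

;; Hypothetical sample: the XML prefix "x" is bound to a made-up URI.
(define doc
  "<x:item xmlns:x=\"http://example.org/ns\">text</x:item>")

;; No mapping given: the namespace-id is the namespace URI itself,
;; glued to the local name with a colon.
(write (xml->sxml doc))
(newline)
;; expected: (*TOP* (http://example.org/ns:item "text"))

;; Mapping given: the namespace-id "ex" stands in for the URI.
;; It is deliberately different from the XML prefix "x", which shows
;; that prefix and namespace-id are separate notions.
(write (xml->sxml doc #:namespaces '((ex . "http://example.org/ns"))))
(newline)
;; expected: (*TOP* (ex:item "text"))
```

In both results the original prefix "x" is gone, so sxml->xml has
nothing to reconstruct the xmlns declaration from unless the mapping
is supplied again.
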
I think that we can, at least as far as (sxml simple) is concerned,
ignore this distinction.

What is missing? From my point of view:

 - At xml->sxml time, the user doesn't know which namespaces are in
   the XML. So it would be nice if the XML parser could provide that.

 - It would be super-nice if the XML parser could put that into the
   same nodes where it found it, as described in [1] (i.e. in the
   (*NAMESPACES* ...) pseudo-attribute). This way we wouldn't have
   a global mapping, but one that resembles the original XML, even
   with the same prefixes. Fewer surprises overall. The round trip
   xml -> sxml -> xml would be (nearly) the identity. With Ricardo's
   patch it would lump all the namespace declarations up in the top
   node, which formally is correct, but might scare XML people a
   bit :-)

 - At sxml->xml time there should be a way to somehow generate
   prefixes for "new" namespaces. I don't know at the moment how
   this would work; that depends on how the user is supposed to
   insert new nodes in the SXML. Does she specify the namespace?
   Both prefix (aka namespace-id, under my current assumption) *and*
   namespace? (Note that the namespace-id/prefix alone wouldn't be
   sufficient.)

Sorry for this wall of text. I hope it makes some sense.

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf
[2] Actually, I'm cheating here: the thing is part of an
    "annotations" part, which according to the grammar comes *last*,
    after all the attributes. But it looks a bit like an attribute,
    with a strange name and a more complex value.

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAleGQPMACgkQBcgs9XrR2kaMfgCeKbA4pWFrCZoxofDF4n9utgnZ
IzYAn1gozFwBLPd/rmNkZvJYDTJ9cIvr
=etJd
-----END PGP SIGNATURE-----