unofficial mirror of bug-guile@gnu.org 
From: tomas@tuxteam.de
To: 20339@debbugs.gnu.org
Subject: bug#20339: sxml simple: sxml->xml mishandles namespaces?
Date: Wed, 13 Jul 2016 15:24:03 +0200
Message-ID: <20160713132403.GA2349@tuxteam.de>
In-Reply-To: <87y45vln0f.fsf@pobox.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Jun 23, 2016 at 09:32:16PM +0200, Andy Wingo wrote:
> See thread here as well:
> http://thread.gmane.org/gmane.lisp.guile.devel/17709
> 
> I like Ricardo's patch but have some comments here:
> http://article.gmane.org/gmane.lisp.guile.devel/18384

(sorry for cc'ing both of you, but I don't know whether you are
subscribed to the bug. Two copies seemed more polite than none).

Sorry folks for not coming back earlier. Real Life and things.

Since I'm going to be off the 'net for one month starting next Friday,
I thought I'd write a short note.

I'll be back on the 15th of August and am really willing to do whatever
it takes to bring this forward. OTOH, if any of you decides to pick
it up, I'm sure the results will be better :-)

Referring to Oleg Kiselyov's paper [1], there are actually three
things involved:

 - the namespace. This is an XML thing and will typically be
   a URI (I don't quite remember whether it *must* be a
   URI, but that's irrelevant). It may contain nasty characters
   (to XML: it isn't an XML "Name", and potentially to Scheme:
   there may be parentheses and things in there, so some
   Schemes won't make a symbol of that; Guile doesn't mind).

 - the namespace prefix. Again, an XML thing, basically giving
   a non-nasty abbreviation for the namespace, to stick it to
   the Name, making a "QName". The association prefix -> namespace
   is scoped to a node and its descendants, and can be shadowed
   at some node below

 - the namespace-id, an SXML thing. In [1], this is typically
   the namespace, but Oleg Kiselyov made provisions there for a
   similar "abbreviation" (the user-ns-shortcut in [1], page 3),
   whose mapping can be attached to any node via the
   pseudo-attribute *NAMESPACES* [2], which can also carry the
   original (XML) namespace prefix.

   As far as I understand the paper, most of the time this
   namespace-id will be identical to the URI, but it is the
   namespace-id that gets prefixed to the tag name symbols in
   the SXML representation.
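
As a rough illustration of the first two items (a Python sketch using the
standard library's xml.etree.ElementTree, not Guile's (sxml simple)): that
parser resolves each prefix against the declarations in scope and keeps only
the namespace URI in the element name, much as SXML qualifies its tag symbols
with the namespace-id when that is the URI. The sample document and the
example.org URIs are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The prefix "a" is bound on the root and shadowed on the first child only.
DOC = (
    '<a:root xmlns:a="http://example.org/one">'
    '<a:child xmlns:a="http://example.org/two"/>'
    '<a:child/>'
    '</a:root>'
)

root = ET.fromstring(DOC)

# Element names come back as "{namespace}local"; the prefix itself is gone.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
# {http://example.org/one}root
# {http://example.org/two}child
# {http://example.org/one}child
```

The same prefix yields two different qualified names, which is why the
prefix alone cannot identify a namespace.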

What Ricardo's patch does is to conflate namespace prefix and
namespace-id and provide a mapping (namespace-id aka prefix) ->
namespace. This is actually quite elegant, since we don't need
the distinction between (XML) prefix and (SXML) namespace-id.

I think that we can, at least as far as (sxml simple) is concerned,
ignore this distinction.
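
A minimal sketch of that conflated mapping, in Python rather than Scheme and
explicitly not Ricardo's actual patch: nested lists stand in for SXML, the
tag prefix doubles as the namespace-id, and a single prefix -> namespace
mapping is declared on the top element. The function name sxml_to_xml, the
dict-based mapping and the sample tree are all hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def sxml_to_xml(node, namespaces=None):
    """Serialize a nested-list stand-in for SXML.

    `namespaces` maps prefix (used here as namespace-id) to namespace URI
    and is only emitted on the element it is passed for, i.e. the top one.
    """
    if isinstance(node, str):
        return escape(node)
    tag, *children = node
    declarations = ""
    if namespaces:
        declarations = "".join(
            f" xmlns:{prefix}={quoteattr(uri)}"
            for prefix, uri in namespaces.items()
        )
    # Children are serialized without the mapping: it is declared only once.
    body = "".join(sxml_to_xml(child) for child in children)
    return f"<{tag}{declarations}>{body}</{tag}>"


tree = ["atom:feed", ["atom:title", "hello"], ["dc:creator", "tomas"]]
mapping = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
}
xml = sxml_to_xml(tree, mapping)
print(xml)
```

All declarations end up on the top element, whichever nodes actually use
them.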

What is missing? From my point of view:

 - At xml->sxml time, the user doesn't know which namespaces
   are in the xml. So it would be nice if the XML parser
   could provide that.

 - It would be super-nice if the XML parser could put the
   declarations into the same nodes where it found them, as
   described in [1] (i.e. in the (*NAMESPACES* ...)
   pseudo-attribute). This way we wouldn't have a global
   mapping, but one that resembles the original XML, even
   with the same prefixes. Fewer surprises overall. The round
   trip xml -> sxml -> xml would be (nearly) the identity.

   With Ricardo's patch it would lump all the namespace
   declarations up in the top node, which formally is
   correct, but might scare XML people a bit :-)

 - At sxml->xml time there should be a way to somehow
   generate prefixes for "new" namespaces. I don't know
   at the moment how this would work, that depends on
   how the user is supposed to insert new nodes in the
   SXML. Does she specify the namespace? Both prefix
   (aka namespace-id, under my current assumption) *and*
   namespace? (note that the namespace-id/prefix alone
   wouldn't be sufficient).
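
To make the first two wishes concrete, here is how another parser exposes
that information. This is a Python sketch using
xml.etree.ElementTree.iterparse, not anything (sxml simple) offers as far as
this message is concerned: the "start-ns" events report each
prefix -> namespace declaration just before the "start" event of the element
that carries it, so the declarations can be recorded per node instead of
globally. The sample document and the variable names are assumptions.

```python
import io
import xml.etree.ElementTree as ET

DOC = (
    b'<a:root xmlns:a="http://example.org/one">'
    b'<a:child xmlns:a="http://example.org/two"/>'
    b'</a:root>'
)

declared = []  # one (element name, {prefix: namespace}) pair per element
pending = {}   # declarations seen since the last "start" event

for event, payload in ET.iterparse(io.BytesIO(DOC), events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload
        pending[prefix] = uri
    else:
        # "start": attach the declarations collected for this element.
        declared.append((payload.tag, pending))
        pending = {}

for name, bindings in declared:
    print(name, bindings)
```

Keeping the bindings next to the node that declared them is what would allow
a serializer to reproduce the original prefixes and their scoping.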

Sorry for this wall of text. I hope it makes some sense.

Regards

[1] http://okmij.org/ftp/papers/SXML-paper.pdf
[2] Actually, I'm cheating here: the thing is part of an
   "annotations" part, which according to the grammar comes
   *last*, after all the attributes. But it looks a bit
   like an attribute, with a strange name and a more
   complex value.

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAleGQPMACgkQBcgs9XrR2kaMfgCeKbA4pWFrCZoxofDF4n9utgnZ
IzYAn1gozFwBLPd/rmNkZvJYDTJ9cIvr
=etJd
-----END PGP SIGNATURE-----




