unofficial mirror of guile-user@gnu.org 
From: amirouche <amirouche@hypermove.net>
To: Roel Janssen <roel@gnu.org>
Cc: Guile User <guile-user@gnu.org>
Subject: Re: neon: git for structured data [Was: Functional database]
Date: Wed, 21 Feb 2018 19:41:20 +0100
Message-ID: <1519238480.19515.0@mail.gandi.net>
In-Reply-To: <87po4yuy9s.fsf@gnu.org>

Hello Roel,

On Wed, 21 Feb 2018 at 17:02, Roel Janssen <roel@gnu.org> wrote:
> Dear Amirouche,
> 
> I'm not exactly sure if this fits in with your plans, but nevertheless
> I'd like to share this code with you.

Thanks for the input.

> 
> I recently looked into using triple stores (actually quad stores)
> and wrote an interface to Redland librdf for Guile.

Indeed, quad stores. A triple store holds only:

  subject predicate object

whereas a quad store holds:

  graph subject predicate object

I did not grasp the difference between triple stores and quad stores
until recently; see the W3C definition [0].

[0] https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
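To make the distinction concrete, here is a minimal sketch in Guile Scheme,
assuming quads are plain four-element lists of strings (graph subject
predicate object). The graph and node names and the 'graph-triples' helper
are hypothetical and are not part of neon or librdf. Selecting the quads of
one graph and dropping the graph component gives back ordinary triples.

```scheme
(use-modules (srfi srfi-1)) ; for filter

;; Hypothetical quads: (graph subject predicate object).
(define quads
  '(("g1" "alice" "knows" "bob")
    ("g1" "bob" "knows" "carol")
    ("g2" "alice" "knows" "dave")))

;; Keep the quads whose graph component is GRAPH, then drop that
;; component, which leaves plain (subject predicate object) triples.
(define (graph-triples graph quads)
  (map cdr
       (filter (lambda (quad) (string=? (car quad) graph))
               quads)))

(display (graph-triples "g1" quads))
(newline)
;; prints: ((alice knows bob) (bob knows carol))
```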

I had briefly looked at librdf before. This part is particularly interesting:

    Storage for graphs in memory and persistently with Oracle Berkeley DB,
    MySQL 3-5, PostgreSQL, OpenLink Virtuoso, SQLite, files or URIs.

    http://librdf.org/

This is definitely a feature that should be baked into neon.
By the way, WiredTiger can be seen as a successor to Oracle Berkeley DB:
it was created by some of the same developers.

The differences between neon and librdf are the following:

- Quads can be versioned in branches without copying (this is implemented,
  but only for triples so far), which effectively makes neon a quintuple store.

- You can pull and push graphs (called 'world' in librdf, I think), that is,
  you can neon clone part of a remote data repository, the equivalent of
  git-cloning a particular directory (not implemented yet).

- The use of IRIs (or URIs) as 'graph name', 'subject' or 'predicate' is not
  enforced; this does not break compatibility with existing systems. That said,
  I am now going to implement 'object' as literals, as the specification
  describes them [1], to allow compatibility with existing systems.

[1] https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

Also, I think the API is simpler in neon:

> 
> I attached the source code of the interface.
> With this interface, you can write something like this:
> 
> --8<---------------cut here---------------start------------->8---
> (use-modules (redland rdf) ; The attached module.
>              (system foreign))
> 
> (define world (rdf-world-new))
> (rdf-world-open world)
> 
> (define store (rdf-storage-new
>                world
>                "hashes"
>                "redland"
>                "new=true,hash-type='bdb',dir='path/to/triplestore'"))
> 
> (define model (rdf-model-new world store %null-pointer))
> 
> (define local-uri (rdf-uri-new world "http://localhost:5000/Redland/"))
> (define s (rdf-node-new-from-uri-local-name world local-uri "Test"))
> (define p (rdf-node-new-from-uri-local-name world local-uri "TestPredicate"))
> (define o (rdf-node-new-from-uri-local-name world local-uri "TestObject"))
> 
> (define statement (rdf-statement-new-from-nodes world s p o))
> (rdf-model-add-statement model statement)

The equivalent of this in neon is basically:

   (add context "Test" "TestPredicate" "TestObject")

where 'context' is the database context, somewhat equivalent to a 'cursor'
in PostgreSQL parlance.

The strings are mapped to 64-bit unsigned integers in the underlying storage
to save space and ease comparisons. Subjects and predicates are each stored
in dedicated tables whose hot parts stay in RAM, which makes the
string-to-integer resolution fast. Basically, for the time being, I rely on
the database layer to cache the integer values associated with subjects and
predicates.
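The string-to-integer mapping can be sketched as follows. This is a
hypothetical in-memory illustration, not neon's code: neon keeps these
mappings in database tables, whereas here two Guile hash tables play that
role, and 'intern!' and 'id->string' are made-up names. Guile integers are
arbitrary-precision, so packing the ids into 64-bit unsigned fields is left
to the storage layer.

```scheme
;; Hypothetical sketch, not neon's code: two hash tables stand in for
;; the database tables that map strings to integer ids and back.
;; Guile's hash-ref/hash-set! compare keys with equal?, so strings work.
(define string->id-table (make-hash-table))
(define id->string-table (make-hash-table))
(define next-id 0)

(define (intern! str)
  "Return the integer id of STR, allocating the next free id if STR is new."
  ;; 0 is a true value in Scheme, so an existing id 0 is found too.
  (or (hash-ref string->id-table str)
      (let ((id next-id))
        (hash-set! string->id-table str id)
        (hash-set! id->string-table id str)
        (set! next-id (+ next-id 1))
        id)))

(define (id->string id)
  "Return the string whose id is ID, or #f if there is none."
  (hash-ref id->string-table id))

(define s (intern! "Test"))           ; new string, gets id 0
(define p (intern! "TestPredicate"))  ; new string, gets id 1
(display (list s p (intern! "Test") (id->string p)))
(newline)
;; prints: (0 1 0 TestPredicate)
```

Comparing two subjects then becomes a cheap integer comparison instead of a
string comparison.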

Similarly, retrieving a triple can currently be done as follows:

   (ref context "Test" "TestPredicate")

It's a minor difference, and the librdf API has the advantage of leaving the
choice of caching to the user.
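For comparison with the librdf calls quoted above, here is a toy in-memory
stand-in for the 'add' / 'ref' pair, assuming a context is just a hash table
keyed by a (subject . predicate) pair. 'make-context' is hypothetical, and
this is not neon's implementation; it only shows the shape of the calls.

```scheme
;; Hypothetical stand-in for a neon context: a hash table keyed by
;; (subject . predicate) pairs.  Not neon's actual implementation.
(define (make-context)
  (make-hash-table))

(define (add context subject predicate object)
  "Record OBJECT for SUBJECT and PREDICATE, replacing any previous value."
  (hash-set! context (cons subject predicate) object))

(define (ref context subject predicate)
  "Return the object recorded for SUBJECT and PREDICATE, or #f."
  (hash-ref context (cons subject predicate) #f))

(define context (make-context))
(add context "Test" "TestPredicate" "TestObject")
(display (ref context "Test" "TestPredicate"))
(newline)
;; prints: TestObject
```

This toy keeps a single object per subject/predicate pair, which matches the
shape of the 'ref' call; a real triple store allows several objects for the
same pair.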

> (rdf-statement-free statement)
> 
> (rdf-model-size model)
> (rdf-storage-size store)
> 
> ;; Example mime-type: application/rdf+xml
> (define serializer (rdf-serializer-new world %null-pointer "text/turtle" %null-pointer))
> (define serialized (rdf-serializer-serialize-model-to-string serializer local-uri model))
> (format #t "Serialized: ~s~%" (pointer->string serialized))

There is no Turtle support in neon yet.

> 
> (rdf-uri-free local-uri)
> (rdf-model-free model)
> (rdf-storage-free store)
> (rdf-world-free world)
> --8<---------------cut here---------------end--------------->8---
> 
> Kind regards,
> Roel Janssen

Thanks Roel!






Thread overview: 4+ messages
2018-02-21 14:49 neon: git for structured data [Was: Functional database] Amirouche Boubekki
2018-02-21 16:02 ` Roel Janssen
2018-02-21 18:41   ` amirouche [this message]
2018-03-05 22:32 ` amirouche
