From: amirouche
Newsgroups: gmane.lisp.guile.user
Subject: Re: neon: git for structured data [Was: Functional database]
Date: Wed, 21 Feb 2018 19:41:20 +0100
Message-ID: <1519238480.19515.0@mail.gandi.net>
References: <3bf20807996ce0bdc4e5ca6ea1d3776f@hypermove.net> <87po4yuy9s.fsf@gnu.org>
In-Reply-To: <87po4yuy9s.fsf@gnu.org>
To: Roel Janssen
Cc: Guile User

Héllo Roel,

On Wed, 21 Feb 2018 at 17:02, Roel Janssen wrote:
> Dear Amirouche,
>
> I'm not exactly sure if this fits in with your plans, but nevertheless
> I'd like to share this code with you.

Thanks for the input.

> I recently looked into using triple stores (actually quad stores)
> and wrote an interface to Redland librdf for Guile.

Indeed, quad stores. Triple stores hold only:

  subject predicate object

whereas quad stores hold:

  graph subject predicate object

I did not grasp the difference between triple stores and quad stores
until recently. See the W3C definition [0].

[0] https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph

I had a brief look at librdf before. In particular, this is interesting:

  Storage for graphs in memory and persistently with Oracle Berkeley
  DB, MySQL 3-5, PostgreSQL, OpenLink Virtuoso, SQLite, files or URIs.

  http://librdf.org/

This is definitely a feature that should be baked into neon. By the way,
wiredtiger is the successor of Oracle Berkeley DB. It was created by the
same developers.

The differences between neon and librdf are the following:

- Quads can be versioned in branches without copying (implemented, but
  only on triples so far), making it effectively a quintuple store.

- You can pull / push graphs (called 'world' in librdf, I think), i.e.
  you can neon clone part of the remote data repository, the equivalent
  of git clone for a particular directory (not implemented yet).

- The use of IRIs (or URIs) as 'graph name', 'subject' or 'predicate'
  is not enforced; this doesn't break compatibility with existing systems.

That said, right now, I will implement 'object' as literals as the
specification describes them [1], to allow compatibility with existing
systems.

[1] https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

Also, the API is, I think, simpler in neon:

> I attached the source code of the interface.
> With this interface, you can write something like this:
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (redland rdf) ; The attached module.
>              (system foreign))
>
> (define world (rdf-world-new))
> (rdf-world-open world)
>
> (define store (rdf-storage-new
>                world
>                "hashes"
>                "redland"
>                "new=true,hash-type='bdb',dir='path/to/triplestore'"))
>
> (define model (rdf-model-new world store %null-pointer))
>
> (define local-uri (rdf-uri-new world "http://localhost:5000/Redland/"))
> (define s (rdf-node-new-from-uri-local-name world local-uri "Test"))
> (define p (rdf-node-new-from-uri-local-name world local-uri "TestPredicate"))
> (define o (rdf-node-new-from-uri-local-name world local-uri "TestObject"))
>
> (define statement (rdf-statement-new-from-nodes world s p o))
> (rdf-model-add-statement model statement)

The equivalent of this in neon is basically:

  (add context "Test" "TestPredicate" "TestObject")

where 'context' is the database context, somewhat equivalent to a
'cursor' in PostgreSQL parlance.

The strings are mapped to 64-bit unsigned integers in the underlying
storage to save space and ease comparisons. Subjects and predicates are
each stored in their own table, whose hot parts stay in RAM. That makes
the string-to-integer resolution fast.
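As a rough illustration of the string-to-integer mapping described
above, here is a minimal sketch in Guile. It is not neon's actual code:
make-interner, subject->id, predicate->id and encode-triple are
hypothetical names, and an in-memory hash table stands in for the
on-disk tables.

```scheme
;; Hypothetical sketch, NOT neon's implementation: an in-memory hash
;; table stands in for the on-disk subject and predicate tables.
(define (make-interner)
  (let ((ids (make-hash-table))   ; string -> integer identifier
        (next 0))                 ; next unused identifier
    (lambda (str)
      ;; 0 is a true value in Scheme, so `or` only falls through
      ;; when the string has not been seen yet.
      (or (hash-ref ids str)
          (let ((id next))
            (hash-set! ids str id)
            (set! next (+ next 1))
            id)))))

;; One table per kind of string, as in the description above.
(define subject->id (make-interner))
(define predicate->id (make-interner))

;; Assumption for this sketch: the object is kept as a literal and
;; only the subject and predicate are replaced by integers.
(define (encode-triple subject predicate object)
  (list (subject->id subject) (predicate->id predicate) object))

(define encoded (encode-triple "Test" "TestPredicate" "TestObject"))
```

Interning the same string twice gives back the same integer, so
comparing two subjects or predicates becomes an integer comparison
instead of a string comparison.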
Basically, I rely on the database layer to cache the integer values
associated with subjects and predicates, for the time being.

Similarly, a triple can currently be retrieved as follows:

  (ref context "Test" "TestPredicate")

It's a minor difference, and the librdf API has the advantage of letting
the user do the caching themselves.

> (rdf-statement-free statement)
>
> (rdf-model-size model)
> (rdf-storage-size store)
>
> ;; Example mime-type: application/rdf+xml
> (define serializer (rdf-serializer-new world %null-pointer
>                                        "text/turtle" %null-pointer))
> (define serialized (rdf-serializer-serialize-model-to-string
>                     serializer local-uri model))
> (format #t "Serialized: ~s~%" (pointer->string serialized))

There is no Turtle support yet.

> (rdf-uri-free local-uri)
> (rdf-model-free model)
> (rdf-storage-free store)
> (rdf-world-free world)
> --8<---------------cut here---------------end--------------->8---
>
> Kind regards,
> Roel Janssen

Thanks Roel!