From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Amirouche Boubekki Newsgroups: gmane.lisp.guile.user Subject: Minikanren based database querying Date: Fri, 27 Nov 2015 14:06:42 +0100 Message-ID: NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Trace: ger.gmane.org 1448629644 15901 80.91.229.3 (27 Nov 2015 13:07:24 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 27 Nov 2015 13:07:24 +0000 (UTC) To: Guile User Original-X-From: guile-user-bounces+guile-user=m.gmane.org@gnu.org Fri Nov 27 14:07:09 2015 Return-path: Envelope-to: guile-user@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1a2Ijx-0005i4-MT for guile-user@m.gmane.org; Fri, 27 Nov 2015 14:07:05 +0100 Original-Received: from localhost ([::1]:56632 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a2Ik0-0002nb-EP for guile-user@m.gmane.org; Fri, 27 Nov 2015 08:07:08 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:35226) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a2Ijn-0002dz-NY for guile-user@gnu.org; Fri, 27 Nov 2015 08:06:56 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a2Iji-0005eD-Hw for guile-user@gnu.org; Fri, 27 Nov 2015 08:06:55 -0500 Original-Received: from relay3-d.mail.gandi.net ([2001:4b98:c:538::195]:45858) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a2Iji-0005dG-48 for guile-user@gnu.org; Fri, 27 Nov 2015 08:06:50 -0500 Original-Received: from mfilter46-d.gandi.net (mfilter46-d.gandi.net [217.70.178.177]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 4D764A80C1 for ; Fri, 27 Nov 2015 14:06:45 +0100 (CET) X-Virus-Scanned: Debian amavisd-new at mfilter46-d.gandi.net Original-Received: from relay3-d.mail.gandi.net ([IPv6:::ffff:217.70.183.195]) by mfilter46-d.gandi.net (mfilter46-d.gandi.net [::ffff:10.0.15.180]) (amavisd-new, port 10024) with ESMTP id Q9APLIsCdOZH for ; Fri, 27 Nov 2015 14:06:43 +0100 (CET) X-Originating-IP: 10.58.1.144 Original-Received: from webmail.gandi.net (unknown [10.58.1.144]) (Authenticated sender: amirouche@hypermove.net) by relay3-d.mail.gandi.net (Postfix) with ESMTPA id E112FA80EE for ; Fri, 27 Nov 2015 14:06:42 +0100 (CET) X-Sender: amirouche@hypermove.net User-Agent: Roundcube Webmail/1.1.2 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4b98:c:538::195 X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Original-Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.user:12200 Archived-At: Héllo, My initial plan was to work on multithread support for wiredtiger but this sounds more complex and less fun than doing some minikanren. For those that don't know minikanren [1], it's an embedded Domain Specific Language for logic programming. It's similar in principle to prolog. You might be wondering how this relates to database programming. Well, there's two sides: * First, you might want to store logic facts inside database persisted to disk because your dataset doesn't fit into memory * Second, you might want to take advantage of the declarative nature of minikanren programs For this exercise I've chosen to work on tuple database so called (uid, attribute, identifier) aka. uav database, because it's the database that is the most flexible and can span various usecases from relational, graph or hierarchical problems to key/value or document based kind of problems. And it can be loosely typed. To picture how it looks like imagine a giant list of (u a v) tuples like the following, that stores messages from a blogging application: ``` (define database '(("uid1" type post) ("uid1" blog/post "uav db is a tuple db") ("uid1" blog/create-at 0) ("uid2" type tag) ("uid2" tag/name "scheme") ("uid3" type taglink) ("uid3" taglink/tag "uid2") ("uid3" taglink/post "uid1"))) ``` In wiredtiger, `uid` and `attribute` compose the *key*, and the `value` unique *value* column of the entry. There is three primitive queries that can be run against such database: * `(uav-ref dbc uid attribute)` retrieve the value associated with `(UID . ATTRIBUTE)` pair. * `(uav-ref* dbc uid)` retrieve all the `(attribute . value)` pairs associated with `UID` aka. retrieve the document with `UID` as unique identifier. * `(uav-index-ref dbc attribute value)` retrieve all `uid` for which a `(uid ATTRIBUTE VALUE)` tuple exists in the database. I will skip the translation of the last two procedures into minikanren procedures that makes it possible to solve logic problems backed by a database and go straitgh to the higher level interface that use a pattern matching-like interface to query the tuples. This interface is less powerful that the raw minikanren interface in the sens that it doesn't allow so far to create recursive queries but allows to express SQL queries in a terse and nice syntax. For instance, to query all blog posts that have the a given tag, one can use the following query ``` (define (query:posts-with-tag* name) (query* uid? where ((tag?? tag/name name) (taglink?? tag/link tag??) (taglink?? tag/post uid?)))) ``` The where clause declare the pattern that should be looked for. The symbols that ends with a single question mark e.g. `tag?` are return variables. symbols that ends with two question marks e.g. `taglink??` are internal variables matched during querying. (In the underlying implementation `taglink??` is a minikanren variable instantiated with `fresh`) The above query translates in plain english line by line as: * Given a tag with uid `tag??` and `tag/name` equal to the value `name` * Find a tuple with `taglink??` uid where `tag/link` is associated with `tag??` * Such as there is also a tuple with `taglink??` uid where `tag/post` is associated with `uid?` And return `uid?`. The above query can be written using the primitives like that: ``` (define (query:posts-with-tag name) (let* ((tag?? (uav-index-ref dbc 'tag/name name)) (taglink?? (uav-index-ref dbc 'tag/link tag??))) (map (cut uav-ref dbc <> 'tag/post) taglink??))) ``` There is small webapp that make use of this, that can be found over git [2]. It only requires wiredtiger built from sources (develop branch) [3]. All the best! [1] http://minikanren.org/ [2] https://git.framasoft.org/a-guile-mind/nanoblog.git [3] https://github.com/wiredtiger/wiredtiger.git