unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] Prototype register-path
@ 2017-06-03  8:47 Caleb Ristvedt
  2017-06-05  8:38 ` Caleb Ristvedt
  0 siblings, 1 reply; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-03  8:47 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

So far I've got the main functionality of register-path working: it
successfully puts the passed information in a database (at least in my
manual tests it did), and it even puts it in the right database based on
prefix and state-directory. But there are quite a few side-effects
missing, most of them noted above register-path in store.scm. There's
deduplication, resetting timestamps and permissions and stuff (the C++
code called it "canonicalizing"), actually returning true in the case of
success like the documentation says it should, processing environment
variables, and figuring out if I should be automatically creating the
database if it doesn't exist yet.

Currently the register-path unit test fails due to being unable to open
the database for writing - the test doesn't specify a prefix or
state-directory, so it's trying to open /var/guix/db/db.sqlite, which
requires root access to open for writing. Perhaps the test is using an
environment variable or similar to specify a different database, which I
haven't implemented yet.

Comments welcome.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 15592 bytes --]

From f4af8973a7b41664130b05bbe8a4069f62a087c3 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:26:05 -0500
Subject: [PATCH] Implement prototype register-path in scheme

New file and module sql.scm: mostly utility macros for sqlite-register.
---
 gnu/packages/package-management.scm |   3 +-
 guix/sql.scm                        | 219 ++++++++++++++++++++++++++++++++++++
 guix/store.scm                      |  77 ++++++++++---
 3 files changed, 280 insertions(+), 19 deletions(-)
 create mode 100644 guix/sql.scm

diff --git a/gnu/packages/package-management.scm b/gnu/packages/package-management.scm
index ceaf51b67..80ffe0a2e 100644
--- a/gnu/packages/package-management.scm
+++ b/gnu/packages/package-management.scm
@@ -249,7 +249,8 @@
       (propagated-inputs
        `(("gnutls" ,gnutls/guile-2.2)             ;for 'guix download' & co.
          ("guile-json" ,guile-json)
-         ("guile-ssh" ,guile-ssh)))
+         ("guile-ssh" ,guile-ssh)
+         ("guile-sqlite3" ,guile-sqlite3)))
 
       (home-page "https://www.gnu.org/software/guix/")
       (synopsis "Functional package manager for installed software packages and versions")
diff --git a/guix/sql.scm b/guix/sql.scm
new file mode 100644
index 000000000..ae4a1d27d
--- /dev/null
+++ b/guix/sql.scm
@@ -0,0 +1,219 @@
+(define-module (guix sql)
+  #:use-module (sqlite3)
+  #:use-module (system foreign)
+  #:use-module (rnrs bytevectors)
+  #:use-module (srfi srfi-9)
+  #:export (sqlite-register))
+
+;; Miscellaneous SQL stuff, currently just setup for sqlite-register. Mostly
+;; macros.
+
+;; This really belongs in guile-sqlite3, as can be seen from the @@s.
+(define sqlite-last-insert-rowid
+  "Gives the row id of the last inserted row in DB."
+  (let ((last-rowid (pointer->procedure
+                     int
+                     (dynamic-func "sqlite3_last_insert_rowid"
+                                   (@@ (sqlite3) libsqlite3))
+                     (list '*))))
+    (lambda (db)
+      (last-rowid ((@@ (sqlite3) db-pointer) db)))))
+
+
+;; Should I go from key->index here or try to change that in guile-sqlite3?
+(define-syntax sql-parameters
+  "Converts key-value pairs into sqlite bindings for a specific statement."
+  (syntax-rules ()
+    ((sql-parameters statement (name1 val1) (name2 val2) (name3 val3) ...)
+     (begin (sqlite-bind statement name1 val1)
+            (sql-parameters statement (name2 val2) (name3 val3) ...)))
+    ((sql-parameters statement (name value))
+     (sqlite-bind statement name value))))
+
+(define* (step-all statement #:optional (callback noop))
+  "Step until statement is completed. Return number of rows."
+  ;; Where "number of rows" is assumed to be number of steps taken, excluding
+  ;; the last one.
+  (let maybe-step ((ret (sqlite-step statement))
+                   (count 0))
+    (if ret
+        (maybe-step ret (+ count 1))
+        count)))
+
+;; I get the feeling schemers have probably already got this "with" business
+;; much more automated than this...
+(define-syntax with-sql-statement
+  "Automatically prepares statements and then finalizes statements once the
+scope of this macro is left. Also with built-in sqlite parameter binding via
+key-value pairs."
+  (syntax-rules ()
+    ((with-sql-statement db sql statement-var
+                         ((name1 val1) (name2 val2) ...)
+                         exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement-var
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))
+    ((with-sql-statement db sql statement-var () exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))))
+
+(define-syntax with-sql-database
+  "Automatically closes the database once the scope of this macro is left."
+  (syntax-rules ()
+    ((with-sql-database location db-var exps ...)
+     (let ((db-var (sqlite-open location)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-close db-var)))))))
+
+(define-syntax run-sql
+  (syntax-rules ()
+    "For one-off queries that don't get repeated on the same
+database. Everything after database and sql source should be 2-element lists
+containing the sql placeholder name and the value to use. Returns the number
+of rows."
+    ((run-sql db sql (name1 val1) (name2 val2) ...)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))
+    ((run-sql db sql)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))))
+
+(define-syntax run-statement
+  (syntax-rules ()
+    "For compiled statements that may be run multiple times. Everything after
+database and sql source should be 2-element lists containing the sql
+placeholder name and the value to use. Returns the number of rows."
+    ((run-sql db statement (name1 val1) (name2 val2) ...)
+     (dynamic-wind noop
+                   (lambda ()
+                     (sql-parameters statement
+                                     (name1 val1)
+                                     (name2 val2) ...)
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))
+    ((run-sql db statement)
+     (dynamic-wind noop
+                   (lambda ()
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))))
+
+(define path-id-sql
+  "SELECT id FROM ValidPaths WHERE path = $path")
+
+(define (single-result statement)
+  "Gives the first element of the first row returned by statement."
+  (let ((row (sqlite-step statement)))
+    (if row
+        (vector-ref row 0)
+        #f)))
+
+(define* (path-id db path)
+  "If the path \"path\" exists in the ValidPaths table, return its
+id. Otherwise, return #f. If you already have a compiled statement for this
+purpose, you can give it as statement."
+  (with-sql-statement db path-id-sql statement
+                      (;("$path" path)
+                       (1 path))
+                      (single-result statement)))
+
+
+(define update-sql
+  "UPDATE ValidPaths SET hash = $hash, registrationTime = $time, deriver =
+$deriver, narSize = $size WHERE id = $id")
+
+(define insert-sql
+  "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
+VALUES ($path, $hash, $time, $deriver, $size)")
+
+(define (update-or-insert #:key db path deriver hash nar-size time)
+  "The classic update-if-exists and insert-if-doesn't feature that sqlite
+doesn't exactly have... they've got something close, but it involves deleting
+and re-inserting instead of updating, which causes problems with foreign keys,
+of course. Returns the row id of the row that was modified or inserted."
+  (let ((id (path-id db path)))
+    (if id
+        (begin
+          (run-sql db update-sql
+                   ;; As you may have noticed, sqlite-bind doesn't behave
+                   ;; exactly how I was expecting...
+                   ;; ("$id" id)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (5 id)
+                   (3 deriver)
+                   (1 hash)
+                   (4 nar-size)
+                   (2 time))
+          id)
+        (begin
+          (run-sql db insert-sql
+                   ;; ("$path" path)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (1 path)
+                   (4 deriver)
+                   (2 hash)
+                   (5 nar-size)
+                   (3 time))
+          (sqlite-last-insert-rowid db)))))
+
+(define add-reference-sql
+  "INSERT OR IGNORE INTO Refs (referrer, reference) SELECT $referrer, id
+FROM ValidPaths WHERE path = $reference")
+
+(define (add-references db referrer references)
+  "referrer is the id of the referring store item, references is a list
+containing store item paths being referred to. Note that all of the store
+items in \"references\" should already be registered."
+  (with-sql-statement db add-reference-sql add-reference-statement ()
+                      (for-each (lambda (reference)
+                                  (run-statement db
+                                                 add-reference-statement
+                                                 ;("$referrer" referrer)
+                                                 ;("$reference" reference)
+                                                 (1 referrer)
+                                                 (2 reference)))
+                                references)))
+
+;; XXX figure out caching of statement and database objects... later
+(define* (sqlite-register #:key dbpath path references deriver hash nar-size)
+  "Registers this stuff in a database specified by DBPATH. PATH is the string
+path of some store item, REFERENCES is a list of string paths which the store
+item PATH refers to (they need to be already registered!), DERIVER is a string
+path of the derivation that created the store item PATH, HASH is the
+base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
+\"sha256:\") after being converted to nar form, and nar-size is the size in
+bytes of the store item denoted by PATH after being converted to nar form."
+  (with-sql-database dbpath db
+   (let ((id (update-or-insert db path deriver hash nar-size (current-time))))
+     (add-references db id references))))
diff --git a/guix/store.scm b/guix/store.scm
index c94dfea95..f41856fe4 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -27,6 +27,7 @@
   #:use-module (guix hash)
   #:autoload   (guix build syscalls) (terminal-columns)
   #:use-module (rnrs bytevectors)
+  #:use-module (rnrs io ports)
   #:use-module (ice-9 binary-ports)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-9)
@@ -41,6 +42,8 @@
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (web uri)
+  #:use-module (sqlite3)
+  #:use-module (guix sql)
   #:export (%daemon-socket-uri
             %gc-roots-directory
             %default-substitute-urls
@@ -1206,32 +1209,70 @@ The result is always the empty list unless the daemon was started with
 This makes sense only when the daemon was started with '--cache-failures'."
   boolean)
 
+
+;; Would it be better to just make WRITE-FILE give size as well? I question
+;; the general utility of this approach.
+(define (counting-wrapper-port output-port)
+  "Some custom ports don't implement GET-POSITION at all. But if we want to
+figure out how many bytes are being written, we will want to use that. So this
+makes a wrapper around a port which implements GET-POSITION."
+  (let ((byte-count 0))
+    (make-custom-binary-output-port "counting-wrapper"
+                                    (lambda (bytes offset count)
+                                      (set! byte-count
+                                        (+ byte-count count))
+                                      (put-bytevector output-port bytes
+                                                      offset count)
+                                      count)
+                                    (lambda ()
+                                      byte-count)
+                                    #f
+                                    (lambda ()
+                                      (close-port output-port)))))
+
+
+(define (nar-sha256 file)
+  "Gives the sha256 hash of a file and the size of the file in nar form."
+  (let-values (((port get-hash) (open-sha256-port)))
+    (let ((wrapper (counting-wrapper-port port)))
+      (write-file file wrapper)
+      (force-output wrapper)
+      (force-output port)
+      (let ((hash (get-hash))
+            (size (port-position wrapper)))
+        (close-port wrapper)
+        (values hash
+                size)))))
+
+;; TODO: make this canonicalize store items that are registered. This involves
+;; setting permissions and timestamps, I think. Also, run a "deduplication
+;; pass", whatever that involves. Also, honor environment variables. Also,
+;; handle databases not existing yet (what should the default behavior be?
+;; Figuring out how the C++ stuff currently does it sounds like a lot of
+;; grepping for global variables...)
+
 (define* (register-path path
-                        #:key (references '()) deriver prefix
-                        state-directory)
+                        #:key (references '()) deriver (prefix "")
+                        (state-directory
+                         (string-append prefix %state-directory)))
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
-not #f, it must be the name of the directory containing the new store to
-initialize; if STATE-DIRECTORY is not #f, it must be a string containing the
+given, it must be the name of the directory containing the new store to
+initialize; if STATE-DIRECTORY is given, it must be a string containing the
 absolute file name to the state directory of the store being initialized.
 Return #t on success.
 
 Use with care as it directly modifies the store!  This is primarily meant to
 be used internally by the daemon's build hook."
-  ;; Currently this is implemented by calling out to the fine C++ blob.
-  (let ((pipe (apply open-pipe* OPEN_WRITE %guix-register-program
-                     `(,@(if prefix
-                             `("--prefix" ,prefix)
-                             '())
-                       ,@(if state-directory
-                             `("--state-directory" ,state-directory)
-                             '())))))
-    (and pipe
-         (begin
-           (format pipe "~a~%~a~%~a~%"
-                   path (or deriver "") (length references))
-           (for-each (cut format pipe "~a~%" <>) references)
-           (zero? (close-pipe pipe))))))
+  (let* ((to-register (string-append %store-directory "/" (basename path))))
+    (let-values (((hash nar-size)
+                  (nar-sha256 (string-append prefix "/" to-register))))
+      (sqlite-register (string-append state-directory "/db/db.sqlite")
+                       to-register
+                       references
+                       deriver
+                       (string-append "sha256:" (bytevector->base16-string hash))
+                       nar-size))))
 
 \f
 ;;;
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-03  8:47 [PATCH] Prototype register-path Caleb Ristvedt
@ 2017-06-05  8:38 ` Caleb Ristvedt
  2017-06-05 20:34   ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-05  8:38 UTC (permalink / raw)
  Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]


I think I may have accidentally left some stuff out of that patch due to
being new to "git format-patch" - I did several fixups which I'm not
sure were included. Anyway, I've changed it so that register-path now
honors the NIX_* environment variables that are relevant to it, which
means it now passes "make check TESTS=tests/store.scm". Still quite a
bit more to do, mainly resetting timestamps/permissions and
deduplication.

I'm especially pleased with how the interaction between environment
variables, parameters, and defaults is now a straightforward priority
list using a cond. Said interactions were spread out across the C++
codebase using global variables (well, one global variable) and lots of
state-changing, so I feel pretty good about getting that in one place.

A question about protocol, though - should followup patches like this be
replies or new top-level posts? And how often should I send them?

Again, comments welcome.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 15596 bytes --]

From 90f250814b1456387b8b0341b1f245a1c4e05f66 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:26:05 -0500
Subject: [PATCH 1/7] Implement prototype register-path in scheme

New file and module sql.scm: mostly utility macros for sqlite-register.
---
 gnu/packages/package-management.scm |   3 +-
 guix/sql.scm                        | 219 ++++++++++++++++++++++++++++++++++++
 guix/store.scm                      |  77 ++++++++++---
 3 files changed, 280 insertions(+), 19 deletions(-)
 create mode 100644 guix/sql.scm

diff --git a/gnu/packages/package-management.scm b/gnu/packages/package-management.scm
index af91ec1d7..50be3a23f 100644
--- a/gnu/packages/package-management.scm
+++ b/gnu/packages/package-management.scm
@@ -250,7 +250,8 @@
       (propagated-inputs
        `(("gnutls" ,gnutls/guile-2.2)             ;for 'guix download' & co.
          ("guile-json" ,guile-json)
-         ("guile-ssh" ,guile-ssh)))
+         ("guile-ssh" ,guile-ssh)
+         ("guile-sqlite3" ,guile-sqlite3)))
 
       (home-page "https://www.gnu.org/software/guix/")
       (synopsis "Functional package manager for installed software packages and versions")
diff --git a/guix/sql.scm b/guix/sql.scm
new file mode 100644
index 000000000..ae4a1d27d
--- /dev/null
+++ b/guix/sql.scm
@@ -0,0 +1,219 @@
+(define-module (guix sql)
+  #:use-module (sqlite3)
+  #:use-module (system foreign)
+  #:use-module (rnrs bytevectors)
+  #:use-module (srfi srfi-9)
+  #:export (sqlite-register))
+
+;; Miscellaneous SQL stuff, currently just setup for sqlite-register. Mostly
+;; macros.
+
+;; This really belongs in guile-sqlite3, as can be seen from the @@s.
+(define sqlite-last-insert-rowid
+  "Gives the row id of the last inserted row in DB."
+  (let ((last-rowid (pointer->procedure
+                     int
+                     (dynamic-func "sqlite3_last_insert_rowid"
+                                   (@@ (sqlite3) libsqlite3))
+                     (list '*))))
+    (lambda (db)
+      (last-rowid ((@@ (sqlite3) db-pointer) db)))))
+
+
+;; Should I go from key->index here or try to change that in guile-sqlite3?
+(define-syntax sql-parameters
+  "Converts key-value pairs into sqlite bindings for a specific statement."
+  (syntax-rules ()
+    ((sql-parameters statement (name1 val1) (name2 val2) (name3 val3) ...)
+     (begin (sqlite-bind statement name1 val1)
+            (sql-parameters statement (name2 val2) (name3 val3) ...)))
+    ((sql-parameters statement (name value))
+     (sqlite-bind statement name value))))
+
+(define* (step-all statement #:optional (callback noop))
+  "Step until statement is completed. Return number of rows."
+  ;; Where "number of rows" is assumed to be number of steps taken, excluding
+  ;; the last one.
+  (let maybe-step ((ret (sqlite-step statement))
+                   (count 0))
+    (if ret
+        (maybe-step ret (+ count 1))
+        count)))
+
+;; I get the feeling schemers have probably already got this "with" business
+;; much more automated than this...
+(define-syntax with-sql-statement
+  "Automatically prepares statements and then finalizes statements once the
+scope of this macro is left. Also with built-in sqlite parameter binding via
+key-value pairs."
+  (syntax-rules ()
+    ((with-sql-statement db sql statement-var
+                         ((name1 val1) (name2 val2) ...)
+                         exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement-var
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))
+    ((with-sql-statement db sql statement-var () exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))))
+
+(define-syntax with-sql-database
+  "Automatically closes the database once the scope of this macro is left."
+  (syntax-rules ()
+    ((with-sql-database location db-var exps ...)
+     (let ((db-var (sqlite-open location)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-close db-var)))))))
+
+(define-syntax run-sql
+  (syntax-rules ()
+    "For one-off queries that don't get repeated on the same
+database. Everything after database and sql source should be 2-element lists
+containing the sql placeholder name and the value to use. Returns the number
+of rows."
+    ((run-sql db sql (name1 val1) (name2 val2) ...)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))
+    ((run-sql db sql)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))))
+
+(define-syntax run-statement
+  (syntax-rules ()
+    "For compiled statements that may be run multiple times. Everything after
+database and sql source should be 2-element lists containing the sql
+placeholder name and the value to use. Returns the number of rows."
+    ((run-sql db statement (name1 val1) (name2 val2) ...)
+     (dynamic-wind noop
+                   (lambda ()
+                     (sql-parameters statement
+                                     (name1 val1)
+                                     (name2 val2) ...)
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))
+    ((run-sql db statement)
+     (dynamic-wind noop
+                   (lambda ()
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))))
+
+(define path-id-sql
+  "SELECT id FROM ValidPaths WHERE path = $path")
+
+(define (single-result statement)
+  "Gives the first element of the first row returned by statement."
+  (let ((row (sqlite-step statement)))
+    (if row
+        (vector-ref row 0)
+        #f)))
+
+(define* (path-id db path)
+  "If the path \"path\" exists in the ValidPaths table, return its
+id. Otherwise, return #f. If you already have a compiled statement for this
+purpose, you can give it as statement."
+  (with-sql-statement db path-id-sql statement
+                      (;("$path" path)
+                       (1 path))
+                      (single-result statement)))
+
+
+(define update-sql
+  "UPDATE ValidPaths SET hash = $hash, registrationTime = $time, deriver =
+$deriver, narSize = $size WHERE id = $id")
+
+(define insert-sql
+  "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
+VALUES ($path, $hash, $time, $deriver, $size)")
+
+(define (update-or-insert #:key db path deriver hash nar-size time)
+  "The classic update-if-exists and insert-if-doesn't feature that sqlite
+doesn't exactly have... they've got something close, but it involves deleting
+and re-inserting instead of updating, which causes problems with foreign keys,
+of course. Returns the row id of the row that was modified or inserted."
+  (let ((id (path-id db path)))
+    (if id
+        (begin
+          (run-sql db update-sql
+                   ;; As you may have noticed, sqlite-bind doesn't behave
+                   ;; exactly how I was expecting...
+                   ;; ("$id" id)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (5 id)
+                   (3 deriver)
+                   (1 hash)
+                   (4 nar-size)
+                   (2 time))
+          id)
+        (begin
+          (run-sql db insert-sql
+                   ;; ("$path" path)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (1 path)
+                   (4 deriver)
+                   (2 hash)
+                   (5 nar-size)
+                   (3 time))
+          (sqlite-last-insert-rowid db)))))
+
+(define add-reference-sql
+  "INSERT OR IGNORE INTO Refs (referrer, reference) SELECT $referrer, id
+FROM ValidPaths WHERE path = $reference")
+
+(define (add-references db referrer references)
+  "referrer is the id of the referring store item, references is a list
+containing store item paths being referred to. Note that all of the store
+items in \"references\" should already be registered."
+  (with-sql-statement db add-reference-sql add-reference-statement ()
+                      (for-each (lambda (reference)
+                                  (run-statement db
+                                                 add-reference-statement
+                                                 ;("$referrer" referrer)
+                                                 ;("$reference" reference)
+                                                 (1 referrer)
+                                                 (2 reference)))
+                                references)))
+
+;; XXX figure out caching of statement and database objects... later
+(define* (sqlite-register #:key dbpath path references deriver hash nar-size)
+  "Registers this stuff in a database specified by DBPATH. PATH is the string
+path of some store item, REFERENCES is a list of string paths which the store
+item PATH refers to (they need to be already registered!), DERIVER is a string
+path of the derivation that created the store item PATH, HASH is the
+base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
+\"sha256:\") after being converted to nar form, and nar-size is the size in
+bytes of the store item denoted by PATH after being converted to nar form."
+  (with-sql-database dbpath db
+   (let ((id (update-or-insert db path deriver hash nar-size (current-time))))
+     (add-references db id references))))
diff --git a/guix/store.scm b/guix/store.scm
index c94dfea95..f41856fe4 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -27,6 +27,7 @@
   #:use-module (guix hash)
   #:autoload   (guix build syscalls) (terminal-columns)
   #:use-module (rnrs bytevectors)
+  #:use-module (rnrs io ports)
   #:use-module (ice-9 binary-ports)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-9)
@@ -41,6 +42,8 @@
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (web uri)
+  #:use-module (sqlite3)
+  #:use-module (guix sql)
   #:export (%daemon-socket-uri
             %gc-roots-directory
             %default-substitute-urls
@@ -1206,32 +1209,70 @@ The result is always the empty list unless the daemon was started with
 This makes sense only when the daemon was started with '--cache-failures'."
   boolean)
 
+
+;; Would it be better to just make WRITE-FILE give size as well? I question
+;; the general utility of this approach.
+(define (counting-wrapper-port output-port)
+  "Some custom ports don't implement GET-POSITION at all. But if we want to
+figure out how many bytes are being written, we will want to use that. So this
+makes a wrapper around a port which implements GET-POSITION."
+  (let ((byte-count 0))
+    (make-custom-binary-output-port "counting-wrapper"
+                                    (lambda (bytes offset count)
+                                      (set! byte-count
+                                        (+ byte-count count))
+                                      (put-bytevector output-port bytes
+                                                      offset count)
+                                      count)
+                                    (lambda ()
+                                      byte-count)
+                                    #f
+                                    (lambda ()
+                                      (close-port output-port)))))
+
+
+(define (nar-sha256 file)
+  "Gives the sha256 hash of a file and the size of the file in nar form."
+  (let-values (((port get-hash) (open-sha256-port)))
+    (let ((wrapper (counting-wrapper-port port)))
+      (write-file file wrapper)
+      (force-output wrapper)
+      (force-output port)
+      (let ((hash (get-hash))
+            (size (port-position wrapper)))
+        (close-port wrapper)
+        (values hash
+                size)))))
+
+;; TODO: make this canonicalize store items that are registered. This involves
+;; setting permissions and timestamps, I think. Also, run a "deduplication
+;; pass", whatever that involves. Also, honor environment variables. Also,
+;; handle databases not existing yet (what should the default behavior be?
+;; Figuring out how the C++ stuff currently does it sounds like a lot of
+;; grepping for global variables...)
+
 (define* (register-path path
-                        #:key (references '()) deriver prefix
-                        state-directory)
+                        #:key (references '()) deriver (prefix "")
+                        (state-directory
+                         (string-append prefix %state-directory)))
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
-not #f, it must be the name of the directory containing the new store to
-initialize; if STATE-DIRECTORY is not #f, it must be a string containing the
+given, it must be the name of the directory containing the new store to
+initialize; if STATE-DIRECTORY is given, it must be a string containing the
 absolute file name to the state directory of the store being initialized.
 Return #t on success.
 
 Use with care as it directly modifies the store!  This is primarily meant to
 be used internally by the daemon's build hook."
-  ;; Currently this is implemented by calling out to the fine C++ blob.
-  (let ((pipe (apply open-pipe* OPEN_WRITE %guix-register-program
-                     `(,@(if prefix
-                             `("--prefix" ,prefix)
-                             '())
-                       ,@(if state-directory
-                             `("--state-directory" ,state-directory)
-                             '())))))
-    (and pipe
-         (begin
-           (format pipe "~a~%~a~%~a~%"
-                   path (or deriver "") (length references))
-           (for-each (cut format pipe "~a~%" <>) references)
-           (zero? (close-pipe pipe))))))
+  (let* ((to-register (string-append %store-directory "/" (basename path))))
+    (let-values (((hash nar-size)
+                  (nar-sha256 (string-append prefix "/" to-register))))
+      (sqlite-register (string-append state-directory "/db/db.sqlite")
+                       to-register
+                       references
+                       deriver
+                       (string-append "sha256:" (bytevector->base16-string hash))
+                       nar-size))))
 
 \f
 ;;;
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-fixup-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 970 bytes --]

From 76735427c25118ce4324d286ec6a524ced8bd99d Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:38:53 -0500
Subject: [PATCH 2/7] fixup! Implement prototype register-path in scheme

---
 guix/sql.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/sql.scm b/guix/sql.scm
index ae4a1d27d..9e3815401 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -10,13 +10,13 @@
 
 ;; This really belongs in guile-sqlite3, as can be seen from the @@s.
 (define sqlite-last-insert-rowid
-  "Gives the row id of the last inserted row in DB."
   (let ((last-rowid (pointer->procedure
                      int
                      (dynamic-func "sqlite3_last_insert_rowid"
                                    (@@ (sqlite3) libsqlite3))
                      (list '*))))
     (lambda (db)
+      "Gives the row id of the last inserted row in DB."
       (last-rowid ((@@ (sqlite3) db-pointer) db)))))
 
 
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0003-fixup-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 1971 bytes --]

From 4e4636cba2d62066de64dbaa88bf23169a0ecfb5 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:41:40 -0500
Subject: [PATCH 3/7] fixup! Implement prototype register-path in scheme

---
 guix/sql.scm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/guix/sql.scm b/guix/sql.scm
index 9e3815401..6d756c0a6 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -22,8 +22,8 @@
 
 ;; Should I go from key->index here or try to change that in guile-sqlite3?
 (define-syntax sql-parameters
-  "Converts key-value pairs into sqlite bindings for a specific statement."
   (syntax-rules ()
+    "Converts key-value pairs into sqlite bindings for a specific statement."
     ((sql-parameters statement (name1 val1) (name2 val2) (name3 val3) ...)
      (begin (sqlite-bind statement name1 val1)
             (sql-parameters statement (name2 val2) (name3 val3) ...)))
@@ -43,10 +43,10 @@
 ;; I get the feeling schemers have probably already got this "with" business
 ;; much more automated than this...
 (define-syntax with-sql-statement
-  "Automatically prepares statements and then finalizes statements once the
+  (syntax-rules ()
+    "Automatically prepares statements and then finalizes statements once the
 scope of this macro is left. Also with built-in sqlite parameter binding via
 key-value pairs."
-  (syntax-rules ()
     ((with-sql-statement db sql statement-var
                          ((name1 val1) (name2 val2) ...)
                          exps ...)
@@ -68,8 +68,8 @@ key-value pairs."
                        (sqlite-finalize statement-var)))))))
 
 (define-syntax with-sql-database
-  "Automatically closes the database once the scope of this macro is left."
   (syntax-rules ()
+    "Automatically closes the database once the scope of this macro is left."
     ((with-sql-database location db-var exps ...)
      (let ((db-var (sqlite-open location)))
        (dynamic-wind noop
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #5: 0004-fixup-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 2356 bytes --]

From 2973c3695b9c6f522685a03a059815485b23f580 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:49:28 -0500
Subject: [PATCH 4/7] fixup! Implement prototype register-path in scheme

---
 guix/sql.scm   |  7 ++++++-
 guix/store.scm | 13 +++++++------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/guix/sql.scm b/guix/sql.scm
index 6d756c0a6..16a379f97 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -215,5 +215,10 @@ base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
 \"sha256:\") after being converted to nar form, and nar-size is the size in
 bytes of the store item denoted by PATH after being converted to nar form."
   (with-sql-database dbpath db
-   (let ((id (update-or-insert db path deriver hash nar-size (current-time))))
+                     (let ((id (update-or-insert #:db db
+                                                 #:path path
+                                                 #:deriver deriver
+                                                 #:hash hash
+                                                 #:nar-size nar-size
+                                                 #:time (current-time))))
      (add-references db id references))))
diff --git a/guix/store.scm b/guix/store.scm
index f41856fe4..a62fcf3f1 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1267,12 +1267,13 @@ be used internally by the daemon's build hook."
   (let* ((to-register (string-append %store-directory "/" (basename path))))
     (let-values (((hash nar-size)
                   (nar-sha256 (string-append prefix "/" to-register))))
-      (sqlite-register (string-append state-directory "/db/db.sqlite")
-                       to-register
-                       references
-                       deriver
-                       (string-append "sha256:" (bytevector->base16-string hash))
-                       nar-size))))
+      (sqlite-register #:dbpath (string-append state-directory "/db/db.sqlite")
+                       #:path to-register
+                       #:references references
+                       #:deriver deriver
+                       #:hash (string-append "sha256:"
+                                             (bytevector->base16-string hash))
+                       #:nar-size nar-size))))
 
 \f
 ;;;
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #6: 0005-fixup-Implement-prototype-register-path-in-scheme.patch --]
[-- Type: text/x-patch, Size: 989 bytes --]

From dd8f345a3c0c6eff2ab9d41e8eae1b7e9c339ade Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 03:04:12 -0500
Subject: [PATCH 5/7] fixup! Implement prototype register-path in scheme

---
 guix/sql.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/sql.scm b/guix/sql.scm
index 16a379f97..b1e0c0aa4 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -151,7 +151,7 @@ $deriver, narSize = $size WHERE id = $id")
   "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
 VALUES ($path, $hash, $time, $deriver, $size)")
 
-(define (update-or-insert #:key db path deriver hash nar-size time)
+(define* (update-or-insert #:key db path deriver hash nar-size time)
   "The classic update-if-exists and insert-if-doesn't feature that sqlite
 doesn't exactly have... they've got something close, but it involves deleting
 and re-inserting instead of updating, which causes problems with foreign keys,
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #7: 0006-Honor-environment-variables-in-guix-register.patch --]
[-- Type: text/x-patch, Size: 4217 bytes --]

From ef71a89cbd1c0d1c7663d40c4770204179100da4 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 01:34:28 -0500
Subject: [PATCH 6/7] Honor environment variables in guix-register

Added environment variable handling to guix-register. Additionally,
interpretation of parameters state-directory and prefix now more closely
follows the way the old implementation did it.
---
 guix/store.scm | 56 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index a62fcf3f1..e78cfbe41 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1251,10 +1251,16 @@ makes a wrapper around a port which implements GET-POSITION."
 ;; Figuring out how the C++ stuff currently does it sounds like a lot of
 ;; grepping for global variables...)
 
+;; As far as I can tell there are a couple of anticipated use cases for some
+;; of these parameters: prefix is set if you want to "simulate a chroot", as
+;; far as the database is concerned. For example, you could register paths in
+;; /mnt/gnu/store using #:prefix "/mnt"
+
 (define* (register-path path
-                        #:key (references '()) deriver (prefix "")
-                        (state-directory
-                         (string-append prefix %state-directory)))
+                        #:key (references '()) deriver prefix
+                        state-directory)
+  ;; Priority for options: first what is given, then environment variables,
+  ;; then defaults.
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
 given, it must be the name of the directory containing the new store to
@@ -1264,16 +1270,42 @@ Return #t on success.
 
 Use with care as it directly modifies the store!  This is primarily meant to
 be used internally by the daemon's build hook."
-  (let* ((to-register (string-append %store-directory "/" (basename path))))
+  (let* ((db-dir (cond
+                  (state-directory
+                   (string-append state-directory "/db"))
+                  (prefix
+                   (string-append prefix %state-directory "/db"))
+                  ((getenv "NIX_DB_DIR")
+                   (getenv "NIX_DB_DIR"))
+                  ((getenv "NIX_STATE_DIR")
+                   (string-append (getenv "NIX_STATE_DIR") "/db"))
+                  (else
+                   (string-append %state-directory "/db"))))
+         (store-dir (if prefix
+                        (string-append prefix %store-directory)
+                        (or
+                         (getenv "NIX_STORE_DIR")
+                         (getenv "NIX_STORE")
+                         %store-directory)))
+         (to-register (if prefix
+                          ;; note: we assume here that if path is, for example,
+                          ;; /foo/bar/gnu/store/thing.txt, then an environment
+                          ;; variable has been used to change the store
+                          ;; directory to /foo/bar/gnu/store.
+                          (string-append %store-directory "/" (basename path))
+                          path))
+         (real-path (string-append store-dir "/"
+                                   (basename path))))
     (let-values (((hash nar-size)
-                  (nar-sha256 (string-append prefix "/" to-register))))
-      (sqlite-register #:dbpath (string-append state-directory "/db/db.sqlite")
-                       #:path to-register
-                       #:references references
-                       #:deriver deriver
-                       #:hash (string-append "sha256:"
-                                             (bytevector->base16-string hash))
-                       #:nar-size nar-size))))
+                  (nar-sha256 real-path)))
+      (sqlite-register
+       #:dbpath (string-append db-dir "/db.sqlite")
+       #:path to-register
+       #:references references
+       #:deriver deriver
+       #:hash (string-append "sha256:"
+                             (bytevector->base16-string hash))
+       #:nar-size nar-size))))
 
 \f
 ;;;
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #8: 0007-fixup-Honor-environment-variables-in-guix-register.patch --]
[-- Type: text/x-patch, Size: 1634 bytes --]

From 15099fe2489cab4aa814844bb83f5400218c129b Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 03:09:41 -0500
Subject: [PATCH 7/7] fixup! Honor environment variables in guix-register

---
 guix/store.scm | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index e78cfbe41..1334822ff 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1246,15 +1246,11 @@ makes a wrapper around a port which implements GET-POSITION."
 
 ;; TODO: make this canonicalize store items that are registered. This involves
 ;; setting permissions and timestamps, I think. Also, run a "deduplication
-;; pass", whatever that involves. Also, honor environment variables. Also,
-;; handle databases not existing yet (what should the default behavior be?
-;; Figuring out how the C++ stuff currently does it sounds like a lot of
-;; grepping for global variables...)
-
-;; As far as I can tell there are a couple of anticipated use cases for some
-;; of these parameters: prefix is set if you want to "simulate a chroot", as
-;; far as the database is concerned. For example, you could register paths in
-;; /mnt/gnu/store using #:prefix "/mnt"
+;; pass", whatever that involves. Also, handle databases not existing yet
+;; (what should the default behavior be?  Figuring out how the C++ stuff
+;; currently does it sounds like a lot of grepping for global
+;; variables...). Also, return #t on success like the documentation says we
+;; should.
 
 (define* (register-path path
                         #:key (references '()) deriver prefix
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-05  8:38 ` Caleb Ristvedt
@ 2017-06-05 20:34   ` Ludovic Courtès
  2017-06-06  8:59     ` Caleb Ristvedt
  0 siblings, 1 reply; 12+ messages in thread
From: Ludovic Courtès @ 2017-06-05 20:34 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]

Hi reepca!

I gave this patch set a try and looked at the code, and it looks very
good to me!

“make check TESTS=tests/store.scm” passes now with the fixes you posted
today (‘getenv’ & co.)

Like you say, what’s missing from ‘register-path’ now is a timestamp
reset phase (‘reset-timestamps’ in (gnu build install) should cover
that), and a deduplication phase (which you’ll have to implement).  That
would be a good next step IMO.

For deduplication, you already know the (guix hash) API that will allow
you to do that I guess.  The “deduplication database” is simply the
/gnu/store/.links directory.  Each entry there is a hard link to a file;
the name of the entry is the hex representation of the sha256 hash of
that file.  So when inserting a file in the store, all you need is to
look up its hash in /gnu/store/.links and hard-link from there.  See
what I mean?

Some minor suggestions about the code:

  • Please add a copyright line for yourself in files you modify, and
    add the usual GPLv3+ header in new files.  :-)

  • When you add new ‘with-something’ macros, you can tell Emacs (if
    that’s what you use) to ident them like the other ‘with-’ forms by
    modifying .dir-locals.el; there are several examples of that there.

  • I think ‘register-path’ doesn’t have to explicitly do a bunch of
    ‘getenv’ calls.  Instead it should use the variables defined in
    (guix config).  For instance ‘%store-directory’ and
    ‘%state-directory’ are already defined using ‘getenv’.  There’s
    currently nothing for NIX_DB_DIR but you could define
    ‘%store-database-directory’ similarly.

  • It would probably make sense to use the (guix store …) name space
    for modules that implement the store.  So we could have (guix store
    database) for the subset of (guix sql) that deals with
    /var/guix/db/db.sqlite, (guix store deduplication), and so on.

  • You get bonus points if you can squash “fixup” commits with the
    commit you’re conceptually amending.  :-)  Having a single patch
    that implements something makes it easier for others to review.  If
    you’re not familiar with ‘git rebase -i’ & co., that’s OK though,
    don’t bother.

  • Extra bonus points if you follow our commit log conventions as
    discussed at
    <https://www.gnu.org/software/guix/manual/html_node/Submitting-Patches.html>.
    :-)

What about pushing your changes to a WIP branch on Savannah or
elsewhere?  (If you have an account on Savannah we can give you access.)

Thank you, and thumbs up for the quality work so far!

Ludo’.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-05 20:34   ` Ludovic Courtès
@ 2017-06-06  8:59     ` Caleb Ristvedt
  2017-06-08 16:59       ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-06  8:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

ludo@gnu.org (Ludovic Courtès) writes:

> Hi reepca!
>
> I gave this patch set a try and looked at the code, and it looks very
> good to me!
>
> “make check TESTS=tests/store.scm” passes now with the fixes you posted
> today (‘getenv’ & co.)

I end up with 2 failed tests, but neither of them are the register-path
one. 

> For deduplication, you already know the (guix hash) API that will allow
> you to do that I guess.  The “deduplication database” is simply the
> /gnu/store/.links directory.  Each entry there is a hard link to a file;
> the name of the entry is the hex representation of the sha256 hash of
> that file.  So when inserting a file in the store, all you need is to
> look up its hash in /gnu/store/.links and hard-link from there.  See
> what I mean?

And if it isn't in /gnu/store/.links, put a hardlink there? Also, when
you say "when inserting a file", what do you mean by that? I was under
the impression that the paths given to register-path were already in the
store but simply weren't in the database or otherwise "officially" in
the store. Is my assumption incorrect?

Less on-topic, I don't really know how sha256 works, but if it is a
hashing function, there must be a possibility (albeit small) of
collisions, right? Is it worth developing a strategy for handling them?

> Some minor suggestions about the code:

Implemented all of these except maybe half of the commit log one... I'm
not sure what the commit message headline should be, so I sort of
guessed based on some of the ones in recent history, and I'm not sure
I've got the formatting quite right yet, but it should be a lot closer.


> What about pushing your changes to a WIP branch on Savannah or
> elsewhere?  (If you have an account on Savannah we can give you access.)

I just made an account today as "reepca". Should I submit a "request for
inclusion"? In the meantime, here's a much less fixup-riddled patch set
that I think is just about feature-complete as far as register-path is
concerned:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-guix-register-path-Implement-prototype-in-scheme.patch --]
[-- Type: text/x-patch, Size: 16044 bytes --]

From ce4a322446d1865791686b1e4573973573bdcdfc Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Sat, 3 Jun 2017 02:26:05 -0500
Subject: [PATCH 1/7] guix: register-path: Implement prototype in scheme.

* guix/store.scm (register-path): reimplement in scheme.
* guix/sql.scm: New file.
---
 gnu/packages/package-management.scm |   3 +-
 guix/sql.scm                        | 224 ++++++++++++++++++++++++++++++++++++
 guix/store.scm                      |  78 ++++++++++---
 3 files changed, 286 insertions(+), 19 deletions(-)
 create mode 100644 guix/sql.scm

diff --git a/gnu/packages/package-management.scm b/gnu/packages/package-management.scm
index af91ec1d7..50be3a23f 100644
--- a/gnu/packages/package-management.scm
+++ b/gnu/packages/package-management.scm
@@ -250,7 +250,8 @@
       (propagated-inputs
        `(("gnutls" ,gnutls/guile-2.2)             ;for 'guix download' & co.
          ("guile-json" ,guile-json)
-         ("guile-ssh" ,guile-ssh)))
+         ("guile-ssh" ,guile-ssh)
+         ("guile-sqlite3" ,guile-sqlite3)))
 
       (home-page "https://www.gnu.org/software/guix/")
       (synopsis "Functional package manager for installed software packages and versions")
diff --git a/guix/sql.scm b/guix/sql.scm
new file mode 100644
index 000000000..b1e0c0aa4
--- /dev/null
+++ b/guix/sql.scm
@@ -0,0 +1,224 @@
+(define-module (guix sql)
+  #:use-module (sqlite3)
+  #:use-module (system foreign)
+  #:use-module (rnrs bytevectors)
+  #:use-module (srfi srfi-9)
+  #:export (sqlite-register))
+
+;; Miscellaneous SQL stuff, currently just setup for sqlite-register. Mostly
+;; macros.
+
+;; This really belongs in guile-sqlite3, as can be seen from the @@s.
+(define sqlite-last-insert-rowid
+  (let ((last-rowid (pointer->procedure
+                     int
+                     (dynamic-func "sqlite3_last_insert_rowid"
+                                   (@@ (sqlite3) libsqlite3))
+                     (list '*))))
+    (lambda (db)
+      "Gives the row id of the last inserted row in DB."
+      (last-rowid ((@@ (sqlite3) db-pointer) db)))))
+
+
+;; Should I go from key->index here or try to change that in guile-sqlite3?
+(define-syntax sql-parameters
+  (syntax-rules ()
+    "Converts key-value pairs into sqlite bindings for a specific statement."
+    ((sql-parameters statement (name1 val1) (name2 val2) (name3 val3) ...)
+     (begin (sqlite-bind statement name1 val1)
+            (sql-parameters statement (name2 val2) (name3 val3) ...)))
+    ((sql-parameters statement (name value))
+     (sqlite-bind statement name value))))
+
+(define* (step-all statement #:optional (callback noop))
+  "Step until statement is completed. Return number of rows."
+  ;; Where "number of rows" is assumed to be number of steps taken, excluding
+  ;; the last one.
+  (let maybe-step ((ret (sqlite-step statement))
+                   (count 0))
+    (if ret
+        (maybe-step ret (+ count 1))
+        count)))
+
+;; I get the feeling schemers have probably already got this "with" business
+;; much more automated than this...
+(define-syntax with-sql-statement
+  (syntax-rules ()
+    "Automatically prepares statements and then finalizes statements once the
+scope of this macro is left. Also with built-in sqlite parameter binding via
+key-value pairs."
+    ((with-sql-statement db sql statement-var
+                         ((name1 val1) (name2 val2) ...)
+                         exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement-var
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))
+    ((with-sql-statement db sql statement-var () exps ...)
+     (let ((statement-var (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-finalize statement-var)))))))
+
+(define-syntax with-sql-database
+  (syntax-rules ()
+    "Automatically closes the database once the scope of this macro is left."
+    ((with-sql-database location db-var exps ...)
+     (let ((db-var (sqlite-open location)))
+       (dynamic-wind noop
+                     (lambda ()
+                       exps ...)
+                     (lambda ()
+                       (sqlite-close db-var)))))))
+
+(define-syntax run-sql
+  (syntax-rules ()
+    "For one-off queries that don't get repeated on the same
+database. Everything after database and sql source should be 2-element lists
+containing the sql placeholder name and the value to use. Returns the number
+of rows."
+    ((run-sql db sql (name1 val1) (name2 val2) ...)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (sql-parameters statement
+                                            (name1 val1)
+                                            (name2 val2) ...)
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))
+    ((run-sql db sql)
+     (let ((statement (sqlite-prepare db sql)))
+       (dynamic-wind noop
+                     (lambda ()
+                       (step-all statement))
+                     (lambda ()
+                       (sqlite-finalize statement)))))))
+
+(define-syntax run-statement
+  (syntax-rules ()
+    "For compiled statements that may be run multiple times. Everything after
+database and sql source should be 2-element lists containing the sql
+placeholder name and the value to use. Returns the number of rows."
+    ((run-sql db statement (name1 val1) (name2 val2) ...)
+     (dynamic-wind noop
+                   (lambda ()
+                     (sql-parameters statement
+                                     (name1 val1)
+                                     (name2 val2) ...)
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))
+    ((run-sql db statement)
+     (dynamic-wind noop
+                   (lambda ()
+                     (step-all statement))
+                   (lambda ()
+                     (sqlite-reset statement))))))
+
+(define path-id-sql
+  "SELECT id FROM ValidPaths WHERE path = $path")
+
+(define (single-result statement)
+  "Gives the first element of the first row returned by statement."
+  (let ((row (sqlite-step statement)))
+    (if row
+        (vector-ref row 0)
+        #f)))
+
+(define* (path-id db path)
+  "If the path \"path\" exists in the ValidPaths table, return its
+id. Otherwise, return #f. If you already have a compiled statement for this
+purpose, you can give it as statement."
+  (with-sql-statement db path-id-sql statement
+                      (;("$path" path)
+                       (1 path))
+                      (single-result statement)))
+
+
+(define update-sql
+  "UPDATE ValidPaths SET hash = $hash, registrationTime = $time, deriver =
+$deriver, narSize = $size WHERE id = $id")
+
+(define insert-sql
+  "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
+VALUES ($path, $hash, $time, $deriver, $size)")
+
+(define* (update-or-insert #:key db path deriver hash nar-size time)
+  "The classic update-if-exists and insert-if-doesn't feature that sqlite
+doesn't exactly have... they've got something close, but it involves deleting
+and re-inserting instead of updating, which causes problems with foreign keys,
+of course. Returns the row id of the row that was modified or inserted."
+  (let ((id (path-id db path)))
+    (if id
+        (begin
+          (run-sql db update-sql
+                   ;; As you may have noticed, sqlite-bind doesn't behave
+                   ;; exactly how I was expecting...
+                   ;; ("$id" id)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (5 id)
+                   (3 deriver)
+                   (1 hash)
+                   (4 nar-size)
+                   (2 time))
+          id)
+        (begin
+          (run-sql db insert-sql
+                   ;; ("$path" path)
+                   ;; ("$deriver" deriver)
+                   ;; ("$hash" hash)
+                   ;; ("$size" nar-size)
+                   ;; ("$time" time)
+                   (1 path)
+                   (4 deriver)
+                   (2 hash)
+                   (5 nar-size)
+                   (3 time))
+          (sqlite-last-insert-rowid db)))))
+
+(define add-reference-sql
+  "INSERT OR IGNORE INTO Refs (referrer, reference) SELECT $referrer, id
+FROM ValidPaths WHERE path = $reference")
+
+(define (add-references db referrer references)
+  "referrer is the id of the referring store item, references is a list
+containing store item paths being referred to. Note that all of the store
+items in \"references\" should already be registered."
+  (with-sql-statement db add-reference-sql add-reference-statement ()
+                      (for-each (lambda (reference)
+                                  (run-statement db
+                                                 add-reference-statement
+                                                 ;("$referrer" referrer)
+                                                 ;("$reference" reference)
+                                                 (1 referrer)
+                                                 (2 reference)))
+                                references)))
+
+;; XXX figure out caching of statement and database objects... later
+(define* (sqlite-register #:key dbpath path references deriver hash nar-size)
+  "Registers this stuff in a database specified by DBPATH. PATH is the string
+path of some store item, REFERENCES is a list of string paths which the store
+item PATH refers to (they need to be already registered!), DERIVER is a string
+path of the derivation that created the store item PATH, HASH is the
+base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
+\"sha256:\") after being converted to nar form, and nar-size is the size in
+bytes of the store item denoted by PATH after being converted to nar form."
+  (with-sql-database dbpath db
+                     (let ((id (update-or-insert #:db db
+                                                 #:path path
+                                                 #:deriver deriver
+                                                 #:hash hash
+                                                 #:nar-size nar-size
+                                                 #:time (current-time))))
+     (add-references db id references))))
diff --git a/guix/store.scm b/guix/store.scm
index c94dfea95..a62fcf3f1 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -27,6 +27,7 @@
   #:use-module (guix hash)
   #:autoload   (guix build syscalls) (terminal-columns)
   #:use-module (rnrs bytevectors)
+  #:use-module (rnrs io ports)
   #:use-module (ice-9 binary-ports)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-9)
@@ -41,6 +42,8 @@
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (web uri)
+  #:use-module (sqlite3)
+  #:use-module (guix sql)
   #:export (%daemon-socket-uri
             %gc-roots-directory
             %default-substitute-urls
@@ -1206,32 +1209,71 @@ The result is always the empty list unless the daemon was started with
 This makes sense only when the daemon was started with '--cache-failures'."
   boolean)
 
+
+;; Would it be better to just make WRITE-FILE give size as well? I question
+;; the general utility of this approach.
+(define (counting-wrapper-port output-port)
+  "Some custom ports don't implement GET-POSITION at all. But if we want to
+figure out how many bytes are being written, we will want to use that. So this
+makes a wrapper around a port which implements GET-POSITION."
+  (let ((byte-count 0))
+    (make-custom-binary-output-port "counting-wrapper"
+                                    (lambda (bytes offset count)
+                                      (set! byte-count
+                                        (+ byte-count count))
+                                      (put-bytevector output-port bytes
+                                                      offset count)
+                                      count)
+                                    (lambda ()
+                                      byte-count)
+                                    #f
+                                    (lambda ()
+                                      (close-port output-port)))))
+
+
+(define (nar-sha256 file)
+  "Gives the sha256 hash of a file and the size of the file in nar form."
+  (let-values (((port get-hash) (open-sha256-port)))
+    (let ((wrapper (counting-wrapper-port port)))
+      (write-file file wrapper)
+      (force-output wrapper)
+      (force-output port)
+      (let ((hash (get-hash))
+            (size (port-position wrapper)))
+        (close-port wrapper)
+        (values hash
+                size)))))
+
+;; TODO: make this canonicalize store items that are registered. This involves
+;; setting permissions and timestamps, I think. Also, run a "deduplication
+;; pass", whatever that involves. Also, honor environment variables. Also,
+;; handle databases not existing yet (what should the default behavior be?
+;; Figuring out how the C++ stuff currently does it sounds like a lot of
+;; grepping for global variables...)
+
 (define* (register-path path
-                        #:key (references '()) deriver prefix
-                        state-directory)
+                        #:key (references '()) deriver (prefix "")
+                        (state-directory
+                         (string-append prefix %state-directory)))
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
-not #f, it must be the name of the directory containing the new store to
-initialize; if STATE-DIRECTORY is not #f, it must be a string containing the
+given, it must be the name of the directory containing the new store to
+initialize; if STATE-DIRECTORY is given, it must be a string containing the
 absolute file name to the state directory of the store being initialized.
 Return #t on success.
 
 Use with care as it directly modifies the store!  This is primarily meant to
 be used internally by the daemon's build hook."
-  ;; Currently this is implemented by calling out to the fine C++ blob.
-  (let ((pipe (apply open-pipe* OPEN_WRITE %guix-register-program
-                     `(,@(if prefix
-                             `("--prefix" ,prefix)
-                             '())
-                       ,@(if state-directory
-                             `("--state-directory" ,state-directory)
-                             '())))))
-    (and pipe
-         (begin
-           (format pipe "~a~%~a~%~a~%"
-                   path (or deriver "") (length references))
-           (for-each (cut format pipe "~a~%" <>) references)
-           (zero? (close-pipe pipe))))))
+  (let* ((to-register (string-append %store-directory "/" (basename path))))
+    (let-values (((hash nar-size)
+                  (nar-sha256 (string-append prefix "/" to-register))))
+      (sqlite-register #:dbpath (string-append state-directory "/db/db.sqlite")
+                       #:path to-register
+                       #:references references
+                       #:deriver deriver
+                       #:hash (string-append "sha256:"
+                                             (bytevector->base16-string hash))
+                       #:nar-size nar-size))))
 
 \f
 ;;;
-- 
2.13.0


[-- Attachment #3: 0002-guix-register-path-Honor-environment-variables.patch --]
[-- Type: text/x-patch, Size: 5973 bytes --]

From ef072b28b764c192a31e4a4c7cd1b384e0943e49 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 01:34:28 -0500
Subject: [PATCH 2/7] guix: register-path: Honor environment variables.

* guix/store.scm (register-path): Honor environment variables involving the
store, state directory, or database path. Update copyright info.
* guix/sql.scm: Add copyright notice.
---
 guix/sql.scm   | 18 +++++++++++++++++
 guix/store.scm | 61 +++++++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/guix/sql.scm b/guix/sql.scm
index b1e0c0aa4..b6153e332 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -1,3 +1,21 @@
+;;; Copyright © 2017 Caleb Ristvedt <caleb.ristvedt@cune.org>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+
 (define-module (guix sql)
   #:use-module (sqlite3)
   #:use-module (system foreign)
diff --git a/guix/store.scm b/guix/store.scm
index a62fcf3f1..2f16ec2b1 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1,5 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2017 Caleb Ristvedt <caleb.ristvedt@cune.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -1246,15 +1247,17 @@ makes a wrapper around a port which implements GET-POSITION."
 
 ;; TODO: make this canonicalize store items that are registered. This involves
 ;; setting permissions and timestamps, I think. Also, run a "deduplication
-;; pass", whatever that involves. Also, honor environment variables. Also,
-;; handle databases not existing yet (what should the default behavior be?
-;; Figuring out how the C++ stuff currently does it sounds like a lot of
-;; grepping for global variables...)
+;; pass", whatever that involves. Also, handle databases not existing yet
+;; (what should the default behavior be?  Figuring out how the C++ stuff
+;; currently does it sounds like a lot of grepping for global
+;; variables...). Also, return #t on success like the documentation says we
+;; should.
 
 (define* (register-path path
-                        #:key (references '()) deriver (prefix "")
-                        (state-directory
-                         (string-append prefix %state-directory)))
+                        #:key (references '()) deriver prefix
+                        state-directory)
+  ;; Priority for options: first what is given, then environment variables,
+  ;; then defaults.
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
 given, it must be the name of the directory containing the new store to
@@ -1264,16 +1267,42 @@ Return #t on success.
 
 Use with care as it directly modifies the store!  This is primarily meant to
 be used internally by the daemon's build hook."
-  (let* ((to-register (string-append %store-directory "/" (basename path))))
+  (let* ((db-dir (cond
+                  (state-directory
+                   (string-append state-directory "/db"))
+                  (prefix
+                   (string-append prefix %state-directory "/db"))
+                  ((getenv "NIX_DB_DIR")
+                   (getenv "NIX_DB_DIR"))
+                  ((getenv "NIX_STATE_DIR")
+                   (string-append (getenv "NIX_STATE_DIR") "/db"))
+                  (else
+                   (string-append %state-directory "/db"))))
+         (store-dir (if prefix
+                        (string-append prefix %store-directory)
+                        (or
+                         (getenv "NIX_STORE_DIR")
+                         (getenv "NIX_STORE")
+                         %store-directory)))
+         (to-register (if prefix
+                          ;; note: we assume here that if path is, for example,
+                          ;; /foo/bar/gnu/store/thing.txt, then an environment
+                          ;; variable has been used to change the store
+                          ;; directory to /foo/bar/gnu/store.
+                          (string-append %store-directory "/" (basename path))
+                          path))
+         (real-path (string-append store-dir "/"
+                                   (basename path))))
     (let-values (((hash nar-size)
-                  (nar-sha256 (string-append prefix "/" to-register))))
-      (sqlite-register #:dbpath (string-append state-directory "/db/db.sqlite")
-                       #:path to-register
-                       #:references references
-                       #:deriver deriver
-                       #:hash (string-append "sha256:"
-                                             (bytevector->base16-string hash))
-                       #:nar-size nar-size))))
+                  (nar-sha256 real-path)))
+      (sqlite-register
+       #:dbpath (string-append db-dir "/db.sqlite")
+       #:path to-register
+       #:references references
+       #:deriver deriver
+       #:hash (string-append "sha256:"
+                             (bytevector->base16-string hash))
+       #:nar-size nar-size))))
 
 \f
 ;;;
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0003-.dir-locals.el-properly-indent-sql-macros.patch --]
[-- Type: text/x-patch, Size: 7403 bytes --]

From 855a2347ee90cda2835deffda0a7fcb1348b8aef Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 20:28:33 -0500
Subject: [PATCH 3/7] .dir-locals.el: properly indent sql macros.

* .dir-locals.el: add indentation for with-sql-statement, with-sql-database,
  run-sql, and run-statement.
* guix/sql.scm: use that indentation.
  (step-all): fix invocation of named-let so it works for multi-step
  statements.
---
 .dir-locals.el |  5 +++
 guix/sql.scm   | 99 ++++++++++++++++++++++++++++++----------------------------
 2 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index 04b58d2ce..d5caef0a6 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -73,6 +73,11 @@
    (eval . (put 'wrap-program 'scheme-indent-function 1))
    (eval . (put 'with-imported-modules 'scheme-indent-function 1))
 
+   (eval . (put 'with-sql-statement 'scheme-indent-function 1))
+   (eval . (put 'with-sql-database 'scheme-indent-function 1))
+   (eval . (put 'run-sql 'scheme-indent-function 1))
+   (eval . (put 'run-statement 'scheme-indent-function 1))
+
    (eval . (put 'call-with-container 'scheme-indent-function 1))
    (eval . (put 'container-excursion 'scheme-indent-function 1))
    (eval . (put 'eventually 'scheme-indent-function 1))
diff --git a/guix/sql.scm b/guix/sql.scm
index b6153e332..d5c72105b 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -55,7 +55,7 @@
   (let maybe-step ((ret (sqlite-step statement))
                    (count 0))
     (if ret
-        (maybe-step ret (+ count 1))
+        (maybe-step (sqlite-step statement) (+ count 1))
         count)))
 
 ;; I get the feeling schemers have probably already got this "with" business
@@ -72,8 +72,8 @@ key-value pairs."
        (dynamic-wind noop
                      (lambda ()
                        (sql-parameters statement-var
-                                            (name1 val1)
-                                            (name2 val2) ...)
+                                       (name1 val1)
+                                       (name2 val2) ...)
                        exps ...)
                      (lambda ()
                        (sqlite-finalize statement-var)))))
@@ -155,10 +155,11 @@ placeholder name and the value to use. Returns the number of rows."
   "If the path \"path\" exists in the ValidPaths table, return its
 id. Otherwise, return #f. If you already have a compiled statement for this
 purpose, you can give it as statement."
-  (with-sql-statement db path-id-sql statement
-                      (;("$path" path)
-                       (1 path))
-                      (single-result statement)))
+  (with-sql-statement
+      db path-id-sql statement
+      (;("$path" path)
+       (1 path))
+      (single-result statement)))
 
 
 (define update-sql
@@ -177,32 +178,34 @@ of course. Returns the row id of the row that was modified or inserted."
   (let ((id (path-id db path)))
     (if id
         (begin
-          (run-sql db update-sql
-                   ;; As you may have noticed, sqlite-bind doesn't behave
-                   ;; exactly how I was expecting...
-                   ;; ("$id" id)
-                   ;; ("$deriver" deriver)
-                   ;; ("$hash" hash)
-                   ;; ("$size" nar-size)
-                   ;; ("$time" time)
-                   (5 id)
-                   (3 deriver)
-                   (1 hash)
-                   (4 nar-size)
-                   (2 time))
+          (run-sql
+              db update-sql
+              ;; As you may have noticed, sqlite-bind doesn't behave
+              ;; exactly how I was expecting...
+              ;; ("$id" id)
+              ;; ("$deriver" deriver)
+              ;; ("$hash" hash)
+              ;; ("$size" nar-size)
+              ;; ("$time" time)
+              (5 id)
+              (3 deriver)
+              (1 hash)
+              (4 nar-size)
+              (2 time))
           id)
         (begin
-          (run-sql db insert-sql
-                   ;; ("$path" path)
-                   ;; ("$deriver" deriver)
-                   ;; ("$hash" hash)
-                   ;; ("$size" nar-size)
-                   ;; ("$time" time)
-                   (1 path)
-                   (4 deriver)
-                   (2 hash)
-                   (5 nar-size)
-                   (3 time))
+          (run-sql
+              db insert-sql
+              ;; ("$path" path)
+              ;; ("$deriver" deriver)
+              ;; ("$hash" hash)
+              ;; ("$size" nar-size)
+              ;; ("$time" time)
+              (1 path)
+              (4 deriver)
+              (2 hash)
+              (5 nar-size)
+              (3 time))
           (sqlite-last-insert-rowid db)))))
 
 (define add-reference-sql
@@ -213,15 +216,16 @@ FROM ValidPaths WHERE path = $reference")
   "referrer is the id of the referring store item, references is a list
 containing store item paths being referred to. Note that all of the store
 items in \"references\" should already be registered."
-  (with-sql-statement db add-reference-sql add-reference-statement ()
-                      (for-each (lambda (reference)
-                                  (run-statement db
-                                                 add-reference-statement
-                                                 ;("$referrer" referrer)
-                                                 ;("$reference" reference)
-                                                 (1 referrer)
-                                                 (2 reference)))
-                                references)))
+  (with-sql-statement
+      db add-reference-sql add-reference-statement ()
+      (for-each (lambda (reference)
+                  (run-statement
+                      db add-reference-statement
+                                        ;("$referrer" referrer)
+                                        ;("$reference" reference)
+                      (1 referrer)
+                      (2 reference)))
+                references)))
 
 ;; XXX figure out caching of statement and database objects... later
 (define* (sqlite-register #:key dbpath path references deriver hash nar-size)
@@ -232,11 +236,12 @@ path of the derivation that created the store item PATH, HASH is the
 base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
 \"sha256:\") after being converted to nar form, and nar-size is the size in
 bytes of the store item denoted by PATH after being converted to nar form."
-  (with-sql-database dbpath db
-                     (let ((id (update-or-insert #:db db
-                                                 #:path path
-                                                 #:deriver deriver
-                                                 #:hash hash
-                                                 #:nar-size nar-size
-                                                 #:time (current-time))))
+  (with-sql-database
+      dbpath db
+      (let ((id (update-or-insert #:db db
+                                  #:path path
+                                  #:deriver deriver
+                                  #:hash hash
+                                  #:nar-size nar-size
+                                  #:time (current-time))))
      (add-references db id references))))
-- 
2.13.0


[-- Attachment #5: 0004-guix-sql.scm-split-into-generic-and-store-specific-p.patch --]
[-- Type: text/x-patch, Size: 12906 bytes --]

From e3619f840c69ea75669cf302fa3054ddc53aefb5 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 21:31:24 -0500
Subject: [PATCH 4/7] guix: sql.scm: split into generic and store-specific
 parts.

* guix/sql.scm (path-id-sql, path-id, update-sql, insert-sql,
  update-or-insert, add-reference-sql, add-references, sqlite-register):
  removed.
  (sqlite-parameter-index): new procedure.
  (sqlite-parameters): use sqlite-parameter-index, works with parameter names
  instead of indexes now.  Updated callers.
* guix/store/database.scm: new file.
  (path-id-sql, path-id, update-sql, insert-sql, update-or-insert,
  add-reference-sql, add-references, sqlite-register): added.
* guix/store.scm: use (guix store database) instead of (guix sql).
---
 guix/sql.scm            | 134 ++++++++++++------------------------------------
 guix/store.scm          |   2 +-
 guix/store/database.scm | 104 +++++++++++++++++++++++++++++++++++++
 3 files changed, 138 insertions(+), 102 deletions(-)
 create mode 100644 guix/store/database.scm

diff --git a/guix/sql.scm b/guix/sql.scm
index d5c72105b..6b6f7867d 100644
--- a/guix/sql.scm
+++ b/guix/sql.scm
@@ -21,7 +21,20 @@
   #:use-module (system foreign)
   #:use-module (rnrs bytevectors)
   #:use-module (srfi srfi-9)
-  #:export (sqlite-register))
+  #:export (sqlite-last-insert-rowid
+            sql-parameters
+            with-sql-statement
+            with-sql-database
+            run-sql
+            run-statement
+            single-result)
+  #:re-export (sqlite-step
+               sqlite-fold
+               sqlite-fold-right
+               sqlite-map
+               sqlite-prepare
+               sqlite-reset
+               sqlite-finalize))
 
 ;; Miscellaneous SQL stuff, currently just setup for sqlite-register. Mostly
 ;; macros.
@@ -37,16 +50,31 @@
       "Gives the row id of the last inserted row in DB."
       (last-rowid ((@@ (sqlite3) db-pointer) db)))))
 
+(define sqlite-parameter-index
+  (let ((param-index (pointer->procedure
+                      int
+                      (dynamic-func "sqlite3_bind_parameter_index"
+                                    (@@ (sqlite3) libsqlite3))
+                      (list '* '*))))
+    (lambda (statement key)
+      "Gives the index of an sqlite parameter for a certain statement with a
+certain (string) name."
+      (param-index ((@@ (sqlite3) stmt-pointer) statement)
+                   (string->pointer key "utf-8")))))
+
 
-;; Should I go from key->index here or try to change that in guile-sqlite3?
 (define-syntax sql-parameters
   (syntax-rules ()
     "Converts key-value pairs into sqlite bindings for a specific statement."
     ((sql-parameters statement (name1 val1) (name2 val2) (name3 val3) ...)
-     (begin (sqlite-bind statement name1 val1)
+     (begin (sqlite-bind statement
+                         (sqlite-parameter-index statement name1)
+                         val1)
             (sql-parameters statement (name2 val2) (name3 val3) ...)))
     ((sql-parameters statement (name value))
-     (sqlite-bind statement name value))))
+     (sqlite-bind statement
+                  (sqlite-parameter-index statement name)
+                  value))))
 
 (define* (step-all statement #:optional (callback noop))
   "Step until statement is completed. Return number of rows."
@@ -141,8 +169,7 @@ placeholder name and the value to use. Returns the number of rows."
                    (lambda ()
                      (sqlite-reset statement))))))
 
-(define path-id-sql
-  "SELECT id FROM ValidPaths WHERE path = $path")
+
 
 (define (single-result statement)
   "Gives the first element of the first row returned by statement."
@@ -150,98 +177,3 @@ placeholder name and the value to use. Returns the number of rows."
     (if row
         (vector-ref row 0)
         #f)))
-
-(define* (path-id db path)
-  "If the path \"path\" exists in the ValidPaths table, return its
-id. Otherwise, return #f. If you already have a compiled statement for this
-purpose, you can give it as statement."
-  (with-sql-statement
-      db path-id-sql statement
-      (;("$path" path)
-       (1 path))
-      (single-result statement)))
-
-
-(define update-sql
-  "UPDATE ValidPaths SET hash = $hash, registrationTime = $time, deriver =
-$deriver, narSize = $size WHERE id = $id")
-
-(define insert-sql
-  "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
-VALUES ($path, $hash, $time, $deriver, $size)")
-
-(define* (update-or-insert #:key db path deriver hash nar-size time)
-  "The classic update-if-exists and insert-if-doesn't feature that sqlite
-doesn't exactly have... they've got something close, but it involves deleting
-and re-inserting instead of updating, which causes problems with foreign keys,
-of course. Returns the row id of the row that was modified or inserted."
-  (let ((id (path-id db path)))
-    (if id
-        (begin
-          (run-sql
-              db update-sql
-              ;; As you may have noticed, sqlite-bind doesn't behave
-              ;; exactly how I was expecting...
-              ;; ("$id" id)
-              ;; ("$deriver" deriver)
-              ;; ("$hash" hash)
-              ;; ("$size" nar-size)
-              ;; ("$time" time)
-              (5 id)
-              (3 deriver)
-              (1 hash)
-              (4 nar-size)
-              (2 time))
-          id)
-        (begin
-          (run-sql
-              db insert-sql
-              ;; ("$path" path)
-              ;; ("$deriver" deriver)
-              ;; ("$hash" hash)
-              ;; ("$size" nar-size)
-              ;; ("$time" time)
-              (1 path)
-              (4 deriver)
-              (2 hash)
-              (5 nar-size)
-              (3 time))
-          (sqlite-last-insert-rowid db)))))
-
-(define add-reference-sql
-  "INSERT OR IGNORE INTO Refs (referrer, reference) SELECT $referrer, id
-FROM ValidPaths WHERE path = $reference")
-
-(define (add-references db referrer references)
-  "referrer is the id of the referring store item, references is a list
-containing store item paths being referred to. Note that all of the store
-items in \"references\" should already be registered."
-  (with-sql-statement
-      db add-reference-sql add-reference-statement ()
-      (for-each (lambda (reference)
-                  (run-statement
-                      db add-reference-statement
-                                        ;("$referrer" referrer)
-                                        ;("$reference" reference)
-                      (1 referrer)
-                      (2 reference)))
-                references)))
-
-;; XXX figure out caching of statement and database objects... later
-(define* (sqlite-register #:key dbpath path references deriver hash nar-size)
-  "Registers this stuff in a database specified by DBPATH. PATH is the string
-path of some store item, REFERENCES is a list of string paths which the store
-item PATH refers to (they need to be already registered!), DERIVER is a string
-path of the derivation that created the store item PATH, HASH is the
-base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
-\"sha256:\") after being converted to nar form, and nar-size is the size in
-bytes of the store item denoted by PATH after being converted to nar form."
-  (with-sql-database
-      dbpath db
-      (let ((id (update-or-insert #:db db
-                                  #:path path
-                                  #:deriver deriver
-                                  #:hash hash
-                                  #:nar-size nar-size
-                                  #:time (current-time))))
-     (add-references db id references))))
diff --git a/guix/store.scm b/guix/store.scm
index 2f16ec2b1..f32cdc6aa 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -44,7 +44,7 @@
   #:use-module (ice-9 popen)
   #:use-module (web uri)
   #:use-module (sqlite3)
-  #:use-module (guix sql)
+  #:use-module (guix store database)
   #:export (%daemon-socket-uri
             %gc-roots-directory
             %default-substitute-urls
diff --git a/guix/store/database.scm b/guix/store/database.scm
new file mode 100644
index 000000000..ecf7ba4aa
--- /dev/null
+++ b/guix/store/database.scm
@@ -0,0 +1,104 @@
+;;; Copyright © 2017 Caleb Ristvedt <caleb.ristvedt@cune.org>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (guix store database)
+  #:use-module (guix sql)
+  #:export (sqlite-register))
+
+;;; Code for working with the store database directly.
+
+(define path-id-sql
+  "SELECT id FROM ValidPaths WHERE path = $path")
+
+(define* (path-id db path)
+  "If the path \"path\" exists in the ValidPaths table, return its
+id. Otherwise, return #f."
+  (with-sql-statement
+      db path-id-sql statement
+      (("$path" path))
+      (single-result statement)))
+
+
+(define update-sql
+  "UPDATE ValidPaths SET hash = $hash, registrationTime = $time, deriver =
+$deriver, narSize = $size WHERE id = $id")
+
+(define insert-sql
+  "INSERT INTO ValidPaths (path, hash, registrationTime, deriver, narSize)
+VALUES ($path, $hash, $time, $deriver, $size)")
+
+(define* (update-or-insert #:key db path deriver hash nar-size time)
+  "The classic update-if-exists and insert-if-doesn't feature that sqlite
+doesn't exactly have... they've got something close, but it involves deleting
+and re-inserting instead of updating, which causes problems with foreign keys,
+of course. Returns the row id of the row that was modified or inserted."
+  (let ((id (path-id db path)))
+    (if id
+        (begin
+          (run-sql
+              db update-sql
+              ("$id" id)
+              ("$deriver" deriver)
+              ("$hash" hash)
+              ("$size" nar-size)
+              ("$time" time))
+          id)
+        (begin
+          (run-sql
+              db insert-sql
+              ("$path" path)
+              ("$deriver" deriver)
+              ("$hash" hash)
+              ("$size" nar-size)
+              ("$time" time))
+          (sqlite-last-insert-rowid db)))))
+
+(define add-reference-sql
+  "INSERT OR IGNORE INTO Refs (referrer, reference) SELECT $referrer, id
+FROM ValidPaths WHERE path = $reference")
+
+(define (add-references db referrer references)
+  "referrer is the id of the referring store item, references is a list
+containing store item paths being referred to. Note that all of the store
+items in \"references\" should already be registered."
+  (with-sql-statement
+      db add-reference-sql add-reference-statement ()
+      (for-each (lambda (reference)
+                  (run-statement
+                      db add-reference-statement
+                      ("$referrer" referrer)
+                      ("$reference" reference)))
+                references)))
+
+;; XXX figure out caching of statement and database objects... later
+(define* (sqlite-register #:key dbpath path references deriver hash nar-size)
+  "Registers this stuff in a database specified by DBPATH. PATH is the string
+path of some store item, REFERENCES is a list of string paths which the store
+item PATH refers to (they need to be already registered!), DERIVER is a string
+path of the derivation that created the store item PATH, HASH is the
+base16-encoded sha256 hash of the store item denoted by PATH (prefixed with
+\"sha256:\") after being converted to nar form, and nar-size is the size in
+bytes of the store item denoted by PATH after being converted to nar form."
+  (with-sql-database
+      dbpath db
+      (let ((id (update-or-insert #:db db
+                                  #:path path
+                                  #:deriver deriver
+                                  #:hash hash
+                                  #:nar-size nar-size
+                                  #:time (current-time))))
+     (add-references db id references))))
-- 
2.13.0


[-- Attachment #6: 0005-guix-register-path-use-new-store-database-directory.patch --]
[-- Type: text/x-patch, Size: 5205 bytes --]

From ae3f83c82c46d8dec3dce5a3ff89805111c8e7ab Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Mon, 5 Jun 2017 22:34:59 -0500
Subject: [PATCH 5/7] guix: register-path: use new %store-database-directory

* guix/config.scm.in (%store-database-directory): new variable.
* guix/store.scm (register-path): use variables from (guix config) instead of
  using environment variables directly.
---
 guix/config.scm.in |  6 ++++++
 guix/store.scm     | 41 ++++++++++++++++++++++-------------------
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/guix/config.scm.in b/guix/config.scm.in
index 8f2c4abd8..dfe5fe0db 100644
--- a/guix/config.scm.in
+++ b/guix/config.scm.in
@@ -1,5 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2017 Caleb Ristvedt <caleb.ristvedt@cune.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -29,6 +30,7 @@
 
             %store-directory
             %state-directory
+            %store-database-directory
             %config-directory
             %guix-register-program
 
@@ -80,6 +82,10 @@
   (or (getenv "NIX_STATE_DIR")
       (string-append %localstatedir "/guix")))
 
+(define %store-database-directory
+  (or (and=> (getenv "NIX_DB_DIR") canonicalize-path)
+      (string-append %state-directory "/db")))
+
 (define %config-directory
   ;; This must match `GUIX_CONFIGURATION_DIRECTORY' as defined in `nix/local.mk'.
   (or (getenv "GUIX_CONFIGURATION_DIRECTORY")
diff --git a/guix/store.scm b/guix/store.scm
index f32cdc6aa..77fd5b51e 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1257,7 +1257,10 @@ makes a wrapper around a port which implements GET-POSITION."
                         #:key (references '()) deriver prefix
                         state-directory)
   ;; Priority for options: first what is given, then environment variables,
-  ;; then defaults.
+  ;; then defaults. %state-directory, %store-directory, and
+  ;; %store-database-directory already handle the "environment variables /
+  ;; defaults" question, so we only need to choose between what is given and
+  ;; those.
   "Register PATH as a valid store file, with REFERENCES as its list of
 references, and DERIVER as its deriver (.drv that led to it.)  If PREFIX is
 given, it must be the name of the directory containing the new store to
@@ -1271,28 +1274,28 @@ be used internally by the daemon's build hook."
                   (state-directory
                    (string-append state-directory "/db"))
                   (prefix
-                   (string-append prefix %state-directory "/db"))
-                  ((getenv "NIX_DB_DIR")
-                   (getenv "NIX_DB_DIR"))
-                  ((getenv "NIX_STATE_DIR")
-                   (string-append (getenv "NIX_STATE_DIR") "/db"))
+                   ;; If prefix is specified, the value of NIX_STATE_DIR
+                   ;; (which affects %state-directory) isn't supposed to
+                   ;; affect db-dir, only the compile-time-customized
+                   ;; default should. 
+                   (string-append prefix %localstatedir "/guix/db"))
                   (else
-                   (string-append %state-directory "/db"))))
+                   %store-database-directory)))
          (store-dir (if prefix
-                        (string-append prefix %store-directory)
-                        (or
-                         (getenv "NIX_STORE_DIR")
-                         (getenv "NIX_STORE")
-                         %store-directory)))
+                        ;; same situation as above
+                        (string-append prefix %storedir)
+                        %store-directory))
          (to-register (if prefix
-                          ;; note: we assume here that if path is, for example,
-                          ;; /foo/bar/gnu/store/thing.txt, then an environment
-                          ;; variable has been used to change the store
-                          ;; directory to /foo/bar/gnu/store.
-                          (string-append %store-directory "/" (basename path))
+                          (string-append %storedir "/" (basename path))
+                          ;; note: we assume here that if path is, for
+                          ;; example, /foo/bar/gnu/store/thing.txt and prefix
+                          ;; isn't given, then an environment variable has
+                          ;; been used to change the store directory to
+                          ;; /foo/bar/gnu/store, since otherwise real-path
+                          ;; would end up being /gnu/store/thing.txt, which is
+                          ;; probably not the right file in this case.
                           path))
-         (real-path (string-append store-dir "/"
-                                   (basename path))))
+         (real-path (string-append store-dir "/" (basename path))))
     (let-values (((hash nar-size)
                   (nar-sha256 real-path)))
       (sqlite-register
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #7: 0006-guix-register-path-reset-timestamps-after-registerin.patch --]
[-- Type: text/x-patch, Size: 2627 bytes --]

From caccdd578b2bb87a23d4bdbe29039595391bf768 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Tue, 6 Jun 2017 00:04:54 -0500
Subject: [PATCH 6/7] guix: register-path: reset timestamps after registering.

* guix/store.scm (register-path): Now resets timestamps.
---
 guix/store.scm | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index 77fd5b51e..cf08da632 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -45,6 +45,7 @@
   #:use-module (web uri)
   #:use-module (sqlite3)
   #:use-module (guix store database)
+  #:use-module (gnu build install)
   #:export (%daemon-socket-uri
             %gc-roots-directory
             %default-substitute-urls
@@ -1245,17 +1246,14 @@ makes a wrapper around a port which implements GET-POSITION."
         (values hash
                 size)))))
 
-;; TODO: make this canonicalize store items that are registered. This involves
-;; setting permissions and timestamps, I think. Also, run a "deduplication
-;; pass", whatever that involves. Also, handle databases not existing yet
-;; (what should the default behavior be?  Figuring out how the C++ stuff
-;; currently does it sounds like a lot of grepping for global
-;; variables...). Also, return #t on success like the documentation says we
-;; should.
+;; TODO: Run a "deduplication pass", whatever that involves. Also, handle
+;; databases not existing yet (what should the default behavior be?  Figuring
+;; out how the C++ stuff currently does it sounds like a lot of grepping for
+;; global variables...). Also, return #t on success like the documentation
+;; says we should.
 
 (define* (register-path path
-                        #:key (references '()) deriver prefix
-                        state-directory)
+                        #:key (references '()) deriver prefix state-directory)
   ;; Priority for options: first what is given, then environment variables,
   ;; then defaults. %state-directory, %store-directory, and
   ;; %store-database-directory already handle the "environment variables /
@@ -1305,7 +1303,13 @@ be used internally by the daemon's build hook."
        #:deriver deriver
        #:hash (string-append "sha256:"
                              (bytevector->base16-string hash))
-       #:nar-size nar-size))))
+       #:nar-size nar-size)
+      ;; reset-timestamps prints a message on each invocation that we probably
+      ;; don't want.
+      (with-output-to-port 
+          (%make-void-port "w")
+        (lambda ()
+          (reset-timestamps real-path))))))
 
 \f
 ;;;
-- 
2.13.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #8: 0007-guix-register-path-do-deduplication.patch --]
[-- Type: text/x-patch, Size: 3328 bytes --]

From 23c5dfb1b32e8ba9e820ce9866fc44a28b5603c2 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Tue, 6 Jun 2017 02:44:41 -0500
Subject: [PATCH 7/7] guix: register-path: do deduplication.

* guix/store.scm (get-temp-link, replace-with-link, deduplicate): new
  procedures.
  (register-path): uses deduplicate now.
---
 guix/store.scm | 47 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index cf08da632..6284736fa 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -43,7 +43,6 @@
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (web uri)
-  #:use-module (sqlite3)
   #:use-module (guix store database)
   #:use-module (gnu build install)
   #:export (%daemon-socket-uri
@@ -1246,11 +1245,44 @@ makes a wrapper around a port which implements GET-POSITION."
         (values hash
                 size)))))
 
-;; TODO: Run a "deduplication pass", whatever that involves. Also, handle
-;; databases not existing yet (what should the default behavior be?  Figuring
-;; out how the C++ stuff currently does it sounds like a lot of grepping for
-;; global variables...). Also, return #t on success like the documentation
-;; says we should.
+(define (get-temp-link target)
+  "Like mkstemp!, but instead of creating a new file and giving you the name,
+it creates a new hardlink to TARGET and gives you the name."
+  (let try-again ((tempname (tmpnam)))
+    (catch
+      #t
+      (lambda ()
+        (link target tempname)
+        tempname)
+      (lambda ()
+        (try-again (tmpnam))))))
+
+(define (replace-with-link target to-replace)
+  "Replaces the file TO-REPLACE with a hardlink to TARGET"
+  ;; According to the C++ code, this is how you replace it with a link
+  ;; "atomically".
+  (let ((temp-link (get-temp-link target)))
+    (delete-file to-replace)
+    (rename-file temp-link to-replace)))
+
+;; TODO: handling in case the .links directory doesn't exist? For now I'll
+;; just assume it's the responsibility of whoever makes the store to create
+;; it.
+(define (deduplicate path store hash)
+  "Checks if a store item with hash HASH already exists. If so, replaces PATH
+with a hardlink to the already-existing one. If not, it registers PATH so that
+future duplicates can hardlink to it."
+  (let ((links-path (string-append store
+                                   "/.links/"
+                                   (bytevector->base16-string hash))))
+    (if (file-exists? links-path)
+        (replace-with-link links-path path)
+        (link path links-path))))
+
+;; TODO: Handle databases not existing yet (what should the default behavior
+;; be?  Figuring out how the C++ stuff currently does it sounds like a lot of
+;; grepping for global variables...). Also, return #t on success like the
+;; documentation says we should.
 
 (define* (register-path path
                         #:key (references '()) deriver prefix state-directory)
@@ -1309,7 +1341,8 @@ be used internally by the daemon's build hook."
       (with-output-to-port 
           (%make-void-port "w")
         (lambda ()
-          (reset-timestamps real-path))))))
+          (reset-timestamps real-path)))
+      (deduplicate real-path store-dir hash))))
 
 \f
 ;;;
-- 
2.13.0


[-- Attachment #9: Type: text/plain, Size: 87 bytes --]



> Thank you, and thumbs up for the quality work so far!
>
> Ludo’.

Thanks!

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-06  8:59     ` Caleb Ristvedt
@ 2017-06-08 16:59       ` Ludovic Courtès
  0 siblings, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-06-08 16:59 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Hi reepca,

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> “make check TESTS=tests/store.scm” passes now with the fixes you posted
>> today (‘getenv’ & co.)
>
> I end up with 2 failed tests, but neither of them are the register-path
> one. 

Could you send tests/store.log?

>> For deduplication, you already know the (guix hash) API that will allow
>> you to do that I guess.  The “deduplication database” is simply the
>> /gnu/store/.links directory.  Each entry there is a hard link to a file;
>> the name of the entry is the hex representation of the sha256 hash of
>> that file.  So when inserting a file in the store, all you need is to
>> look up its hash in /gnu/store/.links and hard-link from there.  See
>> what I mean?
>
> And if it isn't in /gnu/store/.links, put a hardlink there? Also, when
> you say "when inserting a file", what do you mean by that? I was under
> the impression that the paths given to register-path were already in the
> store but simply weren't in the database or otherwise "officially" in
> the store. Is my assumption incorrect?

Your assumption is correct, sorry for the confusion.

‘register-path’ gets called when the files are already in the store and
the deduplication phase therefore happens when the files are already in
the store.  This can be seen in guix-register.cc, which does:

  store->registerValidPaths (infos);
  if (optimize)
    store->optimisePath (prefix + i.path);

I suppose we could add a #:optimize? parameter to ‘register-path’, which
would default to #t.

> Less on-topic, I don't really know how sha256 works, but if it is a
> hashing function, there must be a possibility (albeit small) of
> collisions, right? Is it worth developing a strategy for handling them?

No.  (There’s a paper called “Compare-by-Hash: A Reasoned Analysis” by
John Black that discusses this, if you want to read more.  :-))

>> Some minor suggestions about the code:
>
> Implemented all of these except maybe half of the commit log one... I'm
> not sure what the commit message headline should be, so I sort of
> guessed based on some of the ones in recent history, and I'm not sure
> I've got the formatting quite right yet, but it should be a lot closer.

Awesome!  AFAICS everything is in place, so I’ll just comment on the
cosmetic side of things; it’s kinda boring but it’s the sort of details
that will allow us to quickly integrate your code, so I think it
matters.

As discussed previously, I think it would make sense to move the
deduplication code to a new (guix store deduplication) module.

  ;; TODO: handling in case the .links directory doesn't exist? For now I'll
  ;; just assume it's the responsibility of whoever makes the store to create
  ;; it.
  (define (deduplicate path store hash)
    "Checks if a store item with hash HASH already exists. If so, replaces PATH
  with a hardlink to the already-existing one. If not, it registers PATH so that
  future duplicates can hardlink to it."
    (let ((links-path (string-append store
                                     "/.links/"
                                     (bytevector->base16-string hash))))
      (if (file-exists? links-path)
          (replace-with-link links-path path)
          (link path links-path))))

Here you could add (mkdir-p links-directory), it doesn’t hurt (‘mkdir-p’
comes from (guix build utils)).  I would change the signature to:

  (define* (deduplicate file hash #:optional (store %store-directory))
    …)

Regarding this:

  (define (get-temp-link target)
    "Like mkstemp!, but instead of creating a new file and giving you the name,
  it creates a new hardlink to TARGET and gives you the name."
    (let try-again ((tempname (tmpnam)))
      (catch
        #t
        (lambda ()
          (link target tempname)
          tempname)
        (lambda ()
          (try-again (tmpnam))))))

I think ‘get-temp-link’ should do the same as ‘optimizePath_’ in the C++
code, which is this:

    Path tempLink = (format("%1%/.tmp-link-%2%-%3%")
        % settings.nixStore % getpid() % rand()).str();

The reason is that this temporary file must live on the same file system
as the store (link(2) fails with EXDEV when call for files that live on
different file systems).

As is it written, the ‘deduplicate’ procedure does not handle the ENOSPC
and EMLINK cases that ‘optimizePath_’ handles.  Would be nice to catch
them.  To check for these specific error cases, you can do:

  (catch 'system-error
    (lambda ()
      (link a b))
    (lambda args
      (cond ((= EMLINK (system-error-errno args)) …)
            …)))

Other things:

  • Conceptually (guix store) should not import (gnu build install),
    which lives on a higher layer.  So I would suggest moving
    ‘reset-timestamps’ to (guix store deduplication) and have (gnu build
    install) use (guix store deduplication).

  • The convention in GNU is to use the term “file name” for file names
    (I often write ‘file’ as the variable name), and to use “path” only
    for search paths like $PATH, etc.  There are a few places where
    ‘register-path’ & co. use ‘path’ instead of ‘file’.

  • In databases.scm, the headers lacks the first line.  :-)

Your commit logs are good BTW.  :-)  For changes in store.scm, I would
use the “store:” prefix in the subject line rather than “guix:” to
clarify what part of the code it touches (you can check “git log
guix/store.scm” to get an idea), but that’s about it.

>> What about pushing your changes to a WIP branch on Savannah or
>> elsewhere?  (If you have an account on Savannah we can give you access.)
>
> I just made an account today as "reepca". Should I submit a "request for
> inclusion"? In the meantime, here's a much less fixup-riddled patch set
> that I think is just about feature-complete as far as register-path is
> concerned:

Excellent.  By the time you’ll read this email I should have added you
on Savannah.

Please make sure the OpenPGP key you’ll use to sign your commits is at
<https://savannah.gnu.org/users/reepca>, and send a reply to this
message signed with your key.

Also please read ‘HACKING’ about commit access to the repo.  I think
you’ll be working mostly on a branch, say, ‘scheme-daemon’.  You can
push your changes on the new branch to begin with.

A possible next task would be to check whether code that uses the
‘register-path’ procedure still works after this change.  That includes
things like ‘guix pack -f --localstatedir foo’ and also ‘guix system
disk-image’.  A few things will need to be adjusted to pull in the new
modules and guile-sqlite3.

Alternately, you could move on to other areas, such as the functionality
roughly corresponding to libstore/build.cc.

Anyway, great work, thank you!

Ludo’.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
@ 2017-06-12  5:14 Caleb Ristvedt
  2017-06-17 23:05 ` Ludovic Courtès
  0 siblings, 1 reply; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-12  5:14 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 667 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Apologies for the delay - in a shocking twist, it turns out professors
insisting on only one exit from functions had a point, and trying to
fully understand what optimisePath_, with its 14 exits and a goto, does,
was a challenge. I think I handle most of the main error cases (EEXIST,
ENOSPC, EMLINK) sort of okay now (the "shrug and ignore" sort of
exceptions are now caught, the rest are still propagated). I've also
created guix/store/deduplication.scm, and moved the relevant stuff
there. The tests pass to the same degree as before (2 failing, but
neither of them are register-path).

Here is my test-suite.log:


[-- Attachment #2: test-suite.log --]
[-- Type: application/octet-stream, Size: 99199 bytes --]

[-- Attachment #3: Type: text/plain, Size: 982 bytes --]


When I create a new module that directly incorporates code written by
others (as in the case of moving reset-timestamps to (guix store
deduplication)), should I add anything to the copyright notice at the
top? Would that involve looking through the commit history to find
exactly who wrote that?

I read HACKING - does the part about waiting two weeks for feedback on
guix-patches before committing still apply to branches?

I've now updated my savannah account with a key, used to sign this.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEdNapMPRLm4SepVYGwWaqSV9/GJwFAlk+InUACgkQwWaqSV9/
GJxpRAf+KWd4IQGdcGi4ret3GCfGJcfsmmZiSni8enzo4EdKmxT4MOf/FYwnrOHF
ZB33YSc2mlhxPKs1y2soWHsPYyhODVL2sy3jQ4CKSpINZNqb0iHts8zfDSDRQoKi
krOzPZTB7o1eOHYDS9xXnwqoNdGRzwG1KrdQZYCAIg6/UdkFaWevSn+Vo1LgzRp+
IFUxrLj92JxBraiyhQV4Uf7hALhA3WWKOym/9K5WN/NO8x4Somv4nIdrMFS21euq
qQbPzZ+5gb0m8k0smH6JC3E7DkPXL0UYTdbdsr4aGxIIFvwy6qLmDyBKZR7ieDjM
x2fTJp20v+28BV0Gzbtk4LnK1v59zA==
=1ttQ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
@ 2017-06-12  5:45 Caleb Ristvedt
  0 siblings, 0 replies; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-12  5:45 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hm, M-x epa-mail-verify says the signature is wrong. When I test it
without an attachment, it works fine. That's strange, I would have
thought epa-mail-sign would be aware of such things. Anyway, here's a
message that should verify correctly.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEdNapMPRLm4SepVYGwWaqSV9/GJwFAlk+KlAACgkQwWaqSV9/
GJwvIAgAmRbOYPMgLHOdQqhV7U8jHXdWvESP2Q6ZWMYSjo0XSMlgjgViEYFl/1CE
uLRtUKm6Lm2xUKznlUV1dJ3aZ6H35TtPxgjbkcpNaKUR9kMXIzZWnBkjhTHPKi6b
bNEOdnEUyQY4qokD5J1/X9HMQ9Nj2KqC6jU3uAtTZVVhW+o0fMrk8aysNL4oSHho
PHBXQMqOloze0c3Wa9tXnKK8JiMQ7xaEv7iGt2GZc65jqyjxW17ZACP0AOf77IhS
bHQvC6xY1ZNfbD5In+4l3M8OfZO3sdrDMYXRH0vIFDrFirdCD27KUuZ64WnLLu26
5P/JR0F5YCOXpZstKHUz/HbGim1zJw==
=H7zu
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-12  5:14 Caleb Ristvedt
@ 2017-06-17 23:05 ` Ludovic Courtès
  0 siblings, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-06-17 23:05 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Howdy,

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Apologies for the delay - in a shocking twist, it turns out professors
> insisting on only one exit from functions had a point, and trying to
> fully understand what optimisePath_, with its 14 exits and a goto, does,
> was a challenge. I think I handle most of the main error cases (EEXIST,
> ENOSPC, EMLINK) sort of okay now (the "shrug and ignore" sort of
> exceptions are now caught, the rest are still propagated). I've also
> created guix/store/deduplication.scm, and moved the relevant stuff
> there. The tests pass to the same degree as before (2 failing, but
> neither of them are register-path).

Heheh.  :-)

> location: /home/reepca/Programming/guix/tests/store.scm:798
> source:
> + (test-assert
> +   "verify-store"
> +   (let* ((text (random-text))
> +          (file1 (add-text-to-store %store "foo" text))
> +          (file2 (add-text-to-store
> +                   %store
> +                   "bar"
> +                   (random-text)
> +                   (list file1))))
> +     (and (pk 'verify1 (verify-store %store))
> +          (begin
> +            (delete-file file1)
> +            (not (pk 'verify2 (verify-store %store))))
> +          (begin
> +            (call-with-output-file
> +              file1
> +              (lambda (port) (display text port)))
> +            (pk 'verify3 (verify-store %store))))))
> reading the Nix store...
> path `/home/reepca/Programming/code-downloads/guix/test-tmp/store/000hh9y0f7i6x9jkhhc3wq1iln4lvwvy-guile-xcb-1.3-guile-builder' is not in the Nix store
> actual-value: #f
> actual-error:
> + (srfi-34
> +   #<condition &nix-protocol-error [message: "executing SQLite statement: FOREIGN KEY constraint failed" status: 1] 2d0dea0>)
> result: FAIL

The dreaded foreign key constraint!  Can you rerun the tests after “rm
-rf test-tmp”?  It might be that you somehow borked the database under
test-tmp while testing stuff.

> When I create a new module that directly incorporates code written by
> others (as in the case of moving reset-timestamps to (guix store
> deduplication)), should I add anything to the copyright notice at the
> top? Would that involve looking through the commit history to find
> exactly who wrote that?

Yes, try to preserve the relevant copyright lines (by looking at C-x v l
or C-x v g), and also remember to add a line for you when adding new
code.

> I read HACKING - does the part about waiting two weeks for feedback on
> guix-patches before committing still apply to branches?

In your case (GSoC) you can go ahead.  We’ll probably review more
closely when we merge but that can be very quick if we’re already on the
same line of thought.

> I've now updated my savannah account with a key, used to sign this.
> -----BEGIN PGP SIGNATURE-----

Though something went from with your email client.  :-)

What are your thoughts on the next steps?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
@ 2017-06-18  9:22 Caleb Ristvedt
  2017-06-18 23:34 ` Chris Marusich
  2017-06-19 11:56 ` Ludovic Courtès
  0 siblings, 2 replies; 12+ messages in thread
From: Caleb Ristvedt @ 2017-06-18  9:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I figured out why I was getting those two test failures after looking
more closely at tests/store.log: when I installed GuixSD, I did so
directly from another hard drive on the same system (having read in the
past about issues with multi-booting, I figured I'd just decide which to
boot to by selecting hard drives). I had previously cloned guix on the
Ubuntu hard drive, and figured I'd just mount it and copy it over. It
turns out that re-running make didn't alter the tests quite as much as I
thought it would, and the old path was stuck in some of them (on GuixSD
there is no code-downloads folder, it's just straight in
Programming/). After a git clean, though, the tests all pass.

> Though something went from with your email client.  :-)

Aye, turns out M-x epa-mail-sign doesn't work well with attachments. I
posted another message that should be signed correctly, as well as this
one.

> What are your thoughts on the next steps?

Well, I took a look at libstore/build.cc, and that is a lot of code. I
hope I can get away with only reading as much as is necessary to clarify
details. To that end, I'd like to make sure that my high-level
understanding of the build process is accurate.

- - A derivation is a node in a (acyclic?) graph of "what depends on what" or,
  alternately put, what is input to what.

- - Derivations can depend on other derivations or plain files (can those files be
  directories?), where "plain files" cannot depend on anything else.

- - "Building" a derivation involves first ensuring that all of its inputs are
  "built" (in the case of inputs that are derivations, this means ensuring its
  output exists as a store item, and if not, producing it by this same
  process. In the case of plain file inputs, this means just ensuring that they
  exist as a store item), and then preparing a build environment. The "builder"
  script is then executed in a chroot rooted under the build tree. After it has
  completed, the output (where is it put during the build process?) from
  building the derivation is placed in the store (and registered).

- - "preparing a build environment" involves first creating a directory in which
  the build will be run, then populating it with certain expected files and
  directories (as detailed in 2.4.1 of the manual). Also, the store items
  produced from building the inputs need to somehow be made available (is there
  a gnu/store/ under the build tree? I feel like I don't quite understand where
  everything goes in the build tree). Also, set any environment variables
  associated with the derivation.

Assuming I've got the gist of it sort of right, that leaves one more question:
how are hashes for the paths computed? It's not sha256 like the database hashes,
it seems to be 160 bits in size. How are all the inputs hashed together? Are all
the inputs put in one big directory and that is hashed in nar form, or what? And
what about store items that don't have any inputs (for example, something placed
in there with add-text-to-store)? I suppose in that case they are their own
input, perhaps?

Some of those answers might be in the manual, but I didn't manage to
find them.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEdNapMPRLm4SepVYGwWaqSV9/GJwFAllGRk0ACgkQwWaqSV9/
GJw9TQf/W8ewHEFzwTssGhMGW/pGsSvUMnzdQNmx31Y1HLMpUbhBazbIUa57pz5g
ZfDbUIRnKBYlCPi5X+8Om8fzMjWpUixYeyv/Clk3sRvbcQhod0cejbRIC0PAiTN0
mVlu4tUWJlEXekcG7OqhdZ8AZXJ53qOkNMEHR3h91PCJOZj2fOVwzlC5kp7gUfcM
C9a/rgCb6ZG6dXQNH3Q0qI38+akDg3tbD0BDk97yhHCBSVitwbJd9vTiEl3QhAPP
WFlZjmZGFpfruZiuRDyO8pOFC4tfdsEy/wv4spjq63ZgBNqLQsI9Y0JfXJWM/fEH
d+KsI54ix/yDhZnbZofLaogAH3Fo8w==
=Ukhf
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-18  9:22 Caleb Ristvedt
@ 2017-06-18 23:34 ` Chris Marusich
  2017-06-19 11:47   ` Ludovic Courtès
  2017-06-19 11:56 ` Ludovic Courtès
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Marusich @ 2017-06-18 23:34 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 8576 bytes --]

Hi Caleb,

I'm sure Ludo may have more to say, but I think I can help you with a
few of your questions.

Caleb Ristvedt <caleb.ristvedt@cune.org> writes:

>> Though something went from with your email client.  :-)
>
> Aye, turns out M-x epa-mail-sign doesn't work well with attachments. I
> posted another message that should be signed correctly, as well as this
> one.

Looks like your latest email was signed correctly.  I see that it was
signed using your Savannah key, which has the following fingerprint:

74D6 A930 F44B 9B84 9EA5  5606 C166 AA49 5F7F 189C

> - A derivation is a node in a (acyclic?) graph of "what depends on what" or,
>   alternately put, what is input to what.

I don't think there's a simple answer for the question of "what is a
derivation"?  I'll explain what I know, and hopefully it'll help you.

No, derivations cannot depend on one another [1].  I believe it is
correct to think of a derivation as a node in an acyclic graph of
inter-dependent software components; that is one way to think about
derivations.  Perhaps Ludo can say more about that.

The word "derivation" is a bit overloaded.  Fundamentally (and vaguely),
it means "a component build action, which *derives* the component from
its inputs" [2].

More specifically, there are various interpretations of the word
"derivation", depending on the context.  For example, there are structs
in the Nix C++ source code that represent derivations [3].  In the Nix
expression language, derivations are represented as attribute sets [4].
There are also "store derivations", which are files in the store that
precisely encode the build actions that will be executed when the
derivation is built [5].

Some of this carries over to Guix.  The general meaning of the word
"derivation" appears to be the same as in Nix: a derivation basically
takes inputs and produces an output.  Guix also uses "store derivations"
because we use (a modified version of) the Nix daemon.  Note that the
Guix manual sometimes uses the word "derivation" to refer to "store
derivations", which can be confusing.

As you probably know, Guix eschews the Nix expression language in favor
of Guile for composing components.  Therefore, Guix has defined
abstractions in Guile to represent derivations [6].  Guix has also
defined additional abstractions to represent things like packages [7],
which are higher level abstractions than derivations.  There are a lot
of abstractions like this in Guix, and they interact with each other in
various ways.  To get an idea of what I mean, you can look at the
'(guix) Invoking guix graph' section in the Guix manual.  Derivations
are a general concept taken from Nix, but as far as I know, things like
bags and packages are Guix-specific concepts.

> - "Building" a derivation involves first ensuring that all of its inputs are
>   "built" (in the case of inputs that are derivations, this means ensuring its
>   output exists as a store item, and if not, producing it by this same
>   process. In the case of plain file inputs, this means just ensuring that they
>   exist as a store item), and then preparing a build environment. The "builder"
>   script is then executed in a chroot rooted under the build tree. After it has
>   completed, the output (where is it put during the build process?) from
>   building the derivation is placed in the store (and registered).

Note that sometimes the word "build" is used to refer to the act of
creating a derivation, but not executing its build actions [8].

The process you've described sounds basically right to me; Ludo probably
knows more.  The Nix thesis might be out of date by now, but it
describes (described?) these kinds of things in pretty good detail [9].
It also describes the invariants that must be upheld by the entire
process.

> - "preparing a build environment" involves first creating a directory
>in which
>   the build will be run, then populating it with certain expected files and
>   directories (as detailed in 2.4.1 of the manual). Also, the store items
>   produced from building the inputs need to somehow be made available (is there
>   a gnu/store/ under the build tree? I feel like I don't quite understand where
>   everything goes in the build tree). Also, set any environment variables
>   associated with the derivation.

I believe that inputs and outputs are, from the a derivation's builder's
perspective, just paths in the store which can be accessed via
environment variables that Nix sets up during the build.  I think this
is the same in Guix as in Nix.  However, in Guix, I believe there are
additional ways to refer to inputs when defining a builder, e.g. by
using G-Expressions, which hide details like this from the user so that
the user can reason about builds in terms of higher-level abstractions.

As for how the store items produced from building the inputs are "made
available," my understanding is that the builder just needs to write
files to the special "output" path, which (I think) is in the store.
That special output path is made available to the builder during the
build via an environment variable (the variable is named "out" or
"output"; I can't remember exactly).  If you're using G-Expressions to
define a builder, the output path can be accessed via "#$output" (see
'(guix) G-Expressions' for details).

> Assuming I've got the gist of it sort of right, that leaves one more
>question:
> how are hashes for the paths computed? It's not sha256 like the database hashes,
> it seems to be 160 bits in size. How are all the inputs hashed together? Are all
> the inputs put in one big directory and that is hashed in nar form, or what? And
> what about store items that don't have any inputs (for example, something placed
> in there with add-text-to-store)? I suppose in that case they are their own
> input, perhaps?

The hashing mechanisms, the relevant invariants, and the motivation
behind these design decisions are described in the Nix thesis [10].  I
think some (maybe all) of this hashing stuff is implemented in Guix in
guix/derivations.scm (check the 'derivation' procedure) and
guix/store.scm (check the procedures that 'derivation' calls).

Hope that helps!  Good luck with your project.

Footnotes: 
[1]  See footnote 4 on p. 29 of Eelco Dolstra's Ph. D. thesis.

[2]  See p. 27 of the thesis.  There are alternate (and equally vague)
definitions in the Guix and Nix manuals.  They all have the same spirit:
a derivation is like a function from software components to software
components.

[3]  See Derivation in derivations.hh in the Nix source code.

[4]  The following blog entry explains this in detail:
https://lethalman.blogspot.co.uk/2014/07/nix-pill-6-our-first-derivation.html

[5]  See Section 2.4 "Store derivations" in the thesis for a great
explanation of store derivations, including why it is useful to
represent derivations in this way.

[6]  See the <derivation> record type in guix/derivations.scm.  Looks
like Guix also implements some of the actual logic for serializing a
<derivation> into a store derivation (see write-derivation in the same
module).

[7]  See <package> guix/packages.scm.

[8]  See for example the docstring for the 'derivation' procedure defined
in guix/derivations.scm, which claims to "build a derivation" when in
fact it only creates the store derivation without executing its build
actions.

[9]  See for example section 5.5 "Building store derivations", which
describes how store derivations are built in the extensional model (the
intensional model, discussed later in the thesis, has not been
implemented in Nix or Guix).  Figure 5.11, box #79 shows that the
builder is executed in a temporary directory; presumably the builder
writes the component's output directly into the output path (which is in
the store).  You can probably compare this to the actual build algorithm
in the C++ source code to figure out what's really going on.

[10] See sections 5.1 "Cryptographic hashes" and 5.2.2 "Store paths" in
the Nix thesis.  Various kinds of hashes are used in various places;
it's all pretty complicated, but the thesis describes much of the
details and motivation.  There may be other relevant sections, depending
on what exactly you want to know.  Of course, this thesis may be out of
date by now, so the C++ source code is the place to look to find actual
implementation.  But it's helpful to know the intent behind the code.

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-18 23:34 ` Chris Marusich
@ 2017-06-19 11:47   ` Ludovic Courtès
  0 siblings, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-06-19 11:47 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

Heya,

Thanks for the perfect explanation, Chris.  To complement what you wrote:

> More specifically, there are various interpretations of the word
> "derivation", depending on the context.  For example, there are structs
> in the Nix C++ source code that represent derivations [3].  In the Nix
> expression language, derivations are represented as attribute sets [4].
> There are also "store derivations", which are files in the store that
> precisely encode the build actions that will be executed when the
> derivation is built [5].
>
> Some of this carries over to Guix.  The general meaning of the word
> "derivation" appears to be the same as in Nix: a derivation basically
> takes inputs and produces an output.  Guix also uses "store derivations"
> because we use (a modified version of) the Nix daemon.  Note that the
> Guix manual sometimes uses the word "derivation" to refer to "store
> derivations", which can be confusing.

The meaning of “derivation” is the same in Guix as in Nix,
unsurprisingly:

  https://www.gnu.org/software/guix/manual/html_node/Derivations.html

But it’s true that the manual uses the word “derivation” both to denote
the in-memory representation (<derivation> records) and the on-disk
representation (.drv files.)  Most of the time, at the Scheme level, all
you see is <derivation> records.

In the context of reepca’s work, the new daemon in Scheme would be able
to reuse (guix derivations) and <derivation> records directly.

Ludo’.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Prototype register-path
  2017-06-18  9:22 Caleb Ristvedt
  2017-06-18 23:34 ` Chris Marusich
@ 2017-06-19 11:56 ` Ludovic Courtès
  1 sibling, 0 replies; 12+ messages in thread
From: Ludovic Courtès @ 2017-06-19 11:56 UTC (permalink / raw)
  To: Caleb Ristvedt; +Cc: guix-devel

Hello!

Caleb Ristvedt <caleb.ristvedt@cune.org> skribis:

> Well, I took a look at libstore/build.cc, and that is a lot of code. I
> hope I can get away with only reading as much as is necessary to clarify
> details. To that end, I'd like to make sure that my high-level
> understanding of the build process is accurate.
>
> - A derivation is a node in a (acyclic?) graph of "what depends on what" or,
>   alternately put, what is input to what.

Yes, a directed acyclic graph.

> - Derivations can depend on other derivations or plain files (can those files be
>   directories?), where "plain files" cannot depend on anything else.

Right, “plain files” are referred to as “sources” in the code (e.g., the
‘sources’ field of <derivation>).  “Sources” are content-addressable
things added with ‘add-text-to-store’, ‘add-to-store’, and similar.

> - "Building" a derivation involves first ensuring that all of its inputs are
>   "built" (in the case of inputs that are derivations, this means ensuring its
>   output exists as a store item, and if not, producing it by this same
>   process. In the case of plain file inputs, this means just ensuring that they
>   exist as a store item), and then preparing a build environment. The "builder"
>   script is then executed in a chroot rooted under the build tree. After it has
>   completed, the output (where is it put during the build process?) from
>   building the derivation is placed in the store (and registered).

Exactly.

> - "preparing a build environment" involves first creating a directory in which
>   the build will be run, then populating it with certain expected files and
>   directories (as detailed in 2.4.1 of the manual). Also, the store items
>   produced from building the inputs need to somehow be made available (is there
>   a gnu/store/ under the build tree? I feel like I don't quite understand where
>   everything goes in the build tree). Also, set any environment variables
>   associated with the derivation.

Yes.  In the build environment, there’s the subset of /gnu/store that is
explicitly declared as a dependency, directly or indirectly.  For
example, we can see it here:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(guix)
scheme@(guile-user)> ,enter-store-monad
store-monad@(guile-user) [1]> (gexp->derivation "contents"
						#~(begin
						    (use-modules (ice-9 ftw))
						    (call-with-output-file #$output
						      (lambda (port)
							(write (scandir "/gnu/store") port)))))
$2 = #<derivation /gnu/store/8r9l9n9vv8vsnjy1b7w15bp8liz9bcvh-contents.drv => /gnu/store/a1z90bzb7xl6glp033jkxjc1n3w0lw7w-contents 41847d0>
store-monad@(guile-user) [1]> (built-derivations (list $2))
$3 = #t
store-monad@(guile-user) [1]> ,q
scheme@(guile-user)> ,pp (call-with-input-file (derivation->output-path $2) read)
$4 = ("."
 ".."
 "02426nwiy32cscm4h83729vn5ws1gs2i-bash-static-4.4.12"
 "3lsfrwlp1qa345x71yw5w49i2mpp0vxm-guile-2.0.14"
 "6czmriicly50bljbp6p34py5a63wzf05-contents-builder"
 "9f66pvz4vf1d9k8iir6asdp3l8k58cnn-libatomic-ops-7.4.4"
 "a1z90bzb7xl6glp033jkxjc1n3w0lw7w-contents"
 "b7w6bjp602qvhbsjs535dfax8v7wy8s8-gmp-6.1.2"
 "dhc2iy059hi91fk55dcv79z09kp6500y-gcc-5.4.0-lib"
 "dj9w9y66ncmn7qpnahz5fhqjjwqrnbjm-ncurses-6.0"
 "hvyk1qyph1hihfmym1w271ygp84adb0v-readline-7.0"
 "k7029k5va68lkapbzcycdzj7m5bjb4b8-bash-4.4.12"
 "kndl3vllk4bdq1xd3p8k67h8jrsq8xbq-readline-7.0"
 "lcmcm4c0zjv3sa9amdrhaszd7vwwxjh1-libltdl-2.4.6"
 "px46g18zg3sjgndwdcsgm6027w7s5gbc-pkg-config-0.29.1"
 "q1x4v3x8v2g59d244hl7k0i1n4h83c9a-ncurses-6.0"
 "rmjlycdgiq8pfy5hfi42qhw3k7p6kdav-glibc-2.25"
 "v4h4qw8a95479capaq08vs4zdyxdijhv-libunistring-0.9.7"
 "wqx8sxqjvz323vk9xalrhqk5f35yd42f-libffi-3.2.1"
 "yr7m8ldp3n40mrzjax91cj9hjw1k2a58-libgc-7.6.0")
--8<---------------cut here---------------end--------------->8---

What we see above is Guile and its dependencies, along with the output
and its build script.

> Assuming I've got the gist of it sort of right, that leaves one more question:
> how are hashes for the paths computed? It's not sha256 like the database hashes,
> it seems to be 160 bits in size. How are all the inputs hashed together? Are all
> the inputs put in one big directory and that is hashed in nar form, or what? And
> what about store items that don't have any inputs (for example, something placed
> in there with add-text-to-store)? I suppose in that case they are their own
> input, perhaps?

See (guix derivations) for how these hashes are computed, and also
‘store-path’, ‘output-path’, and ‘fixed-output-path’ in (guix store).
Normally you won’t have to reimplement any of these.

> Some of those answers might be in the manual, but I didn't manage to
> find them.

The manual doesn’t tell much about internal APIs.  :-)

HTH!

Ludo’.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-19 11:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-03  8:47 [PATCH] Prototype register-path Caleb Ristvedt
2017-06-05  8:38 ` Caleb Ristvedt
2017-06-05 20:34   ` Ludovic Courtès
2017-06-06  8:59     ` Caleb Ristvedt
2017-06-08 16:59       ` Ludovic Courtès
  -- strict thread matches above, loose matches on Subject: below --
2017-06-12  5:14 Caleb Ristvedt
2017-06-17 23:05 ` Ludovic Courtès
2017-06-12  5:45 Caleb Ristvedt
2017-06-18  9:22 Caleb Ristvedt
2017-06-18 23:34 ` Chris Marusich
2017-06-19 11:47   ` Ludovic Courtès
2017-06-19 11:56 ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).