unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
* [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization
@ 2020-12-15  9:38 Ludovic Courtès
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:38 UTC (permalink / raw)
  To: 45253

Hello Guix!

This is a followup to <https://issues.guix.gnu.org/45018>.  It is
meant to be applied on top of <https://issues.guix.gnu.org/44760>.

Until now, guix-daemon would check the hash of store items just
substituted, reset timestamps/permissions, and deduplicate.  This
would lead to extra I/O: the whole set of files is traversed three
times by the daemon and read two times.

This patch series is about delegating that work to ‘guix substitute’,
which it can do directly as it restores file, thereby reducing I/O
to the minimum necessary.

I tested with substitutes that contain many files:

  guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
    emacs-no-x emacs-no-x-toolkit

On my laptop with an SSD, the wall-clock time is almost unchanged
when fetching lzip substitutes.  You can see that the throughput
displayed while downloading is slightly lower than before, which
is consistent because lzip downloads are CPU-bound¹, but this is
compensated by the lack of processing time between substitutes.
With gzip substitutes, I see a 10% speedup on the wall-clock time
on my laptop.

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

Ludovic Courtès (6):
  tests: Check the build trace for hash mismatches on substitutes.
  daemon: Let 'guix substitute' perform hash checks.
  tests: Check the mtime and permissions of substituted items.
  daemon: Do not reset timestamps and permissions on substituted items.
  tests: Make sure substituted items are deduplicated.
  daemon: Delegate deduplication to 'guix substitute'.

 guix/scripts/substitute.scm | 70 +++++++++++++++++++++++++-----
 guix/serialization.scm      |  8 +++-
 nix/libstore/build.cc       | 85 ++++++++++++++++++++-----------------
 tests/store.scm             | 82 +++++++++++++++++++++++++++++++++++
 tests/substitute.scm        | 58 ++++++++++++++++++++++---
 5 files changed, 248 insertions(+), 55 deletions(-)

-- 
2.29.2





^ permalink raw reply	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes.
  2020-12-15  9:38 [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
@ 2020-12-15  9:57 ` Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 2/6] daemon: Let 'guix substitute' perform hash checks Ludovic Courtès
                     ` (4 more replies)
  2020-12-15 11:43 ` [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
  2020-12-19 23:04 ` bug#45253: " Ludovic Courtès
  2 siblings, 5 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

* tests/store.scm ("substitute, corrupt output hash, build trace"): New
test.
---
 tests/store.scm | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tests/store.scm b/tests/store.scm
index 38051bf5e5..7f1ec51875 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -787,6 +787,61 @@
                (build-derivations s (list d))
                #f))))))
 
+(test-assert "substitute, corrupt output hash, build trace"
+  ;; Likewise, and check the build trace.
+  (with-store s
+    (let* ((c   "hello, world")                   ; contents of the output
+           (d   (build-expression->derivation
+                 s "corrupt-substitute"
+                 `(mkdir %output)
+                 #:guile-for-build
+                 (package-derivation s %bootstrap-guile (%current-system))))
+           (o   (derivation->output-path d)))
+      ;; Make sure we use 'guix substitute'.
+      (set-build-options s
+                         #:print-build-trace #t
+                         #:use-substitutes? #t
+                         #:fallback? #f
+                         #:substitute-urls (%test-substitute-urls))
+
+      (with-derivation-substitute d c
+        (sha256 => (make-bytevector 32 0)) ;select a hash that doesn't match C
+
+        (define output
+          (call-with-output-string
+            (lambda (port)
+              (parameterize ((current-build-output-port port))
+                (guard (c ((store-protocol-error? c) #t))
+                  (build-derivations s (list d))
+                  #f)))))
+
+        (define actual-hash
+          (let-values (((port get-hash)
+                        (gcrypt:open-hash-port
+                         (gcrypt:hash-algorithm gcrypt:sha256))))
+            (write-file-tree "foo" port
+                             #:file-type+size
+                             (lambda _
+                               (values 'regular (string-length c)))
+                             #:file-port
+                             (lambda _
+                               (open-input-string c)))
+            (close-port port)
+            (bytevector->nix-base32-string (get-hash))))
+
+        (define expected-hash
+          (bytevector->nix-base32-string (make-bytevector 32 0)))
+
+        (define mismatch
+          (string-append "@ hash-mismatch " o " sha256 "
+                         expected-hash " " actual-hash "\n"))
+
+        (define failure
+          (string-append "@ substituter-failed " o))
+
+        (and (string-contains output mismatch)
+             (string-contains output failure))))))
+
 (test-assert "substitute --fallback"
   (with-store s
     (let* ((t   (random-text))                    ; contents of the output
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 2/6] daemon: Let 'guix substitute' perform hash checks.
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
@ 2020-12-15  9:57   ` Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 3/6] tests: Check the mtime and permissions of substituted items Ludovic Courtès
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

This way, the hash of the store item can be computed as it is restored,
thereby avoiding an additional file tree traversal ('hashPath' call)
later on in the daemon.  Consequently, it should reduce latency between
subsequent substitute downloads.

This is a followup to 5ff521452b9ec2aae9ed8e4bb7bdc250a581f203.

* guix/scripts/substitute.scm (narinfo-hash-algorithm+value): New
procedure.
(process-substitution): Wrap INPUT into a hash input port, 'hashed', and
read from it.  Compare the actual and expected hashes, and print a
"hash-mismatch" status line when they differ.  When they match, print
not just "success" but also the nar hash and size.
* nix/libstore/build.cc (class SubstitutionGoal)[expectedHashStr]:
Remove.
(SubstitutionGoal::finished): Tokenize 'status'.  Parse it and handle
"success" and "hash-mismatch" accordingly.  Call 'hashPath' only when
the returned hash is not SHA256.
(SubstitutionGoal::handleChildOutput): Remove 'expectedHashStr'
handling.
* tests/substitute.scm ("substitute, invalid hash"): Rename to...
("substitute, invalid narinfo hash"): ... this.
("substitute, invalid hash"): New test.
---
 guix/scripts/substitute.scm | 45 +++++++++++++++++++----
 nix/libstore/build.cc       | 73 ++++++++++++++++++++-----------------
 tests/substitute.scm        | 52 ++++++++++++++++++++++++--
 3 files changed, 125 insertions(+), 45 deletions(-)

diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 25075eedff..17d0002b9f 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -26,6 +26,8 @@
   #:use-module (guix combinators)
   #:use-module (guix config)
   #:use-module (guix records)
+  #:use-module (guix diagnostics)
+  #:use-module (guix i18n)
   #:use-module ((guix serialization) #:select (restore-file))
   #:autoload   (guix scripts discover) (read-substitute-urls)
   #:use-module (gcrypt hash)
@@ -256,6 +258,18 @@ connection (typically PORT) is kept open once data has been fetched from URI."
   ;; for more information.
   (contents     narinfo-contents))
 
+(define (narinfo-hash-algorithm+value narinfo)
+  "Return two values: the hash algorithm used by NARINFO and its value as a
+bytevector."
+  (match (string-tokenize (narinfo-hash narinfo)
+                          (char-set-complement (char-set #\:)))
+    ((algorithm base32)
+     (values (lookup-hash-algorithm (string->symbol algorithm))
+             (nix-base32-string->bytevector base32)))
+    (_
+     (raise (formatted-message
+             (G_ "invalid narinfo hash: ~s") (narinfo-hash narinfo))))))
+
 (define (narinfo-hash->sha256 hash)
   "If the string HASH denotes a sha256 hash, return it as a bytevector.
 Otherwise return #f."
@@ -1033,7 +1047,9 @@ one.  Return #f if URI's scheme is 'file' or #f."
 (define* (process-substitution store-item destination
                                #:key cache-urls acl print-build-trace?)
   "Substitute STORE-ITEM (a store file name) from CACHE-URLS, and write it to
-DESTINATION as a nar file.  Verify the substitute against ACL."
+DESTINATION as a nar file.  Verify the substitute against ACL, and verify its
+hash against what appears in the narinfo.  Print a status line on the current
+output port."
   (define narinfo
     (lookup-narinfo cache-urls store-item
                     (cut valid-narinfo? <> acl)))
@@ -1044,9 +1060,6 @@ DESTINATION as a nar file.  Verify the substitute against ACL."
 
   (let-values (((uri compression file-size)
                 (narinfo-best-uri narinfo)))
-    ;; Tell the daemon what the expected hash of the Nar itself is.
-    (format #t "~a~%" (narinfo-hash narinfo))
-
     (unless print-build-trace?
       (format (current-error-port)
               (G_ "Downloading ~a...~%") (uri->string uri)))
@@ -1079,9 +1092,16 @@ DESTINATION as a nar file.  Verify the substitute against ACL."
                    ;; closed here, while the child process doing the
                    ;; reporting will close it upon exit.
                    (decompressed-port (string->symbol compression)
-                                      progress)))
+                                      progress))
+
+                  ;; Compute the actual nar hash as we read it.
+                  ((algorithm expected)
+                   (narinfo-hash-algorithm+value narinfo))
+                  ((hashed get-hash)
+                   (open-hash-input-port algorithm input)))
       ;; Unpack the Nar at INPUT into DESTINATION.
-      (restore-file input destination)
+      (restore-file hashed destination)
+      (close-port hashed)
       (close-port input)
 
       ;; Wait for the reporter to finish.
@@ -1091,8 +1111,17 @@ DESTINATION as a nar file.  Verify the substitute against ACL."
       ;; one to visually separate substitutions.
       (display "\n\n" (current-error-port))
 
-      ;; Tell the daemon that we're done.
-      (display "success\n" (current-output-port)))))
+      ;; Check whether we got the data announced in NARINFO.
+      (let ((actual (get-hash)))
+        (if (bytevector=? actual expected)
+            ;; Tell the daemon that we're done.
+            (format (current-output-port) "success ~a ~a~%"
+                    (narinfo-hash narinfo) (narinfo-size narinfo))
+            ;; The actual data has a different hash than that in NARINFO.
+            (format (current-output-port) "hash-mismatch ~a ~a ~a~%"
+                    (hash-algorithm-name algorithm)
+                    (bytevector->nix-base32-string expected)
+                    (bytevector->nix-base32-string actual)))))))
 
 \f
 ;;;
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index b5551b87ae..b19471a68f 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -2790,10 +2790,6 @@ private:
     /* The substituter. */
     std::shared_ptr<Agent> substituter;
 
-    /* Either the empty string, or the expected hash as returned by the
-       substituter.  */
-    string expectedHashStr;
-
     /* Either the empty string, or the status phrase returned by the
        substituter.  */
     string status;
@@ -3032,36 +3028,47 @@ void SubstitutionGoal::finished()
     /* Check the exit status and the build result. */
     HashResult hash;
     try {
-
-        if (status != "success")
-            throw SubstError(format("fetching path `%1%' (status: '%2%')")
-                % storePath % status);
-
-        if (!pathExists(destPath))
-            throw SubstError(format("substitute did not produce path `%1%'") % destPath);
-
-	if (expectedHashStr == "")
-	    throw SubstError(format("substituter did not communicate hash for `%1'") % storePath);
-
-        hash = hashPath(htSHA256, destPath);
-
-        /* Verify the expected hash we got from the substituer. */
-	size_t n = expectedHashStr.find(':');
-	if (n == string::npos)
-	    throw Error(format("bad hash from substituter: %1%") % expectedHashStr);
-	HashType hashType = parseHashType(string(expectedHashStr, 0, n));
-	if (hashType == htUnknown)
-	    throw Error(format("unknown hash algorithm in `%1%'") % expectedHashStr);
-	Hash expectedHash = parseHash16or32(hashType, string(expectedHashStr, n + 1));
-	Hash actualHash = hashType == htSHA256 ? hash.first : hashPath(hashType, destPath).first;
-	if (expectedHash != actualHash) {
-	    if (settings.printBuildTrace)
+	auto statusList = tokenizeString<vector<string> >(status);
+
+	if (statusList.empty()) {
+            throw SubstError(format("fetching path `%1%' (empty status: '%2%')")
+			     % storePath % status);
+	} else if (statusList[0] == "hash-mismatch") {
+	    if (settings.printBuildTrace) {
+		auto hashType = statusList[1];
+		auto expectedHash = statusList[2];
+		auto actualHash = statusList[3];
 		printMsg(lvlError, format("@ hash-mismatch %1% %2% %3% %4%")
-			 % storePath % "sha256"
-			 % printHash16or32(expectedHash)
-			 % printHash16or32(actualHash));
+			 % storePath
+			 % hashType % expectedHash % actualHash);
+	    }
 	    throw SubstError(format("hash mismatch for substituted item `%1%'") % storePath);
+	} else if (statusList[0] == "success") {
+	    if (!pathExists(destPath))
+		throw SubstError(format("substitute did not produce path `%1%'") % destPath);
+
+	    std::string hashStr = statusList[1];
+	    size_t n = hashStr.find(':');
+	    if (n == string::npos)
+		throw Error(format("bad hash from substituter: %1%") % hashStr);
+
+	    HashType hashType = parseHashType(string(hashStr, 0, n));
+	    switch (hashType) {
+	    case htUnknown:
+		throw Error(format("unknown hash algorithm in `%1%'") % hashStr);
+	    case htSHA256:
+		hash.first = parseHash16or32(hashType, string(hashStr, n + 1));
+		hash.second = std::atoi(statusList[2].c_str());
+		break;
+	    default:
+		/* The database only stores SHA256 hashes, so compute it.  */
+		hash = hashPath(htSHA256, destPath);
+		break;
+	    }
 	}
+	else
+            throw SubstError(format("fetching path `%1%' (status: '%2%')")
+                % storePath % status);
 
     } catch (SubstError & e) {
 
@@ -3123,9 +3130,7 @@ void SubstitutionGoal::handleChildOutput(int fd, const string & data)
 	    string trimmed = (end != string::npos) ? input.substr(0, end) : input;
 
 	    /* Update the goal's state accordingly.  */
-	    if (expectedHashStr == "") {
-		expectedHashStr = trimmed;
-	    } else if (status == "") {
+	    if (status == "") {
 		status = trimmed;
 		worker.wakeUp(shared_from_this());
 	    } else {
diff --git a/tests/substitute.scm b/tests/substitute.scm
index b86ce09425..241c55a1f8 100644
--- a/tests/substitute.scm
+++ b/tests/substitute.scm
@@ -28,7 +28,9 @@
   #:use-module (guix base32)
   #:use-module ((guix store) #:select (%store-prefix))
   #:use-module ((guix ui) #:select (guix-warning-port))
-  #:use-module ((guix utils) #:select (call-with-compressed-output-port))
+  #:use-module ((guix utils)
+                #:select (call-with-temporary-directory
+                          call-with-compressed-output-port))
   #:use-module ((guix build utils)
                 #:select (mkdir-p delete-file-recursively dump-port))
   #:use-module (guix tests http)
@@ -36,6 +38,7 @@
   #:use-module (rnrs io ports)
   #:use-module (web uri)
   #:use-module (ice-9 regex)
+  #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-34)
   #:use-module (srfi srfi-35)
@@ -304,7 +307,7 @@ System: mips64el-linux\n")
       (lambda ()
         (guix-substitute "--substitute")))))
 
-(test-quit "substitute, invalid hash"
+(test-quit "substitute, invalid narinfo hash"
     "no valid substitute"
   ;; The hash in the signature differs from the hash of %NARINFO.
   (with-narinfo (string-append %narinfo "Signature: "
@@ -317,6 +320,49 @@ System: mips64el-linux\n")
       (lambda ()
         (guix-substitute "--substitute")))))
 
+(test-equal "substitute, invalid hash"
+  (string-append "hash-mismatch sha256 "
+                 (bytevector->nix-base32-string (sha256 #vu8())) " "
+                 (let-values (((port get-hash)
+                               (open-hash-port (hash-algorithm sha256)))
+                              ((content)
+                               "Substitutable data."))
+                   (write-file-tree "foo" port
+                                    #:file-type+size
+                                    (lambda _
+                                      (values 'regular
+                                              (string-length content)))
+                                    #:file-port
+                                    (lambda _
+                                      (open-input-string content)))
+                   (close-port port)
+                   (bytevector->nix-base32-string (get-hash)))
+                 "\n")
+
+  ;; Arrange so the actual data hash does not match the 'NarHash' field in the
+  ;; narinfo.
+  (with-output-to-string
+    (lambda ()
+      (let ((narinfo (string-append "StorePath: " (%store-prefix)
+                                    "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash
+URL: example.nar
+Compression: none
+NarHash: sha256:" (bytevector->nix-base32-string (sha256 #vu8())) "
+NarSize: 42
+References: 
+Deriver: " (%store-prefix) "/foo.drv
+System: mips64el-linux\n")))
+        (with-narinfo (string-append narinfo "Signature: "
+                                     (signature-field narinfo) "\n")
+          (call-with-temporary-directory
+           (lambda (directory)
+             (with-input-from-string (string-append
+                                      "substitute " (%store-prefix)
+                                      "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash "
+                                      directory "/wrong-hash\n")
+               (lambda ()
+                 (guix-substitute "--substitute"))))))))))
+
 (test-quit "substitute, unauthorized key"
     "no valid substitute"
   (with-narinfo (string-append %narinfo "Signature: "
@@ -326,7 +372,7 @@ System: mips64el-linux\n")
                                "\n")
     (with-input-from-string (string-append "substitute "
                                            (%store-prefix)
-                                           "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo"
+                                           "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash"
                                            " foo\n")
       (lambda ()
         (guix-substitute "--substitute")))))
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 3/6] tests: Check the mtime and permissions of substituted items.
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 2/6] daemon: Let 'guix substitute' perform hash checks Ludovic Courtès
@ 2020-12-15  9:57   ` Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 4/6] daemon: Do not reset timestamps and permissions on " Ludovic Courtès
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

* tests/store.scm ("substitute")
("substitute + build-things with output path")
("substitute + build-things with specific output"): Call 'canonical-file?'.
* tests/substitute.scm ("substitute, authorized key"): Check the mtime
and permissions of "substitute-retrieved".
---
 tests/store.scm      | 3 +++
 tests/substitute.scm | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/store.scm b/tests/store.scm
index 7f1ec51875..4dc125bcb9 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -715,6 +715,7 @@
                            #:substitute-urls (%test-substitute-urls))
         (and (has-substitutes? s o)
              (build-derivations s (list d))
+             (canonical-file? o)
              (equal? c (call-with-input-file o get-string-all)))))))
 
 (test-assert "substitute + build-things with output path"
@@ -735,6 +736,7 @@
         (and (has-substitutes? s o)
              (build-things s (list o))            ;give the output path
              (valid-path? s o)
+             (canonical-file? o)
              (equal? c (call-with-input-file o get-string-all)))))))
 
 (test-assert "substitute + build-things with specific output"
@@ -755,6 +757,7 @@
              (build-things s `((,(derivation-file-name d) . "out")))
 
              (valid-path? s o)
+             (canonical-file? o)
              (equal? c (call-with-input-file o get-string-all)))))))
 
 (test-assert "substitute, corrupt output hash"
diff --git a/tests/substitute.scm b/tests/substitute.scm
index 241c55a1f8..4f55a14957 100644
--- a/tests/substitute.scm
+++ b/tests/substitute.scm
@@ -378,7 +378,7 @@ System: mips64el-linux\n")))
         (guix-substitute "--substitute")))))
 
 (test-equal "substitute, authorized key"
-  "Substitutable data."
+  '("Substitutable data." 1 #o444)
   (with-narinfo (string-append %narinfo "Signature: "
                                (signature-field %narinfo))
     (dynamic-wind
@@ -387,7 +387,9 @@ System: mips64el-linux\n")))
         (request-substitution (string-append (%store-prefix)
                                              "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo")
                               "substitute-retrieved")
-        (call-with-input-file "substitute-retrieved" get-string-all))
+        (list (call-with-input-file "substitute-retrieved" get-string-all)
+              (stat:mtime (lstat "substitute-retrieved"))
+              (stat:perms (lstat "substitute-retrieved"))))
       (lambda ()
         (false-if-exception (delete-file "substitute-retrieved"))))))
 
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 4/6] daemon: Do not reset timestamps and permissions on substituted items.
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 2/6] daemon: Let 'guix substitute' perform hash checks Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 3/6] tests: Check the mtime and permissions of substituted items Ludovic Courtès
@ 2020-12-15  9:57   ` Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 5/6] tests: Make sure substituted items are deduplicated Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 6/6] daemon: Delegate deduplication to 'guix substitute' Ludovic Courtès
  4 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

'guix substitute' now takes care of it via 'restore-file'.

* nix/libstore/build.cc (SubstitutionGoal::finished): Remove call to
'canonicalisePathMetaData'.
---
 nix/libstore/build.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index b19471a68f..ea809c6971 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -3085,7 +3085,8 @@ void SubstitutionGoal::finished()
 
     if (repair) replaceValidPath(storePath, destPath);
 
-    canonicalisePathMetaData(storePath, -1);
+    /* Note: 'guix substitute' takes care of resetting timestamps and
+       permissions on 'destPath', so no need to do it here.  */
 
     worker.store.optimisePath(storePath); // FIXME: combine with hashPath()
 
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 5/6] tests: Make sure substituted items are deduplicated.
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
                     ` (2 preceding siblings ...)
  2020-12-15  9:57   ` [bug#45253] [PATCH 4/6] daemon: Do not reset timestamps and permissions on " Ludovic Courtès
@ 2020-12-15  9:57   ` Ludovic Courtès
  2020-12-15  9:57   ` [bug#45253] [PATCH 6/6] daemon: Delegate deduplication to 'guix substitute' Ludovic Courtès
  4 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

* tests/store.scm ("substitute, deduplication"): New test.
---
 tests/store.scm | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tests/store.scm b/tests/store.scm
index 4dc125bcb9..c9a08ac690 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -718,6 +718,30 @@
              (canonical-file? o)
              (equal? c (call-with-input-file o get-string-all)))))))
 
+(test-assert "substitute, deduplication"
+  (with-store s
+    (let* ((c   (random-text))                     ; contents of the output
+           (g   (package-derivation s %bootstrap-guile))
+           (d1  (build-expression->derivation s "substitute-me"
+                                              `(begin ,c (exit 1))
+                                              #:guile-for-build g))
+           (d2  (build-expression->derivation s "build-me"
+                                              `(call-with-output-file %output
+                                                 (lambda (p)
+                                                   (display ,c p)))
+                                              #:guile-for-build g))
+           (o1  (derivation->output-path d1))
+           (o2  (derivation->output-path d2)))
+      (with-derivation-substitute d1 c
+        (set-build-options s #:use-substitutes? #t
+                           #:substitute-urls (%test-substitute-urls))
+        (and (has-substitutes? s o1)
+             (build-derivations s (list d2))      ;build
+             (build-derivations s (list d1))      ;substitute
+             (canonical-file? o1)
+             (equal? c (call-with-input-file o1 get-string-all))
+             (= (stat:ino (stat o1)) (stat:ino (stat o2))))))))
+
 (test-assert "substitute + build-things with output path"
   (with-store s
     (let* ((c   (random-text))                    ;contents of the output
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 6/6] daemon: Delegate deduplication to 'guix substitute'.
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
                     ` (3 preceding siblings ...)
  2020-12-15  9:57   ` [bug#45253] [PATCH 5/6] tests: Make sure substituted items are deduplicated Ludovic Courtès
@ 2020-12-15  9:57   ` Ludovic Courtès
  4 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15  9:57 UTC (permalink / raw)
  To: 45253

This removes the main source of latency between subsequent downloads.

* nix/libstore/build.cc (SubstitutionGoal::tryToRun): Add a
"deduplicate" key to ENV.
(SubstitutionGoal::finished): Remove call to 'optimisePath'.
* guix/scripts/substitute.scm (process-substitution)[destination-in-store?]
[dump-file/deduplicate*]: New variables.
Pass #:dump-file to 'restore-file'.
* guix/scripts/substitute.scm (guix-substitute)[deduplicate?]: New
variable.
Pass #:deduplicate? to 'process-substitution'.
* guix/serialization.scm (dump-file): Export and augment 'dump-file'.
---
 guix/scripts/substitute.scm | 31 ++++++++++++++++++++++++++-----
 guix/serialization.scm      |  8 ++++++--
 nix/libstore/build.cc       | 13 ++++++++-----
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 17d0002b9f..38702d0c4b 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -28,7 +28,8 @@
   #:use-module (guix records)
   #:use-module (guix diagnostics)
   #:use-module (guix i18n)
-  #:use-module ((guix serialization) #:select (restore-file))
+  #:use-module ((guix serialization) #:select (restore-file dump-file))
+  #:autoload   (guix store deduplication) (dump-file/deduplicate)
   #:autoload   (guix scripts discover) (read-substitute-urls)
   #:use-module (gcrypt hash)
   #:use-module (guix base32)
@@ -1045,15 +1046,27 @@ one.  Return #f if URI's scheme is 'file' or #f."
   (call-with-cached-connection uri (lambda (port) exp ...)))
 
 (define* (process-substitution store-item destination
-                               #:key cache-urls acl print-build-trace?)
+                               #:key cache-urls acl
+                               deduplicate? print-build-trace?)
   "Substitute STORE-ITEM (a store file name) from CACHE-URLS, and write it to
 DESTINATION as a nar file.  Verify the substitute against ACL, and verify its
-hash against what appears in the narinfo.  Print a status line on the current
-output port."
+hash against what appears in the narinfo.  When DEDUPLICATE? is true, and if
+DESTINATION is in the store, deduplicate its files.  Print a status line on
+the current output port."
   (define narinfo
     (lookup-narinfo cache-urls store-item
                     (cut valid-narinfo? <> acl)))
 
+  (define destination-in-store?
+    (string-prefix? (string-append (%store-prefix) "/")
+                    destination))
+
+  (define (dump-file/deduplicate* . args)
+    ;; Make sure deduplication looks at the right store (necessary in test
+    ;; environments).
+    (apply dump-file/deduplicate
+           (append args (list #:store (%store-prefix)))))
+
   (unless narinfo
     (leave (G_ "no valid substitute for '~a'~%")
            store-item))
@@ -1100,7 +1113,11 @@ output port."
                   ((hashed get-hash)
                    (open-hash-input-port algorithm input)))
       ;; Unpack the Nar at INPUT into DESTINATION.
-      (restore-file hashed destination)
+      (restore-file hashed destination
+                    #:dump-file (if (and destination-in-store?
+                                         deduplicate?)
+                                    dump-file/deduplicate*
+                                    dump-file))
       (close-port hashed)
       (close-port input)
 
@@ -1248,6 +1265,9 @@ default value."
       ((= string->number number) (> number 0))
       (_ #f)))
 
+  (define deduplicate?
+    (find-daemon-option "deduplicate"))
+
   ;; The daemon's agent code opens file descriptor 4 for us and this is where
   ;; stderr should go.
   (parameterize ((current-error-port (if (%error-to-file-descriptor-4?)
@@ -1307,6 +1327,7 @@ default value."
                  (process-substitution store-path destination
                                        #:cache-urls (substitute-urls)
                                        #:acl (current-acl)
+                                       #:deduplicate? deduplicate?
                                        #:print-build-trace?
                                        print-build-trace?)
                  (loop))))))
diff --git a/guix/serialization.scm b/guix/serialization.scm
index 9e2dce8bb0..59cd93fb18 100644
--- a/guix/serialization.scm
+++ b/guix/serialization.scm
@@ -51,7 +51,8 @@
             write-file
             write-file-tree
             fold-archive
-            restore-file))
+            restore-file
+            dump-file))
 
 ;;; Comment:
 ;;;
@@ -458,7 +459,10 @@ depends on TYPE."
            (&nar-read-error (port port) (file file) (token x)))))))))
 
 (define (dump-file file input size type)
-  "Dump SIZE bytes from INPUT to FILE."
+  "Dump SIZE bytes from INPUT to FILE.
+
+This procedure is suitable for use as the #:dump-file argument to
+'restore-file'."
   (call-with-output-file file
     (lambda (output)
       (dump input output size))))
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index ea809c6971..20d83fea4a 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -2984,7 +2984,12 @@ void SubstitutionGoal::tryToRun()
 
     if (!worker.substituter) {
 	const Strings args = { "substitute", "--substitute" };
-	const std::map<string, string> env = { { "_NIX_OPTIONS", settings.pack() } };
+	const std::map<string, string> env = {
+	    { "_NIX_OPTIONS",
+	      settings.pack() + "deduplicate="
+	      + (settings.autoOptimiseStore ? "yes" : "no")
+	    }
+	};
 	worker.substituter = std::make_shared<Agent>(settings.guixProgram, args, env);
     }
 
@@ -3085,10 +3090,8 @@ void SubstitutionGoal::finished()
 
     if (repair) replaceValidPath(storePath, destPath);
 
-    /* Note: 'guix substitute' takes care of resetting timestamps and
-       permissions on 'destPath', so no need to do it here.  */
-
-    worker.store.optimisePath(storePath); // FIXME: combine with hashPath()
+    /* Note: 'guix substitute' takes care of resetting timestamps and of
+       deduplicating 'destPath', so no need to do it here.  */
 
     ValidPathInfo info2;
     info2.path = storePath;
-- 
2.29.2





^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization
  2020-12-15  9:38 [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
@ 2020-12-15 11:43 ` Ludovic Courtès
  2020-12-19  2:07   ` Maxim Cournoyer
  2020-12-19 23:04 ` bug#45253: " Ludovic Courtès
  2 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-15 11:43 UTC (permalink / raw)
  To: 45253

[-- Attachment #1: Type: text/plain, Size: 740 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> I tested with substitutes that contain many files:
>
>   guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
>     emacs-no-x emacs-no-x-toolkit
>
> On my laptop with an SSD, the wall-clock time is almost unchanged
> when fetching lzip substitutes.  You can see that the throughput
> displayed while downloading is slightly lower than before, which
> is consistent because lzip downloads are CPU-bound¹, but this is
> compensated by the lack of processing time between substitutes.
> With gzip substitutes, I see a 10% speedup on the wall-clock time
> on my laptop.

Picture!  First the timechart with the current daemon (gzip
substitutes, downloading from the LAN):


[-- Attachment #2: daemon checking hashes by itself --]
[-- Type: image/png, Size: 91708 bytes --]

[-- Attachment #3: Type: text/plain, Size: 246 bytes --]


Notice how guix-daemon is busy in between substitute downloads: that’s
the time it takes to compute the nar hash of the store item, reset its
timestamps, and deduplicate its files.

Now the same operation after with this patch series:


[-- Attachment #4: daemon delegating to guix substitute --]
[-- Type: image/png, Size: 98207 bytes --]

[-- Attachment #5: Type: text/plain, Size: 158 bytes --]


This time guix-daemon remains idle all along whereas ‘guix substitute’
is almost 100% busy.  There’s no pause time between substitutes.

Ludo’.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization
  2020-12-15 11:43 ` [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
@ 2020-12-19  2:07   ` Maxim Cournoyer
  0 siblings, 0 replies; 10+ messages in thread
From: Maxim Cournoyer @ 2020-12-19  2:07 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 45253

Hey Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I tested with substitutes that contain many files:
>>
>>   guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
>>     emacs-no-x emacs-no-x-toolkit
>>
>> On my laptop with an SSD, the wall-clock time is almost unchanged
>> when fetching lzip substitutes.  You can see that the throughput
>> displayed while downloading is slightly lower than before, which
>> is consistent because lzip downloads are CPU-bound¹, but this is
>> compensated by the lack of processing time between substitutes.
>> With gzip substitutes, I see a 10% speedup on the wall-clock time
>> on my laptop.
>
> Picture!  First the timechart with the current daemon (gzip
> substitutes, downloading from the LAN):
>
>
>
>
> Notice how guix-daemon is busy in between substitute downloads: that’s
> the time it takes to compute the nar hash of the store item, reset its
> timestamps, and deduplicate its files.
>
> Now the same operation after with this patch series:
>
>
>
>
> This time guix-daemon remains idle all along whereas ‘guix substitute’
> is almost 100% busy.  There’s no pause time between substitutes.
>
> Ludo’.

Very nice!  Thanks for this work!  I can't wait to try it.

Maxim




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#45253: [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization
  2020-12-15  9:38 [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
  2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
  2020-12-15 11:43 ` [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
@ 2020-12-19 23:04 ` Ludovic Courtès
  2 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-12-19 23:04 UTC (permalink / raw)
  To: 45253-done

Ludovic Courtès <ludo@gnu.org> skribis:

>   tests: Check the build trace for hash mismatches on substitutes.
>   daemon: Let 'guix substitute' perform hash checks.
>   tests: Check the mtime and permissions of substituted items.
>   daemon: Do not reset timestamps and permissions on substituted items.
>   tests: Make sure substituted items are deduplicated.
>   daemon: Delegate deduplication to 'guix substitute'.

Pushed as c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa.

Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-19 23:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-15  9:38 [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
2020-12-15  9:57 ` [bug#45253] [PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes Ludovic Courtès
2020-12-15  9:57   ` [bug#45253] [PATCH 2/6] daemon: Let 'guix substitute' perform hash checks Ludovic Courtès
2020-12-15  9:57   ` [bug#45253] [PATCH 3/6] tests: Check the mtime and permissions of substituted items Ludovic Courtès
2020-12-15  9:57   ` [bug#45253] [PATCH 4/6] daemon: Do not reset timestamps and permissions on " Ludovic Courtès
2020-12-15  9:57   ` [bug#45253] [PATCH 5/6] tests: Make sure substituted items are deduplicated Ludovic Courtès
2020-12-15  9:57   ` [bug#45253] [PATCH 6/6] daemon: Delegate deduplication to 'guix substitute' Ludovic Courtès
2020-12-15 11:43 ` [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization Ludovic Courtès
2020-12-19  2:07   ` Maxim Cournoyer
2020-12-19 23:04 ` bug#45253: " Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).