From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id 6HwKNuWL019mBgAA0tVLHw (envelope-from ) for ; Fri, 11 Dec 2020 15:10:29 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id wDnUMeWL019cEwAAB5/wlQ (envelope-from ) for ; Fri, 11 Dec 2020 15:10:29 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 5AFBB9405D2 for ; Fri, 11 Dec 2020 15:10:29 +0000 (UTC) Received: from localhost ([::1]:58966 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1knk3w-0004Ln-9E for larch@yhetil.org; Fri, 11 Dec 2020 10:10:28 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:49850) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1knk3X-0004JF-B5 for bug-guix@gnu.org; Fri, 11 Dec 2020 10:10:03 -0500 Received: from debbugs.gnu.org ([209.51.188.43]:59275) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1knk3X-0000dC-2i for bug-guix@gnu.org; Fri, 11 Dec 2020 10:10:03 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1knk3W-0007jT-TE for bug-guix@gnu.org; Fri, 11 Dec 2020 10:10:02 -0500 X-Loop: help-debbugs@gnu.org Subject: bug#44760: [PATCH 03/15] nar: Deduplicate files right as they are restored. Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Fri, 11 Dec 2020 15:10:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44760 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: 44760@debbugs.gnu.org Received: via spool by 44760-submit@debbugs.gnu.org id=B44760.160769938729631 (code B ref 44760); Fri, 11 Dec 2020 15:10:02 +0000 Received: (at 44760) by debbugs.gnu.org; 11 Dec 2020 15:09:47 +0000 Received: from localhost ([127.0.0.1]:42572 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1knk3G-0007hY-Ij for submit@debbugs.gnu.org; Fri, 11 Dec 2020 10:09:47 -0500 Received: from eggs.gnu.org ([209.51.188.92]:56558) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1knk3C-0007gT-C9 for 44760@debbugs.gnu.org; Fri, 11 Dec 2020 10:09:42 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]:47530) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1knk37-0000Td-8V; Fri, 11 Dec 2020 10:09:37 -0500 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=59324 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1knk35-0004Tz-TC; Fri, 11 Dec 2020 10:09:36 -0500 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Date: Fri, 11 Dec 2020 16:09:13 +0100 Message-Id: <20201211150919.18435-4-ludo@gnu.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201211150919.18435-1-ludo@gnu.org> References: <87h7pkffzy.fsf@inria.fr> <20201211150919.18435-1-ludo@gnu.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: "bug-Guix" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: 2.20 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=pass (policy=none) header.from=gnu.org; spf=pass (aspmx1.migadu.com: domain of bug-guix-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=bug-guix-bounces@gnu.org X-Migadu-Queue-Id: 5AFBB9405D2 X-Spam-Score: 2.20 X-Migadu-Scanner: scn1.migadu.com X-TUID: dC8EYwPH+zzj This avoids having to traverse and re-read the files that we have just restored, thereby reducing I/O. * guix/serialization.scm (dump-file): New procedure. (restore-file): Add #:dump-file parameter and honor it. * guix/store/deduplication.scm (tee, dump-file/deduplicate): New procedures. * guix/nar.scm (restore-one-item): Pass #:dump-file to 'restore-file'. (finalize-store-file): Pass #:deduplicate? #f to 'register-items'. * tests/nar.scm : Call 'setenv' to set "NIX_STORE". --- guix/nar.scm | 12 ++++---- guix/serialization.scm | 27 ++++++++++++----- guix/store/deduplication.scm | 57 +++++++++++++++++++++++++++++++++++- tests/nar.scm | 3 ++ 4 files changed, 85 insertions(+), 14 deletions(-) diff --git a/guix/nar.scm b/guix/nar.scm index edfcc9aab5..ba035ca6dc 100644 --- a/guix/nar.scm +++ b/guix/nar.scm @@ -27,6 +27,7 @@ ;; (guix store) since this is "daemon-side" code. #:use-module (guix store) #:use-module (guix store database) + #:use-module ((guix store deduplication) #:select (dump-file/deduplicate)) #:use-module ((guix build store-copy) #:select (store-info)) #:use-module (guix i18n) @@ -114,12 +115,12 @@ held." ;; Install the new TARGET. (rename-file source target) - ;; Register TARGET. As a side effect, run a deduplication pass. - ;; Timestamps and permissions are already correct thanks to - ;; 'restore-file'. + ;; Register TARGET. The 'restore-file' call took care of + ;; deduplication, timestamps, and permissions. (register-items db (list (store-info target deriver references)) - #:reset-timestamps? #f)) + #:reset-timestamps? #f + #:deduplicate? #f)) (when lock? (delete-file (string-append target ".lock")) @@ -212,7 +213,8 @@ s-expression")) (let-values (((port get-hash) (open-sha256-input-port port))) (with-temporary-store-file temp - (restore-file port temp) + (restore-file port temp + #:dump-file dump-file/deduplicate) (let ((magic (read-int port))) (unless (= magic %export-magic) diff --git a/guix/serialization.scm b/guix/serialization.scm index 677ca60b66..9e2dce8bb0 100644 --- a/guix/serialization.scm +++ b/guix/serialization.scm @@ -457,9 +457,22 @@ depends on TYPE." (&message (message "unsupported nar entry type")) (&nar-read-error (port port) (file file) (token x))))))))) -(define (restore-file port file) +(define (dump-file file input size type) + "Dump SIZE bytes from INPUT to FILE." + (call-with-output-file file + (lambda (output) + (dump input output size)))) + +(define* (restore-file port file + #:key (dump-file dump-file)) "Read a file (possibly a directory structure) in Nar format from PORT. -Restore it as FILE with canonical permissions and timestamps." +Restore it as FILE with canonical permissions and timestamps. To write a +regular or executable file, call: + + (DUMP-FILE FILE INPUT SIZE TYPE) + +The default is to dump SIZE bytes from INPUT to FILE, but callers can provide +a custom procedure, for instance to deduplicate FILE on the fly." (fold-archive (lambda (file type content result) (match type ('directory @@ -473,12 +486,10 @@ Restore it as FILE with canonical permissions and timestamps." ((or 'regular 'executable) (match content ((input . size) - (call-with-output-file file - (lambda (output) - (dump input output size) - (chmod output (if (eq? type 'executable) - #o555 - #o444)))) + (dump-file file input size type) + (chmod file (if (eq? type 'executable) + #o555 + #o444)) (utime file 1 1 0 0)))))) #t port diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm index 0655ceb890..b4d37d4525 100644 --- a/guix/store/deduplication.scm +++ b/guix/store/deduplication.scm @@ -26,12 +26,15 @@ #:use-module (guix build syscalls) #:use-module (guix base32) #:use-module (srfi srfi-11) + #:use-module (srfi srfi-34) + #:use-module (srfi srfi-35) #:use-module (rnrs io ports) #:use-module (ice-9 ftw) #:use-module (ice-9 match) #:use-module (guix serialization) #:export (nar-sha256 - deduplicate)) + deduplicate + dump-file/deduplicate)) ;; XXX: This port is used as a workaround on Guile <= 2.2.4 where ;; 'port-position' throws to 'out-of-range' when the offset is great than or @@ -201,3 +204,55 @@ under STORE." ;; that's OK: we just can't deduplicate it more. #f) (else (apply throw args))))))))))) + +(define (tee input len output) + "Return a port that reads up to LEN bytes from INPUT and writes them to +OUTPUT as it goes." + (define bytes-read 0) + + (define (fail) + ;; Reached EOF before we had read LEN bytes from INPUT. + (raise (condition + (&nar-error (port input) + (file (port-filename output)))))) + + (define (read! bv start count) + ;; Read at most LEN bytes in total. + (let ((count (min count (- len bytes-read)))) + (let loop ((ret (get-bytevector-n! input bv start count))) + (cond ((eof-object? ret) + (if (= bytes-read len) + 0 ; EOF + (fail))) + ((and (zero? ret) (> count 0)) + ;; Do not return zero since zero means EOF, so try again. + (loop (get-bytevector-n! input bv start count))) + (else + (put-bytevector output bv start ret) + (set! bytes-read (+ bytes-read ret)) + ret))))) + + (make-custom-binary-input-port "tee input port" read! #f #f #f)) + +(define* (dump-file/deduplicate file input size type + #:key (store (%store-directory))) + "Write SIZE bytes read from INPUT to FILE. TYPE is a symbol, either +'regular or 'executable. + +This procedure is suitable as a #:dump-file argument to 'restore-file'. When +used that way, it deduplicates files on the fly as they are restored, thereby +removing the need to a deduplication pass that would re-read all the files +down the road." + (define hash + (call-with-output-file file + (lambda (output) + (let-values (((hash-port get-hash) + (open-hash-port (hash-algorithm sha256)))) + (write-file-tree file hash-port + #:file-type+size (lambda (_) (values type size)) + #:file-port + (const (tee input size output))) + (close-port hash-port) + (get-hash))))) + + (deduplicate file hash #:store store)) diff --git a/tests/nar.scm b/tests/nar.scm index 59616659c8..ba4881caaa 100644 --- a/tests/nar.scm +++ b/tests/nar.scm @@ -452,6 +452,9 @@ (false-if-exception (rm-rf %test-dir)) (setlocale LC_ALL locale))))) +;; XXX: Tell the 'deduplicate' procedure what store we're actually using. +(setenv "NIX_STORE" (%store-prefix)) + (test-assert "restore-file-set (signed, valid)" (with-store store (let* ((texts (unfold (cut >= <> 10) -- 2.29.2