From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id iKcJK0hW9F7vRgAA0tVLHw (envelope-from ) for ; Thu, 25 Jun 2020 07:46:16 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id qBDfJkhW9F5SPwAA1q6Kng (envelope-from ) for ; Thu, 25 Jun 2020 07:46:16 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id E38FD9401CF for ; Thu, 25 Jun 2020 07:46:15 +0000 (UTC) Received: from localhost ([::1]:45812 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1joMaL-0007ll-QN for larch@yhetil.org; Thu, 25 Jun 2020 03:46:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58262) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1joMaA-0007la-41 for guix-patches@gnu.org; Thu, 25 Jun 2020 03:46:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:56350) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1joMa9-0001Oh-QN for guix-patches@gnu.org; Thu, 25 Jun 2020 03:46:01 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1joMa9-0006nl-OB for guix-patches@gnu.org; Thu, 25 Jun 2020 03:46:01 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#42023] [PATCH] database: register-items: reduce transaction scope. Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Thu, 25 Jun 2020 07:46:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42023 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: Caleb Ristvedt Cc: 42023@debbugs.gnu.org, Christopher Baines Received: via spool by 42023-submit@debbugs.gnu.org id=B42023.159307112826103 (code B ref 42023); Thu, 25 Jun 2020 07:46:01 +0000 Received: (at 42023) by debbugs.gnu.org; 25 Jun 2020 07:45:28 +0000 Received: from localhost ([127.0.0.1]:39663 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1joMZc-0006mx-8l for submit@debbugs.gnu.org; Thu, 25 Jun 2020 03:45:28 -0400 Received: from eggs.gnu.org ([209.51.188.92]:43754) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1joMZX-0006mf-2M for 42023@debbugs.gnu.org; Thu, 25 Jun 2020 03:45:26 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:35253) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1joMZR-0000sj-AL; Thu, 25 Jun 2020 03:45:17 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=33090 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1joMZQ-0003Ss-Aj; Thu, 25 Jun 2020 03:45:16 -0400 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <20200623163649.32444-1-mail@cbaines.net> <87366lxwcd.fsf@gnu.org> <87v9jg4aiz.fsf@cune.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 8 Messidor an 228 de la =?UTF-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Thu, 25 Jun 2020 09:45:14 +0200 In-Reply-To: <87v9jg4aiz.fsf@cune.org> (Caleb Ristvedt's message of "Wed, 24 Jun 2020 12:51:48 -0500") Message-ID: <87eeq3vbat.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -3.3 (---) X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Spam-Score: -1.01 X-TUID: Lef35kVUAu0y --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi, Caleb Ristvedt skribis: > Ludovic Court=C3=A8s writes: [...] >> Two questions: >> >> 1. Can another process come and fiddle with TO-REGISTER while we=E2=80= =99re >> still in =E2=80=98reset-timestamps=E2=80=99? Or can GC happen whil= e we=E2=80=99re in >> =E2=80=98reset-timestamps=E2=80=99 and delete TO-REGISTER and remov= e it from the >> database? > >> >> I think none of these scenarios can happen, as long as we=E2=80=99ve tak= en the >> .lock file for TO-REGISTER before, like =E2=80=98finalize-store-file=E2= =80=99 does. > > The lock file has no bearing on liveness of the path it locks, though > the liveness of the path it locks *does* count as liveness for the lock > file and for the .chroot file; see tryToDelete() in nix/libstore/gc.cc. > > (Well, semi-liveness; .lock and .chroot files won't show up in a list of > live paths, but they will still be protected from deletion if their > associated store item is a temp root) > > So general "fiddling" can't happen, but GC can. The responsibility for > both locking and registering as temporary roots falls on the caller, > currently, as I believe it should. We may want to document this > responsibility in the docstring for register-items, though. Yes, it would be good to add a sentence to document it. > So currently finalize-store-file is safe from clobbering by another > process attempting to finalize to the same path, but not safe from > garbage collection between when the temporary store file is renamed and > when it is registered. It needs an add-temp-root prior to renaming. Ah, so we=E2=80=99d need to do that before applying the patch that reduces = the scope of the transaction, right? >> 2. After the transaction, TO-REGISTER is considered valid. But are >> the effects of the on-going deduplication observable, due to >> non-atomicity of some operation? > > Subdirectories of store items need to be made writable prior to renaming > the temp link, so there will necessarily be a window of time during > which various subdirectories will appear writable. I don't think this > should cause problems; we already assume that the .lock file is held, so > we should be the only process attempting to deduplicate it. On a related > note, we should probably use dynamic-wind for setting and restoring the > permissions in replace-with-link. Yes. Here=E2=80=99s a patch for =E2=80=98dynamic-wind=E2=80=99: --=-=-= Content-Type: text/x-patch Content-Disposition: inline diff --git a/.dir-locals.el b/.dir-locals.el index 92979fc5ed..155255a770 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -37,6 +37,7 @@ (eval . (put 'with-file-lock 'scheme-indent-function 1)) (eval . (put 'with-file-lock/no-wait 'scheme-indent-function 1)) (eval . (put 'with-profile-lock 'scheme-indent-function 1)) + (eval . (put 'with-writable-file 'scheme-indent-function 1)) (eval . (put 'package 'scheme-indent-function 0)) (eval . (put 'origin 'scheme-indent-function 0)) diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm index 6784ee0b92..af52c03370 100644 --- a/guix/store/deduplication.scm +++ b/guix/store/deduplication.scm @@ -94,6 +94,20 @@ LINK-PREFIX." (try (tempname-in link-prefix)) (apply throw args)))))) +(define (call-with-writable-file file thunk) + (let ((stat (lstat file))) + (dynamic-wind + (lambda () + (make-file-writable file)) + thunk + (lambda () + (set-file-time file stat) + (chmod file (stat:mode stat)))))) + +(define-syntax-rule (with-writable-file file exp ...) + "Make FILE writable for the dynamic extent of EXP..." + (call-with-writable-file file (lambda () exp ...))) + ;; There are 3 main kinds of errors we can get from hardlinking: "Too many ;; things link to this" (EMLINK), "this link already exists" (EEXIST), and ;; "can't fit more stuff in this directory" (ENOSPC). @@ -120,20 +134,14 @@ Note: TARGET, TO-REPLACE, and SWAP-DIRECTORY must be on the same file system." ;; If we couldn't create TEMP-LINK, that's OK: just don't do the ;; replacement, which means TO-REPLACE won't be deduplicated. (when temp-link - (let* ((parent (dirname to-replace)) - (stat (stat parent))) - (make-file-writable parent) + (with-writable-file (dirname to-replace) (catch 'system-error (lambda () (rename-file temp-link to-replace)) (lambda args (delete-file temp-link) (unless (= EMLINK (system-error-errno args)) - (apply throw args)))) - - ;; Restore PARENT's mtime and permissions. - (set-file-time parent stat) - (chmod parent (stat:mode stat))))) + (apply throw args))))))) (define* (deduplicate path hash #:key (store %store-directory)) "Check if a store item with sha256 hash HASH already exists. If so, --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > Also, replace-with-link doesn't check whether the "containing directory" > is the store like optimisePath_() does, so in theory if another process > tried to make a permanent change to the store's permissions it could be > clobbered when replace-with-link restores the permissions. I don't know > of any instance where this could happen currently, but it's something to > keep in mind. I guess we should also avoid changing permissions on /gnu/store, it would be wiser. > And, of course, there's the inherent visible change of deduplication - > possible modification of inode number, but I don't see how that could > cause problems with any reasonable programs. Yes, that=E2=80=99s fine. >> I think the =E2=80=98replace-with-link=E2=80=99 dance is atomic, so we s= hould be fine. >> >> Thoughts? > > replace-with-link is atomic, but it can fail if the "canonical" link in > .links is deleted by the garbage collector between when its existence is > checked in 'deduplicate' and when temp link creation in > replace-with-link happens. Then ENOENT would be returned, and > 'deduplicate' wouldn't catch that exception, nor would optimisePath_() > in nix/libstore/optimise-store.cc. > > The proper behavior there, in my opinion, would be to retry the > deduplication. Attached is a patch that makes 'deduplicate' do that. > > Also, while I'm perusing optimisePath_(), there's a minor bug in which > EMLINK generated by renaming the temp link may still be treated as a > fatal error. This is because errno may be clobbered by unlink() prior to > being checked (according to the errno man page it can still be modified > even if the call succeeds). Indeed, good catch! > From 461064da9e169df3dd939b734bb2860598d113f4 Mon Sep 17 00:00:00 2001 > From: Caleb Ristvedt > Date: Wed, 24 Jun 2020 00:56:50 -0500 > Subject: [PATCH 1/2] deduplication: retry on ENOENT. > > It's possible for the garbage collector to remove the "canonical" link af= ter > it's been detected as existing by 'deduplicate'. This would cause an ENO= ENT > error when replace-with-link attempts to create the temporary link. This > changes it so that it will properly handle that by retrying. > > * guix/store/deduplication.scm (deduplicate): retry on ENOENT. > --- > guix/store/deduplication.scm | 64 +++++++++++++++++++++++------------- > 1 file changed, 41 insertions(+), 23 deletions(-) > > diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm > index 6784ee0b92..b6d94e49c2 100644 > --- a/guix/store/deduplication.scm > +++ b/guix/store/deduplication.scm > @@ -161,26 +161,44 @@ under STORE." > (scandir* path)) > (let ((link-file (string-append links-directory "/" > (bytevector->nix-base32-string h= ash)))) > - (if (file-exists? link-file) > - (replace-with-link link-file path > - #:swap-directory links-directory) > - (catch 'system-error > - (lambda () > - (link path link-file)) > - (lambda args > - (let ((errno (system-error-errno args))) > - (cond ((=3D errno EEXIST) > - ;; Someone else put an entry for PATH in > - ;; LINKS-DIRECTORY before we could. Let's us= e it. > - (replace-with-link path link-file > - #:swap-directory links-dir= ectory)) > - ((=3D errno ENOSPC) > - ;; There's not enough room in the directory i= ndex for > - ;; more entries in .links, but that's fine: w= e can > - ;; just stop. > - #f) > - ((=3D errno EMLINK) > - ;; PATH has reached the maximum number of lin= ks, but > - ;; that's OK: we just can't deduplicate it mo= re. > - #f) > - (else (apply throw args))))))))))) > + (let retry () > + (if (file-exists? link-file) > + (catch 'system-error > + (lambda () > + (replace-with-link link-file path > + #:swap-directory links-directory)) > + (lambda args > + (if (and (=3D (system-error-errno args) ENOENT) > + ;; ENOENT can be produced because: > + ;; - LINK-FILE has missing directory compon= ents > + ;; - LINKS-DIRECTORY has missing directory > + ;; components > + ;; - PATH has missing directory components > + ;; > + ;; the last two are errors, the first just > + ;; requires a retry. > + (file-exists? (dirname path)) > + (file-exists? links-directory)) > + (retry) > + (apply throw args)))) I feel like there are TOCTTOU issues here and redundant =E2=80=98stat=E2=80= =99 calls, plus the risk of catching a =E2=80=98system-error=E2=80=99 coming from =E2= =80=9Csomewhere else.=E2=80=9D What about baking this logic directly in =E2=80=98replace-with-link=E2=80= =99, and replacing =E2=80=98file-exists?=E2=80=99 calls by 'system-error handling? > From a14ff0a9dab4d73680befaf9d244d6cce0a73ab3 Mon Sep 17 00:00:00 2001 > From: Caleb Ristvedt > Date: Wed, 24 Jun 2020 01:00:40 -0500 > Subject: [PATCH 2/2] .dir-locals.el: fix call-with-{retrying-}transaction > indenting. > > * .dir-locals.el (call-with-transaction, call-with-retrying-transaction): > change scheme-indent-function property from 2 to 1. LGTM! Thanks for your quick feedback and thorough analyses! Ludo=E2=80=99. --=-=-=--