From: Caleb Ristvedt
Subject: Re: guix gc takes long in "deleting unused links ..."
Date: Fri, 01 Feb 2019 16:22:21 -0600
Message-ID: <87womjcfoi.fsf@cune.org>
References: <20190201065332.6d4c9815@alma-ubu>
In-Reply-To: <20190201065332.6d4c9815@alma-ubu> ("Björn Höfling"'s message of "Fri, 1 Feb 2019 06:53:40 +0100")
To: guix-devel@gnu.org
List-Id: "Development of GNU Guix and the GNU System distribution."

Björn Höfling writes:

> Why does that take soo long?

Warning: technical overview follows. It takes so long because after the garbage collection pass it then does a *full* pass over the /gnu/store/.links directory. Which is huge.
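That final pass can be sketched roughly as follows. This is a simplified illustration in Python of the logic described here, not the daemon's actual code (the real implementation lives in the C++ daemon): walk .links, lstat() each entry, and unlink() those whose link count shows the .links entry is the only remaining reference.

```python
import os

def delete_unused_links(links_dir="/gnu/store/.links"):
    """Sketch of the daemon's final GC pass: remove .links entries
    whose files are no longer referenced from anywhere in the store."""
    removed = 0
    with os.scandir(links_dir) as entries:
        for entry in entries:
            # follow_symlinks=False makes this an lstat() call.
            st = entry.stat(follow_symlinks=False)
            # st_nlink == 1 means the .links entry is the file's only
            # remaining hard link: no store item uses it any more.
            if st.st_nlink == 1:
                os.unlink(entry.path)
                removed += 1
    return removed
```

The cost is exactly what is described below: one readdir() stream plus one lstat() per entry, so the wall-clock time scales with the total number of unique files ever deduplicated, regardless of how small the set of store items being collected is.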
It contains an entry for every unique file in the store (not just every store entry, but every file inside those entries, recursively). The individual work for each entry is low: just a readdir(), an lstat() to see if the link is still in use anywhere, and an unlink() if it isn't. But my store, for example, has 998536 entries in there. I got that number with a combination of ls and wc, and it took probably around 4 minutes to get it.

Ideally, the reference-counting approach to removing files would work the same as in programming languages: as soon as a reference is removed, check whether the reference count is now 0 (in our case 1, since an entry would still exist in .links). In our case, we'd actually have to check prior to losing the reference whether the count *would become* 1, that is, whether it is currently 2.

But unlike in programming languages, we can't just "free a file" (more specifically, an inode). We have to delete the last existing reference, the one in .links. The only way to find that reference is by hashing the file prior to deleting it, which could be quite expensive, but for any garbage collection targeting a small subset of store items it would likely still be much faster than the full pass. A potential fix there would be to augment the store database with a table mapping store paths to hashes (the hashes already get computed when store items are registered). Or we could switch between the full-pass and incremental approaches based on characteristics of the request.

> Or better: Is it save here to just hit CTRL-C (and let the daemon work
> in background, or whatever)?

I expect that CTRL-C at that point would cause the guix process to terminate, closing its connection to the daemon. I don't believe the daemon uses asynchronous I/O, so it wouldn't be affected until it tried reading from or writing to that socket. So yeah, if you do that at that point it would probably work, but you may as well just start it in the background in that case ("guix gc ... &") or put it in the background with CTRL-Z followed by the 'bg' command.

- reepca