From: Caleb Ristvedt
Subject: Re: guix gc takes long in "deleting unused links ..."
Date: Fri, 01 Feb 2019 16:22:21 -0600
Message-ID: <87womjcfoi.fsf@cune.org>
References: <20190201065332.6d4c9815@alma-ubu>
In-Reply-To: <20190201065332.6d4c9815@alma-ubu> ("Björn Höfling"'s message of "Fri, 1 Feb 2019 06:53:40 +0100")
To: guix-devel@gnu.org
List-Id: "Development of GNU Guix and the GNU System distribution."

Björn Höfling writes:

> Why does that take soo long?

Warning: technical overview follows. It takes so long because after the garbage collection pass it then does a *full* pass over the /gnu/store/.links directory. Which is huge.
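That final pass can be sketched roughly as follows. This is a simplified illustration in Python of the logic described here, not the daemon's actual code (the real implementation lives in the C++ daemon): walk .links, lstat() each entry, and unlink() those whose link count shows the .links entry is the only remaining reference.

```python
import os

def delete_unused_links(links_dir="/gnu/store/.links"):
    """Sketch of the daemon's final GC pass: remove .links entries
    whose files are no longer referenced from anywhere in the store."""
    removed = 0
    with os.scandir(links_dir) as entries:
        for entry in entries:
            # follow_symlinks=False makes this an lstat() call.
            st = entry.stat(follow_symlinks=False)
            # st_nlink == 1 means the .links entry is the file's only
            # remaining hard link: no store item uses it any more.
            if st.st_nlink == 1:
                os.unlink(entry.path)
                removed += 1
    return removed
```

The cost is exactly what is described below: one readdir() stream plus one lstat() per entry, so the wall-clock time scales with the total number of unique files ever deduplicated, regardless of how small the set of store items being collected is.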
It contains an entry for every unique file in the store (not just every store entry, but every file inside those entries, recursively). The individual work for each entry is low: just a readdir(), an lstat() to see if the link is still in use anywhere, and an unlink() if it isn't. But my store, for example, has 998536 entries in there. I got that number with a combination of ls and wc, and it took probably around 4 minutes to get it.

Ideally, the reference-counting approach to removing files would work the same as in programming languages: as soon as a reference is removed, check whether the reference count is now 0 (in our case 1, since an entry would still exist in .links). In our case, we'd actually have to check prior to losing the reference whether the count *would become* 1, that is, whether it is currently 2.

But unlike in programming languages, we can't just "free a file" (more specifically, an inode). We have to delete the last existing reference, the one in .links. The only way to find that reference is by hashing the file prior to deleting it, which could be quite expensive, but for any garbage collection targeting a small subset of store items it would likely still be much faster than the full pass. A potential fix there would be to augment the store database with a table mapping store paths to hashes (the hashes already get computed when store items are registered). Or we could switch between the full-pass and incremental approaches based on characteristics of the request.

> Or better: Is it save here to just hit CTRL-C (and let the daemon work
> in background, or whatever)?

I expect that CTRL-C at that point would cause the guix process to terminate, closing its connection to the daemon. I don't believe the daemon uses asynchronous I/O, so it wouldn't be affected until it tried reading from or writing to that socket. So yeah, if you do that at that point it would probably work, but you may as well just start it in the background in that case ("guix gc ... &") or put it in the background with CTRL-Z followed by the 'bg' command.

- reepca