From: Caleb Ristvedt <caleb.ristvedt@cune.org>
To: guix-devel@gnu.org
Subject: Re: guix gc takes long in "deleting unused links ..."
Date: Fri, 01 Feb 2019 16:22:21 -0600
Message-ID: <87womjcfoi.fsf@cune.org>
In-Reply-To: <20190201065332.6d4c9815@alma-ubu> ("Björn Höfling"'s message of "Fri, 1 Feb 2019 06:53:40 +0100")

Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:

> Why does that take soo long?

Warning: technical overview follows.

It takes so long because, after the garbage collection pass, the daemon
makes a *full* pass over the /gnu/store/.links directory, which is huge.
It contains an entry for every unique file in the store (not just every
store entry, but everything inside those entries, recursively). The
work per entry is small: a readdir(), an lstat() to check the link count
and see whether the file is still in use anywhere, and an unlink() if it
isn't. But my store, for example, has 998536 entries in there. I got
that number with a combination of ls and wc, and it took probably around
4 minutes to get it.
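
To make the per-entry work concrete, here is a minimal Python sketch of
that kind of pass. It is not the daemon's actual code; the function name
delete_unused_links and the directory argument are made up for
illustration, and it assumes an entry is unused exactly when its hard
link count has dropped to 1.

```python
import os

def delete_unused_links(links_dir):
    """Unlink every entry in links_dir whose inode has no other hard link.

    Hypothetical helper for illustration only; returns the number of
    entries removed.
    """
    removed = 0
    # readdir(): list the entries first so we never delete while iterating.
    with os.scandir(links_dir) as it:
        paths = [entry.path for entry in it]
    for path in paths:
        st = os.lstat(path)          # lstat(): read the link count
        if st.st_nlink == 1:         # only the name in links_dir is left
            os.unlink(path)          # unlink(): drop the last reference
            removed += 1
    return removed
```

Each iteration is cheap, but the loop runs once per entry no matter how
little was actually collected, which is where the time goes on a
directory with around a million entries.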

Ideally, the reference-counting approach to removing files would work
the same as in programming languages: as soon as a reference is removed,
check whether the reference count is now 0 (in our case 1, since an
entry would still exist in .links). In our case, we'd actually have to
check *before* losing the reference whether the count *would become* 1,
that is, whether it is currently 2. But unlike in programming languages,
we can't just "free a file" (more specifically, an inode). We have to
delete the last existing reference, the one in .links. Entries there are
named after the file's hash, so the only way to find that entry is to
hash the file before deleting it. That could be quite expensive, but for
any garbage collection targeting a small subset of store items it would
likely still be much faster than the full pass. A potential fix there
would be to augment the store database with a table mapping store paths
to hashes (hashes already get computed when store items are
registered). Or we could switch between the full-pass and incremental
approaches based on characteristics of the request.
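
A rough Python sketch of the incremental idea, under loud assumptions:
link_name_for and delete_store_file are hypothetical names, the sketch
only handles regular files, and it names .links entries by the hex
SHA-256 of the file contents as a stand-in. The real store derives the
entry name differently, so treat the hashing step as a placeholder.

```python
import hashlib
import os
import stat

def link_name_for(path):
    """Stand-in naming scheme (assumption): hex SHA-256 of the contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def delete_store_file(path, links_dir):
    """Delete path and, if it was the last user, its links_dir entry too."""
    st = os.lstat(path)
    # A count of 2 means: this name plus (presumably) one name in links_dir.
    if stat.S_ISREG(st.st_mode) and st.st_nlink == 2:
        candidate = os.path.join(links_dir, link_name_for(path))
        try:
            # Only remove the entry if it really is the same inode.
            if os.path.samestat(os.lstat(candidate), st):
                os.unlink(candidate)
        except FileNotFoundError:
            pass  # the other hard link lives somewhere else
    os.unlink(path)
```

The hashing cost is paid only for files whose count is exactly 2, and
only for the items being deleted, rather than once per entry in .links.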

> Or better: Is it safe here to just hit CTRL-C (and let the daemon work
> in background, or whatever)?

I expect that CTRL-C at that point would cause the guix process to
terminate, closing its connection to the daemon. I don't believe the
daemon uses asynchronous I/O, so it wouldn't notice until it next tried
to read from or write to that socket. So yeah, if you do that at that
point it would probably work, but you may as well just start it in the
background in that case ("guix gc ... &") or put it in the background
with CTRL-Z followed by the 'bg' command.
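
As a toy illustration of that blocking-I/O behavior (not the daemon's
code; daemon_like_worker and its arguments are invented for this
sketch), a worker that only touches the socket after finishing its work
does not find out that the client went away until then:

```python
import socket

def daemon_like_worker(conn, work_items):
    """Finish all the work first, then report back over the socket."""
    # Stand-in for the long pass; nothing here looks at the socket.
    done = [item * 2 for item in work_items]
    try:
        conn.sendall(b"done\n")      # first point where a dead peer shows up
        client_alive = True
    except (BrokenPipeError, ConnectionResetError):
        client_alive = False
    return done, client_alive

# Usage: the "client" end closes before the worker even starts.
client, daemon = socket.socketpair()
client.close()
result, alive = daemon_like_worker(daemon, [1, 2, 3])
daemon.close()
```

Here the work completes in full and the closed connection is only
detected at the final write.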

- reepca

