unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 24937@debbugs.gnu.org
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Sun, 11 Dec 2016 09:23:38 -0500	[thread overview]
Message-ID: <87twaaa6j9.fsf@netris.org> (raw)
In-Reply-To: <87lgvm4lzu.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Sun, 11 Dec 2016 14:46:13 +0100")

ludo@gnu.org (Ludovic Courtès) writes:

> Here’s a proposed patch that follows your suggestion, Mark, but places
> an upper bound on the number of directory entries loaded in memory.
>
> On my laptop, which has ~500k entries in /gnu/store/.links, the result
> is something like this (notice the inode numbers in ‘lstat’ calls):
>
> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
> [...]
> 13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> 13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0
> 13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> [...]
> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
> [...]
> 13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0
> 13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0
> 13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0
> [...]
> 13738 getdents(8, {{d_ino=328672,
> [...]
> 13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0
> 13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> 13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0
>
> I can’t tell exactly how this affects performance because my machine has
> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
> it has performance comparable to that of doing all the ‘readdir’ at once
> followed by all the ‘lstat’.
>
> Mark, how does that sound?

I think we should sort the entire directory using merge sort backed to
disk files.  If we load chunks of the directory, sort them and process
them individually, I expect that this will increase the amount of I/O
required by a non-trivial factor.  In each pass, we would load blocks of
inodes from disk, almost all of which are likely to be present in the
store and thus linked from the directory, but in this scheme we will
process only a small number of them and drop the rest on the floor to be
read again in the next pass.  Given that even my fairly optimal
implementation takes about 35 minutes to run on Hydra, I'd prefer to
avoid multiplying that by a non-trivial factor.

Why not just use GNU sort?  It already exists, and does exactly what we
need.

If you object to using an external program for some reason, I would
prefer to re-implement a similar algorithm in the daemon.

     Thanks,
       Mark

  reply	other threads:[~2016-12-11 14:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2016-12-09 22:43 ` Ludovic Courtès
2016-12-11 13:46 ` Ludovic Courtès
2016-12-11 14:23   ` Mark H Weaver [this message]
2016-12-11 18:02     ` Ludovic Courtès
2016-12-11 19:27       ` Mark H Weaver
2016-12-13  0:00         ` Ludovic Courtès
2016-12-13 12:48           ` Mark H Weaver
2016-12-13 17:02             ` Ludovic Courtès
2016-12-13 17:18               ` Ricardo Wurmus
2020-04-16 13:26                 ` Ricardo Wurmus
2020-04-16 14:27                   ` Ricardo Wurmus
2020-04-17  8:16                     ` Ludovic Courtès
2020-04-17  8:28                       ` Ricardo Wurmus
2016-12-13  4:09         ` Mark H Weaver
2016-12-15  1:19           ` Mark H Weaver
2021-11-09 14:44 ` Ludovic Courtès
2021-11-09 15:00   ` Ludovic Courtès
2021-11-11 20:59   ` Maxim Cournoyer
2021-11-13 16:56     ` Ludovic Courtès
2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
2021-11-16 13:54           ` bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2021-11-13 21:45       ` Ludovic Courtès
2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87twaaa6j9.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=24937@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).