bug#24937: "deleting unused links" GC phase is too slow

unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed

* bug#24937: "deleting unused links" GC phase is too slow
@ 2016-11-13 17:41 Ludovic Courtès
  2016-12-09 22:43 ` Ludovic Courtès
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Ludovic Courtès @ 2016-11-13 17:41 UTC (permalink / raw)
  To: 24937

‘LocalStore::removeUnusedLinks’ traverses all the entries in
/gnu/store/.links and calls lstat(2) on each one of them and checks
‘st_nlink’ to determine whether they can be deleted.

There are two problems: lstat(2) can be slow on spinning disks as found
on hydra.gnu.org, and the algorithm is proportional in the number of
entries in /gnu/store/.links, which is a lot on hydra.gnu.org.

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
@ 2016-12-09 22:43 ` Ludovic Courtès
  2016-12-11 13:46 ` Ludovic Courtès
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2016-12-09 22:43 UTC (permalink / raw)
  To: 24937

ludo@gnu.org (Ludovic Courtès) skribis:

> ‘LocalStore::removeUnusedLinks’ traverses all the entries in
> /gnu/store/.links and calls lstat(2) on each one of them and checks
> ‘st_nlink’ to determine whether they can be deleted.
>
> There are two problems: lstat(2) can be slow on spinning disks as found
> on hydra.gnu.org, and the algorithm is proportional in the number of
> entries in /gnu/store/.links, which is a lot on hydra.gnu.org.

On Dec. 2 on guix-sysadmin@gnu.org, Mark described an improvement that
noticeably improved performance:

  The idea is to read the entire /gnu/store/.links directory, sort the
  entries by inode number, and then iterate over the entries by inode
  number, calling 'lstat' on each one and deleting the ones with a link
  count of 1.

  The reason this is so much faster is because the inodes are stored on
  disk in order of inode number, so this leads to a sequential access
  pattern on disk instead of a random access pattern.

  The difficulty is that the directory is too large to comfortably store
  all of the entries in virtual memory.  Instead, the entries should be
  written to temporary files on disk, and then sorted using merge sort to
  ensure sequential access patterns during sorting.  Fortunately, this is
  exactly what 'sort' does from GNU coreutils.

  So, for now, I've implemented this as a pair of small C programs that is
  used in a pipeline with GNU sort.  The first program simply reads a
  directory and writes lines of the form "<inode> <name>" to stdout.
  (Unfortunately, "ls -i" calls stat on each entry, so it can't be used).
  This is piped through 'sort -n' and then into another small C program
  that reads these lines, calls 'lstat' on each one, and deletes the
  non-directories with link count 1.

Regarding memory usage, I replied:

  Really?

  For each entry, we have to store roughly 70 bytes for the file name (or
  52 if we consider only the basename), plus 8 bytes for the inode number;
  let’s say 64 bytes.

  If we have 10 M entries, that’s 700 MB (or 520 MB), which is a lot, but
  maybe acceptable?

  At worst, we may still see an improvement if we proceed by batches: we
  read 10000 directory entries (7 MB), sort them, and stat them, then read
  the next 10000 entries.  WDYT?

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
  2016-12-09 22:43 ` Ludovic Courtès
@ 2016-12-11 13:46 ` Ludovic Courtès
  2016-12-11 14:23   ` Mark H Weaver
  2021-11-09 14:44 ` Ludovic Courtès
  2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix
  3 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2016-12-11 13:46 UTC (permalink / raw)
  To: 24937

[-- Attachment #1: Type: text/plain, Size: 4274 bytes --]

Hello!

Here’s a proposed patch that follows your suggestion, Mark, but places
an upper bound on the number of directory entries loaded in memory.

On my laptop, which has ~500k entries in /gnu/store/.links, the result
is something like this (notice the inode numbers in ‘lstat’ calls):

--8<---------------cut here---------------start------------->8---
13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
[...]
13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0
13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
[...]
13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
[...]
13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0
13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0
13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0
[...]
13738 getdents(8, {{d_ino=328672,
[...]
13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0
13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0
--8<---------------cut here---------------end--------------->8---

I can’t tell exactly how this affects performance because my machine has
an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
it has performance comparable to that of doing all the ‘readdir’ at once
followed by all the ‘lstat’.

Mark, how does that sound?

I’d like to commit it soon if there are no objections.

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 3133 bytes --]

diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index 72eff52..db58603 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -545,6 +545,9 @@ void LocalStore::tryToDelete(GCState & state, const Path & path)
 }
 
 
+/* Like 'dirent', but with just what we need.  */
+typedef std::pair<Path, ino_t> MiniDirEntry;
+
 /* Unlink all files in /nix/store/.links that have a link count of 1,
    which indicates that there are no other links and so they can be
    safely deleted.  FIXME: race condition with optimisePath(): we
@@ -555,32 +558,57 @@ void LocalStore::removeUnusedLinks(const GCState & state)
     AutoCloseDir dir = opendir(linksDir.c_str());
     if (!dir) throw SysError(format("opening directory `%1%'") % linksDir);
 
+    /* Maximum number of entries stored in memory; each 'MiniDirEntry' takes
+       ~80 bytes.  */
+    const size_t maxEntries = 100000;
+
     long long actualSize = 0, unsharedSize = 0;
 
-    struct dirent * dirent;
-    while (errno = 0, dirent = readdir(dir)) {
-        checkInterrupt();
-        string name = dirent->d_name;
-        if (name == "." || name == "..") continue;
-        Path path = linksDir + "/" + name;
-
-        struct stat st;
-        if (lstat(path.c_str(), &st) == -1)
-            throw SysError(format("statting `%1%'") % path);
-
-        if (st.st_nlink != 1) {
-            unsigned long long size = st.st_blocks * 512ULL;
-            actualSize += size;
-            unsharedSize += (st.st_nlink - 1) * size;
-            continue;
-        }
-
-        printMsg(lvlTalkative, format("deleting unused link `%1%'") % path);
-
-        if (unlink(path.c_str()) == -1)
-            throw SysError(format("deleting `%1%'") % path);
-
-        state.results.bytesFreed += st.st_blocks * 512;
+    bool endOfDir = false;
+
+    while (!endOfDir) {
+	/* Read as many entries as possible at once, up to 'maxEntries'.  */
+	std::list<MiniDirEntry> entries;
+	struct dirent * dirent;
+	while (errno = 0,
+	       (entries.size() < maxEntries) && (dirent = readdir(dir))) {
+	    checkInterrupt();
+	    string name = dirent->d_name;
+	    if (name == "." || name == "..") continue;
+	    entries.emplace_back(MiniDirEntry(name, dirent->d_ino));
+	}
+
+	endOfDir = (dirent == NULL);
+
+	/* Sort entries by inode number to minimize disk seeks induced by the
+	   following 'lstat' calls.  */
+	entries.sort([] (const MiniDirEntry& e1, const MiniDirEntry& e2) {
+		return e1.second < e2.second;
+	    });
+
+	for (auto && item: entries) {
+	    checkInterrupt();
+
+	    Path path = linksDir + "/" + item.first;
+
+	    struct stat st;
+	    if (lstat(path.c_str(), &st) == -1)
+		throw SysError(format("statting `%1%'") % path);
+
+	    if (st.st_nlink != 1) {
+		unsigned long long size = st.st_blocks * 512ULL;
+		actualSize += size;
+		unsharedSize += (st.st_nlink - 1) * size;
+		continue;
+	    }
+
x+	    printMsg(lvlTalkative, format("deleting unused link `%1%'") % path);
+
+	    if (unlink(path.c_str()) == -1)
+		throw SysError(format("deleting `%1%'") % path);
+
+	    state.results.bytesFreed += st.st_blocks * 512;
+	}
     }
 
     struct stat st;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-11 13:46 ` Ludovic Courtès
@ 2016-12-11 14:23   ` Mark H Weaver
  2016-12-11 18:02     ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2016-12-11 14:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

ludo@gnu.org (Ludovic Courtès) writes:

> Here’s a proposed patch that follows your suggestion, Mark, but places
> an upper bound on the number of directory entries loaded in memory.
>
> On my laptop, which has ~500k entries in /gnu/store/.links, the result
> is something like this (notice the inode numbers in ‘lstat’ calls):
>
> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
> [...]
> 13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> 13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0
> 13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> [...]
> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
> [...]
> 13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0
> 13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0
> 13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0
> [...]
> 13738 getdents(8, {{d_ino=328672,
> [...]
> 13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0
> 13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
> 13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0
>
> I can’t tell exactly how this affects performance because my machine has
> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
> it has performance comparable to that of doing all the ‘readdir’ at once
> followed by all the ‘lstat’.
>
> Mark, how does that sound?

I think we should sort the entire directory using merge sort backed to
disk files.  If we load chunks of the directory, sort them and process
them individually, I expect that this will increase the amount of I/O
required by a non-trivial factor.  In each pass, we would load blocks of
inodes from disk, almost all of which are likely to be present in the
store and thus linked from the directory, but in this scheme we will
process only a small number of them and drop the rest on the floor to be
read again in the next pass.  Given that even my fairly optimal
implementation takes about 35 minutes to run on Hydra, I'd prefer to
avoid multiplying that by a non-trivial factor.

Why not just use GNU sort?  It already exists, and does exactly what we
need.

If you object to using an external program for some reason, I would
prefer to re-implement a similar algorithm in the daemon.

     Thanks,
       Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-11 14:23   ` Mark H Weaver
@ 2016-12-11 18:02     ` Ludovic Courtès
  2016-12-11 19:27       ` Mark H Weaver
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2016-12-11 18:02 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 24937

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Here’s a proposed patch that follows your suggestion, Mark, but places
>> an upper bound on the number of directory entries loaded in memory.
>>
>> On my laptop, which has ~500k entries in /gnu/store/.links, the result
>> is something like this (notice the inode numbers in ‘lstat’ calls):
>>
>> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
>> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
>> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
>> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
>> [...]
>> 13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
>> 13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0
>> 13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
>> [...]
>> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
>> [...]
>> 13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0
>> 13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0
>> 13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0
>> [...]
>> 13738 getdents(8, {{d_ino=328672,
>> [...]
>> 13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0
>> 13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
>> 13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0
>>
>> I can’t tell exactly how this affects performance because my machine has
>> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
>> it has performance comparable to that of doing all the ‘readdir’ at once
>> followed by all the ‘lstat’.
>>
>> Mark, how does that sound?
>
> I think we should sort the entire directory using merge sort backed to
> disk files.  If we load chunks of the directory, sort them and process
> them individually, I expect that this will increase the amount of I/O
> required by a non-trivial factor.  In each pass, we would load blocks of
> inodes from disk, almost all of which are likely to be present in the
> store and thus linked from the directory, but in this scheme we will
> process only a small number of them and drop the rest on the floor to be
> read again in the next pass.  Given that even my fairly optimal
> implementation takes about 35 minutes to run on Hydra, I'd prefer to
> avoid multiplying that by a non-trivial factor.

Sure, though it’s not obvious to me how much of a difference it makes;
my guess is that processing in large chunks is already a win, but we’d
have to measure.

> Why not just use GNU sort?  It already exists, and does exactly what we
> need.

Does ‘sort’ manage to avoid reading whole files in memory?  If not, I
think it wouldn’t buy us anything to use it.

> If you object to using an external program for some reason, I would
> prefer to re-implement a similar algorithm in the daemon.

Yeah, I’d rather avoid serializing the list of file names/inode number
pairs just to invoke ‘sort’ on that.

Also, what algorithm are you referring to?

Thanks for the quick feedback!

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-11 18:02     ` Ludovic Courtès
@ 2016-12-11 19:27       ` Mark H Weaver
  2016-12-13  0:00         ` Ludovic Courtès
  2016-12-13  4:09         ` Mark H Weaver
  0 siblings, 2 replies; 25+ messages in thread
From: Mark H Weaver @ 2016-12-11 19:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> I think we should sort the entire directory using merge sort backed to
>> disk files.  If we load chunks of the directory, sort them and process
>> them individually, I expect that this will increase the amount of I/O
>> required by a non-trivial factor.  In each pass, we would load blocks of
>> inodes from disk, almost all of which are likely to be present in the
>> store and thus linked from the directory, but in this scheme we will
>> process only a small number of them and drop the rest on the floor to be
>> read again in the next pass.  Given that even my fairly optimal
>> implementation takes about 35 minutes to run on Hydra, I'd prefer to
>> avoid multiplying that by a non-trivial factor.
>
> Sure, though it’s not obvious to me how much of a difference it makes;
> my guess is that processing in large chunks is already a win, but we’d
> have to measure.

I agree, it would surely be a win.  Given that it currently takes on the
order of a day to run this phase on Hydra, if your proposed method takes
2 hours, that would be a huge win, but still not good, IMO.  Even 35
minutes is slower than I'd like.

>> Why not just use GNU sort?  It already exists, and does exactly what we
>> need.
>
> Does ‘sort’ manage to avoid reading whole files in memory?

Yes, it does.  I monitored the 'sort' process when I first ran my
optimized pipeline.  It created about 10 files in /tmp, approximately 70
megabytes each as I recall, and then read them all concurrently while
writing the sorted output.

My guess is that it reads a manageable chunk of the input, sorts it in
memory, and writes it to a temporary file.  I guess it repeats this
process, writing multiple temporary files, until the entire input is
consumed, and then reads all of those temporary files, merging them
together into the output stream.

>> If you object to using an external program for some reason, I would
>> prefer to re-implement a similar algorithm in the daemon.
>
> Yeah, I’d rather avoid serializing the list of file names/inode number
> pairs just to invoke ‘sort’ on that.

Sure, I agree that it would be better to avoid that, but IMO not at the
cost of using O(N) memory instead of O(1) memory, nor at the cost of
multiplying the amount of disk I/O by a non-trivial factor.

> Also, what algorithm are you referring to?

The algorithm I described above, which I guess is close to what GNU sort
does.

    Thanks,
      Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-11 19:27       ` Mark H Weaver
@ 2016-12-13  0:00         ` Ludovic Courtès
  2016-12-13 12:48           ` Mark H Weaver
  2016-12-13  4:09         ` Mark H Weaver
  1 sibling, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2016-12-13  0:00 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 24937

[-- Attachment #1: Type: text/plain, Size: 4071 bytes --]

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> I think we should sort the entire directory using merge sort backed to
>>> disk files.  If we load chunks of the directory, sort them and process
>>> them individually, I expect that this will increase the amount of I/O
>>> required by a non-trivial factor.  In each pass, we would load blocks of
>>> inodes from disk, almost all of which are likely to be present in the
>>> store and thus linked from the directory, but in this scheme we will
>>> process only a small number of them and drop the rest on the floor to be
>>> read again in the next pass.  Given that even my fairly optimal
>>> implementation takes about 35 minutes to run on Hydra, I'd prefer to
>>> avoid multiplying that by a non-trivial factor.
>>
>> Sure, though it’s not obvious to me how much of a difference it makes;
>> my guess is that processing in large chunks is already a win, but we’d
>> have to measure.
>
> I agree, it would surely be a win.  Given that it currently takes on the
> order of a day to run this phase on Hydra, if your proposed method takes
> 2 hours, that would be a huge win, but still not good, IMO.  Even 35
> minutes is slower than I'd like.

Of course.

I did some measurements with the attached program on chapters, which is
a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
has 600k entries in /gnu/store/.links.

Here’s a comparison of the “optimal” mode (bulk stats after we’ve
fetched all the dirents) vs. the “semi-interleaved” mode (doing bulk
stats every 100,000 dirents):

--8<---------------cut here---------------start------------->8---
ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c  -DMODE=3
ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
ludo@guix:~$ time ./a.out
603858 dir_entries, 157 seconds
stat took 1 seconds

real    2m38.508s
user    0m0.324s
sys     0m1.824s
ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c  -DMODE=2
ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
ludo@guix:~$ time ./a.out 
3852 dir_entries, 172 seconds (including stat)

real    2m51.827s
user    0m0.312s
sys     0m1.808s
--8<---------------cut here---------------end--------------->8---

Semi-interleaved is ~12% slower here (not sure how reproducible that is
though).

>>> Why not just use GNU sort?  It already exists, and does exactly what we
>>> need.
>>
>> Does ‘sort’ manage to avoid reading whole files in memory?
>
> Yes, it does.  I monitored the 'sort' process when I first ran my
> optimized pipeline.  It created about 10 files in /tmp, approximately 70
> megabytes each as I recall, and then read them all concurrently while
> writing the sorted output.
>
> My guess is that it reads a manageable chunk of the input, sorts it in
> memory, and writes it to a temporary file.  I guess it repeats this
> process, writing multiple temporary files, until the entire input is
> consumed, and then reads all of those temporary files, merging them
> together into the output stream.

OK.  That seems to be that the comment above ‘sortlines’ in sort.c
describes.

>>> If you object to using an external program for some reason, I would
>>> prefer to re-implement a similar algorithm in the daemon.
>>
>> Yeah, I’d rather avoid serializing the list of file names/inode number
>> pairs just to invoke ‘sort’ on that.
>
> Sure, I agree that it would be better to avoid that, but IMO not at the
> cost of using O(N) memory instead of O(1) memory, nor at the cost of
> multiplying the amount of disk I/O by a non-trivial factor.

Understood.

sort.c in Coreutils is very big, and we surely don’t want to duplicate
all that.  Yet, I’d rather not shell out to ‘sort’.

Do you know how many entries are in .links on hydra.gnu.org?  If it
performs comparably to chapters, the timings suggests it should have
around 10.5M entries.

Thanks!

Ludo’.


[-- Attachment #2: the code --]
[-- Type: text/plain, Size: 2014 bytes --]

#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <sys/stat.h>
#include <assert.h>

#define STAT_INTERLEAVED 1
#define STAT_SEMI_INTERLEAVED 2
#define STAT_OPTIMAL 3

struct entry
{
  char *name;
  ino_t inode;
};

#define MAX_ENTRIES 1000000
static struct entry dir_entries[MAX_ENTRIES];

int
main ()
{
  struct timeval start, end;

  /* For useful timings, do:
     sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'  */
  gettimeofday (&start, NULL);
  DIR *links = opendir ("/gnu/store/.links");

  size_t count = 0;

#if MODE != STAT_INTERLEAVED
  void sort_entries (void)
  {
    int entry_lower (const void *a, const void *b)
    {
      return ((struct entry *)a)->inode < ((struct entry *)b)->inode;
    }

    qsort (dir_entries, count, sizeof (struct entry),
	   entry_lower);
  }
#endif

  void stat_entries (void)
  {
    for (size_t i = 0; i < count; i++)
      {
	struct stat st;
	lstat (dir_entries[i].name, &st);
      }
  }

  for (struct dirent *entry = readdir (links);
       entry != NULL;
       entry = readdir (links))
    {
      assert (count < MAX_ENTRIES);
      dir_entries[count].name = strdup (entry->d_name);
      dir_entries[count].inode = entry->d_ino;
#if MODE == STAT_INTERLEAVED
      struct stat st;
      lstat (entry->d_name, &st);
#endif

#if MODE == STAT_SEMI_INTERLEAVED
      if (count++ >= 100000)
	{
	  sort_entries ();
	  stat_entries ();
	  count = 0;
	}
#else
      count++;
#endif
    }

#if MODE == STAT_SEMI_INTERLEAVED
  sort_entries ();
  stat_entries ();
#endif

  gettimeofday (&end, NULL);
  printf ("%zi dir_entries, %zi seconds"
#if MODE != STAT_OPTIMAL
	  " (including stat)"
#endif
	  "\n", count,
	  end.tv_sec - start.tv_sec);

#if MODE == STAT_OPTIMAL
  sort_entries ();
  gettimeofday (&start, NULL);
  stat_entries ();
  gettimeofday (&end, NULL);

  printf ("stat took %zi seconds\n", end.tv_sec - start.tv_sec);
#endif

  return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-11 19:27       ` Mark H Weaver
  2016-12-13  0:00         ` Ludovic Courtès
@ 2016-12-13  4:09         ` Mark H Weaver
  2016-12-15  1:19           ` Mark H Weaver
  1 sibling, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2016-12-13  4:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

Do as you wish.  I don't have time to continue discussing this.

     Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-13  0:00         ` Ludovic Courtès
@ 2016-12-13 12:48           ` Mark H Weaver
  2016-12-13 17:02             ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Mark H Weaver @ 2016-12-13 12:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

ludo@gnu.org (Ludovic Courtès) writes:

> I did some measurements with the attached program on chapters, which is
> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
> has 600k entries in /gnu/store/.links.

I just want to point out that 600k inodes use 150 megabytes of disk
space on ext4, which is small enough to fit in the cache, so the disk
I/O will not be multiplied for such a small test case.

> Here’s a comparison of the “optimal” mode (bulk stats after we’ve
> fetched all the dirents) vs. the “semi-interleaved” mode (doing bulk
> stats every 100,000 dirents):
>
> ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c  -DMODE=3
> ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
> ludo@guix:~$ time ./a.out
> 603858 dir_entries, 157 seconds
> stat took 1 seconds
>
> real    2m38.508s
> user    0m0.324s
> sys     0m1.824s
> ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c  -DMODE=2
> ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
> ludo@guix:~$ time ./a.out 
> 3852 dir_entries, 172 seconds (including stat)
>
> real    2m51.827s
> user    0m0.312s
> sys     0m1.808s
>
> Semi-interleaved is ~12% slower here (not sure how reproducible that is
> though).

This directory you're testing on is more than an order of magnitude
smaller than Hydra's when it's full.  Unlike in your test above, all of
the inodes in Hydra's store won't fit in the cache.

In my opinion, the reason Hydra performs so poorly is because efficiency
and scalability are apparently very low priorities in the design of the
software running on it.  Unfortunately, I feel that my advice in this
area is discarded more often than not.

>>>> Why not just use GNU sort?  It already exists, and does exactly what we
>>>> need.
>>>
>>> Does ‘sort’ manage to avoid reading whole files in memory?
>>
>> Yes, it does.  I monitored the 'sort' process when I first ran my
>> optimized pipeline.  It created about 10 files in /tmp, approximately 70
>> megabytes each as I recall, and then read them all concurrently while
>> writing the sorted output.
>>
>> My guess is that it reads a manageable chunk of the input, sorts it in
>> memory, and writes it to a temporary file.  I guess it repeats this
>> process, writing multiple temporary files, until the entire input is
>> consumed, and then reads all of those temporary files, merging them
>> together into the output stream.
>
> OK.  That seems to be that the comment above ‘sortlines’ in sort.c
> describes.

Also, see <https://en.wikipedia.org/wiki/External_sorting>.  This is a
well-studied problem with a long history.

>>>> If you object to using an external program for some reason, I would
>>>> prefer to re-implement a similar algorithm in the daemon.
>>>
>>> Yeah, I’d rather avoid serializing the list of file names/inode number
>>> pairs just to invoke ‘sort’ on that.

I'm fairly sure that the overhead of serializing the file names and
inode numbers is *far* less than the overhead you would add by iterating
over the inodes in multiple passes.

>> Sure, I agree that it would be better to avoid that, but IMO not at the
>> cost of using O(N) memory instead of O(1) memory, nor at the cost of
>> multiplying the amount of disk I/O by a non-trivial factor.
>
> Understood.
>
> sort.c in Coreutils is very big, and we surely don’t want to duplicate
> all that.  Yet, I’d rather not shell out to ‘sort’.

The "shell" would not be involved here at all, just the "sort" program.
I guess you dislike launching external processes?  Can you explain why?

Guix-daemon launches external processes for building derivations, so why
is using one for garbage collection a problem?  Emacs, a program that
you cite in your talks as having many qualities that we seek to emulate,
does not shy away from using external programs.

> Do you know how many entries are in .links on hydra.gnu.org?

"df -i /gnu" indicates that it currently has about 5.5M inodes, but
that's with only 29% of the disk in use.  A few days ago, when the disk
was full, assuming that the average file size is the same, it may have
had closer to 5.5M / 0.29 ~= 19M inodes, which is over 30 times as many
as used in your measurements above.  On ext4, which uses 256-byte
inodes, that's about 5 gigabytes of inodes.

      Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-13 12:48           ` Mark H Weaver
@ 2016-12-13 17:02             ` Ludovic Courtès
  2016-12-13 17:18               ` Ricardo Wurmus
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2016-12-13 17:02 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 24937

Hello Mark,

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.

Right.  That’s the only spinning-disk machine I could access without
problem.  :-/

Ricardo, Roel: would you be able to run that links-traversal.c from
<https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
on a machine with a big store, as described at
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?

>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full.  Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.

Good point.  I’m trying my best to get performance figures, there’s no
doubt we could do better!

> In my opinion, the reason Hydra performs so poorly is because efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it.  Unfortunately, I feel that my advice in this
> area is discarded more often than not.

Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that far from
discarding your advice, I very much value it.

I’m a maintainer though, so I’m trying to understand the problem better.
It’s not just about finding the “optimal” solution, but also about
finding a tradeoff between the benefits and the maintainability costs.

>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that.  Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes?  Can you explain why?

I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you?  :-)

It remains that, if invoking ‘sort’ appears to be preferable *both* from
performance and maintenance viewpoints, then it’s a good choice.  That
may be the case, but again, I prefer to have figures to back that.

>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use.  A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,

OK, good to know.

Thanks!

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-13 17:02             ` Ludovic Courtès
@ 2016-12-13 17:18               ` Ricardo Wurmus
  2020-04-16 13:26                 ` Ricardo Wurmus
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Wurmus @ 2016-12-13 17:18 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937


Ludovic Courtès <ludo@gnu.org> writes:

> Ricardo, Roel: would you be able to run that links-traversal.c from
> <https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
> on a machine with a big store, as described at
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?

I just ran this on my workstation in the office where I regularly build
packages.  Here’s the output of “df -i /gnu”

  Filesystem               Inodes   IUsed   IFree IUse% Mounted on
  /dev/mapper/fedora-root 3301376 1098852 2202524   34% /

Probably not large enough to derive conclusions about hydra’s behaviour.

[I can’t run it on the shared store at the MDC because NFS performance is
too poor.  I recently ran “guix gc --optimize” to dedupe the shared
store (post-build deduplication is disabled since a few weeks) and it’s
at 3,197,489 used inodes.]

Here are the results of running the link-traversal code on my
workstation:

--8<---------------cut here---------------start------------->8---
rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c  -DMODE=3
rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
rwurmus in ~: time ./a.out 
412825 dir_entries, 107 seconds
stat took 0 seconds

real	1m47.264s
user	0m0.214s
sys	0m1.314s

rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c  -DMODE=2
rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
rwurmus in ~: time ./a.out 
12821 dir_entries, 107 seconds (including stat)

real	1m46.475s
user	0m0.201s
sys	0m1.309s
--8<---------------cut here---------------end--------------->8---


-- 
Ricardo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-13  4:09         ` Mark H Weaver
@ 2016-12-15  1:19           ` Mark H Weaver
  0 siblings, 0 replies; 25+ messages in thread
From: Mark H Weaver @ 2016-12-15  1:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

I apologize for losing my patience earlier.

     Mark

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-12-13 17:18               ` Ricardo Wurmus
@ 2020-04-16 13:26                 ` Ricardo Wurmus
  2020-04-16 14:27                   ` Ricardo Wurmus
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Wurmus @ 2020-04-16 13:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937


Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Ricardo, Roel: would you be able to run that links-traversal.c from
>> <https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
>> on a machine with a big store, as described at
>> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?
>
> I just ran this on my workstation in the office where I regularly build
> packages.  Here’s the output of “df -i /gnu”
>
>   Filesystem               Inodes   IUsed   IFree IUse% Mounted on
>   /dev/mapper/fedora-root 3301376 1098852 2202524   34% /
>
> Probably not large enough to derive conclusions about hydra’s behaviour.
>
> [I can’t run it on the shared store at the MDC because NFS performance is
> too poor.  I recently ran “guix gc --optimize” to dedupe the shared
> store (post-build deduplication is disabled since a few weeks) and it’s
> at 3,197,489 used inodes.]
>
> Here are the results of running the link-traversal code on my
> workstation:
>
> --8<---------------cut here---------------start------------->8---
> rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c  -DMODE=3
> rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
> rwurmus in ~: time ./a.out
> 412825 dir_entries, 107 seconds
> stat took 0 seconds
>
> real	1m47.264s
> user	0m0.214s
> sys	0m1.314s
>
> rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c  -DMODE=2
> rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
> rwurmus in ~: time ./a.out
> 12821 dir_entries, 107 seconds (including stat)
>
> real	1m46.475s
> user	0m0.201s
> sys	0m1.309s
> --8<---------------cut here---------------end--------------->8---

I ran this for the first time on ci.guix.gnu.org, which has a very big
store (currently at around 29TB).

df -i /gnu:

  Filesystem        Inodes     IUsed     IFree IUse% Mounted on
  /dev/sdb1      610021376 132350406 477670970   22% /gnu

I had to increase the number of MAX_ENTRIES to 135000000.

I forgot to drop caches initially.  This is the first run:

--8<---------------cut here---------------start------------->8---
root@berlin ~ [env]# gcc links-traversal.c -DMODE=3 -o links-traversal
root@berlin ~ [env]# time ./links-traversal
57079502 dir_entries, 3906 seconds
stat took 136 seconds

real    67m48.145s
user    0m59.575s
sys     2m30.065s
--8<---------------cut here---------------end--------------->8---

I aborted the run after I dropped caches after 67 minutes.

I’m going to continue testing on one of the build nodes, and I’ll try
using statx.

--
Ricardo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2020-04-16 13:26                 ` Ricardo Wurmus
@ 2020-04-16 14:27                   ` Ricardo Wurmus
  2020-04-17  8:16                     ` Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Ricardo Wurmus @ 2020-04-16 14:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

[-- Attachment #1: Type: text/plain, Size: 6110 bytes --]

Here are more benchmarks on one of the build nodes.  It doesn’t nearly
have as many used inodes as ci.guix.gnu.org, but I could fill it up if
necessary.

  root@hydra-guix-127 ~# df -i /gnu/
  Filesystem       Inodes   IUsed    IFree IUse% Mounted on
  /dev/sda3      28950528 2796829 26153699   10% /

  root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
  2017395

I tested all three modes with statx and with lstat.  The
links-traversal-statx.c is attached below.

* mode 1 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal-statx.c:53:8: warning: �stat_entries� defined but not used [-Wunused-function]
   53 |   void stat_entries (void)
      |        ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 9 seconds (including stat)

real    0m9.176s
user    0m0.801s
sys     0m4.236s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.556s
user    0m0.708s
sys     0m2.848s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.553s
user    0m0.599s
sys     0m2.954s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---


* mode 2 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 10 seconds (including stat)

real    0m9.598s
user    0m1.210s
sys     0m4.257s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.094s
user    0m0.988s
sys     0m3.107s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.095s
user    0m0.933s
sys     0m3.162s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---


* mode 3 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal^C
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 7 seconds
stat took 3 seconds

real    0m9.992s
user    0m1.411s
sys     0m4.221s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.265s
user    0m1.120s
sys     0m3.145s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.267s
user    0m1.072s
sys     0m3.195s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

Now with just lstat:

* mode 1 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal.c:49:8: warning: �stat_entries� defined but not used [-Wunused-function]
   49 |   void stat_entries (void)
      |        ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 9 seconds (including stat)

real    0m9.303s
user    0m0.748s
sys     0m4.397s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.526s
user    0m0.540s
sys     0m2.987s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 3 seconds (including stat)

real    0m3.519s
user    0m0.600s
sys     0m2.919s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

* mode 2 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 9 seconds (including stat)

real    0m9.614s
user    0m1.205s
sys     0m4.250s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.060s
user    0m1.052s
sys     0m3.008s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.057s
user    0m0.984s
sys     0m3.073s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

* mode 3 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 6 seconds
stat took 3 seconds

real    0m9.767s
user    0m1.270s
sys     0m4.339s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.234s
user    0m1.136s
sys     0m3.097s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.222s
user    0m1.052s
sys     0m3.170s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

They are all very close, so I think I need to work with a bigger store
to see a difference.

Or perhaps I did something silly because I don’t know C…  If so please
let me know.

--
Ricardo


[-- Attachment #2: links-traversal-statx.c --]
[-- Type: application/octet-stream, Size: 2327 bytes --]

#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>



#define STAT_INTERLEAVED 1
#define STAT_SEMI_INTERLEAVED 2
#define STAT_OPTIMAL 3

struct entry
{
  char *name;
  ino_t inode;
};

#define MAX_ENTRIES 135000000
static struct entry dir_entries[MAX_ENTRIES];

int
main ()
{
  struct timeval start, end;

  /* For useful timings, do:
     sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'  */
  gettimeofday (&start, NULL);
  DIR *links = opendir ("/gnu/store/.links");

  size_t count = 0;

#if MODE != STAT_INTERLEAVED
  void sort_entries (void)
  {
    int entry_lower (const void *a, const void *b)
    {
      return ((struct entry *)a)->inode < ((struct entry *)b)->inode;
    }

    qsort (dir_entries, count, sizeof (struct entry),
	   entry_lower);
  }
#endif

  void stat_entries (void)
  {
    for (size_t i = 0; i < count; i++)
      {
	struct statx st;
	statx(AT_FDCWD, dir_entries[i].name,
              AT_SYMLINK_NOFOLLOW | AT_STATX_DONT_SYNC,
              STATX_SIZE | STATX_NLINK, &st);
	//lstat (dir_entries[i].name, &st);
      }
  }

  for (struct dirent *entry = readdir (links);
       entry != NULL;
       entry = readdir (links))
    {
      assert (count < MAX_ENTRIES);
      dir_entries[count].name = strdup (entry->d_name);
      dir_entries[count].inode = entry->d_ino;
#if MODE == STAT_INTERLEAVED
      struct statx st;
      statx(AT_FDCWD, entry->d_name,
            AT_SYMLINK_NOFOLLOW | AT_STATX_DONT_SYNC, STATX_SIZE | STATX_NLINK, &st);

      //lstat (entry->d_name, &st);
#endif

#if MODE == STAT_SEMI_INTERLEAVED
      if (count++ >= 100000)
	{
	  sort_entries ();
	  stat_entries ();
	  count = 0;
	}
#else
      count++;
#endif
    }

#if MODE == STAT_SEMI_INTERLEAVED
  sort_entries ();
  stat_entries ();
#endif

  gettimeofday (&end, NULL);
  printf ("%zi dir_entries, %zi seconds"
#if MODE != STAT_OPTIMAL
	  " (including stat)"
#endif
	  "\n", count,
	  end.tv_sec - start.tv_sec);

#if MODE == STAT_OPTIMAL
  sort_entries ();
  gettimeofday (&start, NULL);
  stat_entries ();
  gettimeofday (&end, NULL);

  printf ("stat took %zi seconds\n", end.tv_sec - start.tv_sec);
#endif

  return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2020-04-16 14:27                   ` Ricardo Wurmus
@ 2020-04-17  8:16                     ` Ludovic Courtès
  2020-04-17  8:28                       ` Ricardo Wurmus
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2020-04-17  8:16 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 24937

Hi Ricardo,

Thanks for running this benchmark!

Ricardo Wurmus <rekado@elephly.net> skribis:

>   root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
>   2017395

That’s not a lot, my laptop has 2.8M links.

It’s interesting to see that system time remains at ~4.2s in all modes.
So the only thing that modes 2 and 3 achieve is increasing CPU time.
It’s as if the order in which files are stat’d had no impact on I/O
performance.

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2020-04-17  8:16                     ` Ludovic Courtès
@ 2020-04-17  8:28                       ` Ricardo Wurmus
  0 siblings, 0 replies; 25+ messages in thread
From: Ricardo Wurmus @ 2020-04-17  8:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937


Ludovic Courtès <ludo@gnu.org> writes:

>>   root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
>>   2017395
>
> That’s not a lot, my laptop has 2.8M links.

Let me rerun this after copying a few thousand store items from
ci.guix.gnu.org over.  Maybe we’ll see the different times diverge then.

--
Ricardo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
  2016-12-09 22:43 ` Ludovic Courtès
  2016-12-11 13:46 ` Ludovic Courtès
@ 2021-11-09 14:44 ` Ludovic Courtès
  2021-11-09 15:00   ` Ludovic Courtès
  2021-11-11 20:59   ` Maxim Cournoyer
  2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix
  3 siblings, 2 replies; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-09 14:44 UTC (permalink / raw)
  To: 24937; +Cc: Maxim Cournoyer

[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]

Hi!

ludo@gnu.org (Ludovic Courtès) skribis:

> ‘LocalStore::removeUnusedLinks’ traverses all the entries in
> /gnu/store/.links and calls lstat(2) on each one of them and checks
> ‘st_nlink’ to determine whether they can be deleted.
>
> There are two problems: lstat(2) can be slow on spinning disks as found
> on hydra.gnu.org, and the algorithm is proportional in the number of
> entries in /gnu/store/.links, which is a lot on hydra.gnu.org.

Taking a step back, we could perhaps mitigate this with heuristics to
reduce the number of entries in .links:

  1. Do not deduplicate files with a size lower than some threshold;

  2. Delete links with st_nlink <= 3 (instead of <= 2); that would
     prevent *further* deduplication of those files, but they’d already
     have two instances sharing the same inode;

  3. Stop deduplicating once the number of entries in .links has reached
     a certain threshold.

For #1, a key insight is that about 30% of the files actually
deduplicated (in my store, where /gnu/store/.links has 2.2M entries) are
smaller than 1 KiB:


[-- Attachment #2: sizes --]
[-- Type: image/png, Size: 22211 bytes --]

[-- Attachment #3: Type: text/plain, Size: 93 bytes --]


… but 85% of them have at most 4 links (thus, saving up to 2 KiB by
deduplicating):


[-- Attachment #4: link counts for files < 1KiB --]
[-- Type: image/png, Size: 20456 bytes --]

[-- Attachment #5: Type: text/plain, Size: 440 bytes --]


On my laptop, we’re talking about space savings of 325 MiB, a tiny
fraction of my store:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (saved-space (filter (lambda (file)
					    (< (deduplicated-file-size file) 1024))
					  l))
$40 = 325914739
--8<---------------cut here---------------end--------------->8---

Files smaller than 1 KiB represent 35% of the entries in .links:


[-- Attachment #6: distribution by file size --]
[-- Type: image/png, Size: 21654 bytes --]

[-- Attachment #7: Type: text/plain, Size: 728 bytes --]


By not deduplicating files smaller than 1 KiB, we’d reduce the number of
entries by 35%, which should already have a tangible impact on
performance.  It’d be a “mitigation” more than a “fix”, but it has a
good work/reward ratio.

We could conduct a similar analysis for #2.

#3 is more difficult to implement because you cannot know the number of
entries in .links until you’ve traversed it (note that currently
deduplication stops when link(2) returns ENOSPC in .links).

I’m attaching the script I’ve used for that, derived from an earlier
experiment¹.  You’re welcome to give it a spin!

Thoughts?

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2014-09/msg00422.html


[-- Attachment #8: the script --]
[-- Type: text/plain, Size: 5803 bytes --]

(use-modules (charting)
             ((guix store) #:select (%store-prefix))
             (ice-9 ftw)
             (ice-9 match)
             (srfi srfi-1)
             (srfi srfi-9))

(define-record-type <deduplicated-file>
  (deduplicated-file name size links)
  deduplicated-file?
  (name    deduplicated-file-name)
  (size    deduplicated-file-size)
  (links   deduplicated-file-link-count))

(define %links-directory
  (string-append (%store-prefix) "/.links"))

(define (links)
  "Return a list of <deduplicated-file>."
  (file-system-fold (const #t)
                    (lambda (file stat result)      ;leaf
                      (cons (deduplicated-file file
                                               (stat:size stat)
                                               (stat:nlink stat))
                            result))
                    (lambda (directory stat result) ;down
                      result)
                    (lambda (directory stat result) ;up
                      result)
                    (const #f)                      ;skip
                    (lambda (file stat errno result)
                      (error "i/o error" file errno))
                    '()
                    %links-directory
                    lstat))

(define KiB (expt 2 10))
(define MiB (* KiB KiB))
(define GiB (* KiB MiB))

(define (saved-space files)
  "Return the total amount of saved space given FILES, a list of
<deduplicated-file>."
  (fold (lambda (df result)
          (match df
            (($ <deduplicated-file> name size links)
             (when (< links 2)
               (error "too few links" name links))
             (+ result (* size (- links 2))))))
        0
        files))

(define (cumulative-distribution files property)
  "Return a list of (VALUE . COUNT) pairs representing the number of FILES
whose PROPERTY is VALUE or less."
  (define (file<? df1 df2)
    (< (property df1) (property df2)))

  (fold (lambda (df result)
          (match result
            (((value . count) . rest)
             (let ((current (property df)))
               (if (= value current)
                   (alist-cons value (+ count 1) rest)
                   (alist-cons current (+ count 1) result))))
            (_
             (alist-cons (property df) 1 result))))
        '()
        (sort files file<?)))

(define* (plot-distribution distribution output
                            #:key (subtitle "SUBTITLE") max-x
                            (group-name "GROUP") x-axis-label)
  "Plot DISTRIBUTION, and produce file OUTPUT."
  (define (log2 n)
    (let loop ((result 1))
      (if (zero? (ash n (- result)))
          (- result 1)
          (loop (+ 1 result)))))

  (define (format-log2-tick tick)
    ;; (string-append "2^"
    ;;                (number->string (log2 (inexact->exact tick))))
    (number->string (inexact->exact tick)))

  (define (adjust-items total)
    (lambda (x)
      (match x
        ;; XXX: Filter out the two cases that would give us a numerical
        ;; overflow.
        ((0 . _) #f)
        ((1 . _) #f)
        ((value . count)
         (and (or (not max-x) (< value max-x))
              (cons value (* 100. (/ count total))))))))

  (match distribution
    (((_ . total) . rest)
     (let ((percent (filter-map (adjust-items total) distribution)))
       (make-scatter-plot #:title (string-append "Cumulative distribution by "
                                                 subtitle)
                          #:data `((,group-name ,@percent))
                          #:x-axis-label x-axis-label
                          #:y-axis-label "%"
                          #:tick-label-formatter format-log2-tick
                          #:log-x-base 2
                          #:min-x 1
                          #:max-y 101
                          #:write-to-png output)))))

#! Examples
(define l (links))  ;this is the expensive part
(plot-distribution (cumulative-distribution l deduplicated-file-link-count)
                   "/tmp/nlink.png" #:x-axis-label "number of hard links"
                   #:subtitle "hard link count" #:max-x 2048
                   #:group-name "nlinks")

(plot-distribution (cumulative-distribution
                    (filter (lambda (file)
                              (< (deduplicated-file-size file) 1024))
                            l)
                    deduplicated-file-link-count)
                   "/tmp/nlink-small.png" #:x-axis-label "number of hard links"
                   #:subtitle "hard link count for files < 1KiB" #:max-x 2048
                   #:group-name "nlinks")


(plot-distribution (cumulative-distribution l deduplicated-file-size)
                   "/tmp/size.png" #:x-axis-label "file size"
                   #:subtitle "file size" #:max-x 32768
                   #:group-name "size (B)")

(plot-distribution (cumulative-distribution
                    (filter (lambda (f)
                              (> (deduplicated-file-link-count f) 2))
                            l)
                    deduplicated-file-size)
                   "/tmp/size-deduplicated.png" #:x-axis-label "file size" #:subtitle
                   "size for files actually deduplicated" #:max-x 32768
                   #:group-name "size (B)")

(plot-distribution (cumulative-distribution
                    (filter (lambda (file)
                              (< (deduplicated-file-size file) 1024))
                            l)
                    (lambda (file)
                      (* (deduplicated-file-size file)
                         (- (deduplicated-file-link-count file) 2))))
                   "/tmp/size-savings.png" #:x-axis-label "savings" #:subtitle
                   "savings for files < 1KiB" #:max-x 32768
                   #:group-name "savings (B)")
!#

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2021-11-09 14:44 ` Ludovic Courtès
@ 2021-11-09 15:00   ` Ludovic Courtès
  2021-11-11 20:59   ` Maxim Cournoyer
  1 sibling, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-09 15:00 UTC (permalink / raw)
  To: 24937; +Cc: Maxim Cournoyer

Ludovic Courtès <ludo@gnu.org> skribis:

> On my laptop, we’re talking about space savings of 325 MiB, a tiny
> fraction of my store:
>
> scheme@(guile-user)> (saved-space (filter (lambda (file)
> 					    (< (deduplicated-file-size file) 1024))
> 					  l))
> $40 = 325914739

For files < 4 KiB, the savings are ~2 GiB, roughly 1% of my store.

Ludo’.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2021-11-09 14:44 ` Ludovic Courtès
  2021-11-09 15:00   ` Ludovic Courtès
@ 2021-11-11 20:59   ` Maxim Cournoyer
  2021-11-13 16:56     ` Ludovic Courtès
  1 sibling, 1 reply; 25+ messages in thread
From: Maxim Cournoyer @ 2021-11-11 20:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 24937

[-- Attachment #1: Type: text/plain, Size: 102 bytes --]

Hi Ludovic,

I haven't done any analysis, just grabbed the result, but here it what
it looks for me:


[-- Attachment #2: nlink.png --]
[-- Type: image/png, Size: 19832 bytes --]

[-- Attachment #3: nlink-small.png --]
[-- Type: image/png, Size: 21217 bytes --]

[-- Attachment #4: size-deduplicated.png --]
[-- Type: image/png, Size: 23574 bytes --]

[-- Attachment #5: size.png --]
[-- Type: image/png, Size: 23313 bytes --]

[-- Attachment #6: size-savings.png --]
[-- Type: image/png, Size: 22143 bytes --]

[-- Attachment #7: Type: text/plain, Size: 46 bytes --]


That's for 3631356 links.

Thank you!

Maxim

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2021-11-11 20:59   ` Maxim Cournoyer
@ 2021-11-13 16:56     ` Ludovic Courtès
  2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
  2021-11-13 21:45       ` Ludovic Courtès
  0 siblings, 2 replies; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-13 16:56 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 24937

[-- Attachment #1: Type: text/plain, Size: 630 bytes --]

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I haven't done any analysis, just grabbed the result, but here it what
> it looks for me:

There’s a bit more than 35% of deduplicated files that are < 1KiB, and
not much to be gained by deduplicating them.

On IRC several people shared the results on their machine; several had
similar results, and one person had a lot more of those small files (50%
of deduplicated files were < 1KiB).

The chart (with a kinda bogus layout) below is perhaps more interesting:
it shows the contribution of files below a certain size to the overall
space savings.


[-- Attachment #2: the chart --]
[-- Type: image/png, Size: 22673 bytes --]

[-- Attachment #3: Type: text/plain, Size: 2360 bytes --]


In a nutshell:

  • Files < 1KiB contribute to 0.3% of the space savings;

  • Files < 4KiB contribute to 2.5% of the space savings;

  • Files < 256KiB contribute to 42% of the space savings.

You can create this plot with:

--8<---------------cut here---------------start------------->8---
(make-scatter-plot #:title "Contribution to space savings"
                   #:write-to-png "/tmp/space-saving-contribution.png"
                   #:chart-width 1000
                   #:y-axis-label "contribution (%)"
                   #:x-axis-label "size (B)"
                   #:log-x-base 2
                   #:min-x 513
                   #:data
                   (let ((total (saved-space l)))
                     `(("contribution"
                        ,@(map (lambda (size)
                                 (cons size
                                       (/ (saved-space (filter (lambda (file)
                                                                 (< (deduplicated-file-size
                                                                     file)
                                                                    size))
                                                               l))
                                          total .01)))
                               (map (cut expt 2 <>)
                                    (iota 12 10 1)))))))
--8<---------------cut here---------------end--------------->8---

You can also compute individual points like this:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 1024))
					     l))
			(saved-space l) 1.)
$60 = 0.0034284626558736746
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 4096))
					     l))
			(saved-space l) 1.)
$62 = 0.025190871178467848
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) (expt 2 18)))
					     l))
			(saved-space l) 1.)
$65 = 0.42411104869782185
--8<---------------cut here---------------end--------------->8---

Choosing a deduplication threshold of 2KiB or 4KiB would have a
negligible impact on disk usage on my machine.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: [PATCH 1/2] tests: Factorize 'file=?'.
  2021-11-13 16:56     ` Ludovic Courtès
@ 2021-11-13 21:37       ` Ludovic Courtès
  2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
  2021-11-13 21:45       ` Ludovic Courtès
  1 sibling, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-13 21:37 UTC (permalink / raw)
  To: 24937

* guix/tests.scm (file=?): Add optional 'stat' parameter.  Add fast
patch comparing inode numbers.
* tests/gexp.scm ("imported-files with file-like objects"): Remove
'file=?' procedure and use the one from (guix tests).
---
 guix/tests.scm | 30 +++++++++++++++++-------------
 tests/gexp.scm | 11 +++--------
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/guix/tests.scm b/guix/tests.scm
index fc3d521163..e1c194340c 100644
--- a/guix/tests.scm
+++ b/guix/tests.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013-2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -182,18 +182,22 @@ (define (random-bytevector n)
             (loop (1+ i)))
           bv))))
 
-(define (file=? a b)
-  "Return true if files A and B have the same type and same content."
-  (and (eq? (stat:type (lstat a)) (stat:type (lstat b)))
-       (case (stat:type (lstat a))
-         ((regular)
-          (equal?
-           (call-with-input-file a get-bytevector-all)
-           (call-with-input-file b get-bytevector-all)))
-         ((symlink)
-          (string=? (readlink a) (readlink b)))
-         (else
-          (error "what?" (lstat a))))))
+(define* (file=? a b #:optional (stat lstat))
+  "Return true if files A and B have the same type and same content.  Call
+STAT to obtain file metadata."
+  (let ((sta (stat a)) (stb (stat b)))
+    (and (eq? (stat:type sta) (stat:type stb))
+         (case (stat:type sta)
+           ((regular)
+            (or (and (= (stat:ino sta) (stat:ino stb))
+                     (= (stat:dev sta) (stat:dev stb)))
+                (equal?
+                 (call-with-input-file a get-bytevector-all)
+                 (call-with-input-file b get-bytevector-all))))
+           ((symlink)
+            (string=? (readlink a) (readlink b)))
+           (else
+            (error "what?" (stat a)))))))
 
 (define (canonical-file? file)
   "Return #t if FILE is in the store, is read-only, and its mtime is 1."
diff --git a/tests/gexp.scm b/tests/gexp.scm
index 39a47d4e8c..0758a49f5f 100644
--- a/tests/gexp.scm
+++ b/tests/gexp.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014-2021 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2021 Maxime Devos <maximedevos@telenet.be>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -827,19 +827,14 @@ (define (canonical-file? file)
                        (files -> `(("a/b/c" . ,q-scm)
                                    ("p/q"   . ,plain)))
                        (drv      (imported-files files)))
-    (define (file=? file1 file2)
-      ;; Assume deduplication is in place.
-      (= (stat:ino (stat file1))
-         (stat:ino (stat file2))))
-
     (mbegin %store-monad
       (built-derivations (list (pk 'drv drv)))
       (mlet %store-monad ((dir -> (derivation->output-path drv))
                           (plain* (text-file "foo" "bar!"))
                           (q-scm* (interned-file q-scm "c")))
         (return
-         (and (file=? (string-append dir "/a/b/c") q-scm*)
-              (file=? (string-append dir "/p/q") plain*)))))))
+         (and (file=? (string-append dir "/a/b/c") q-scm* stat)
+              (file=? (string-append dir "/p/q") plain* stat)))))))
 
 (test-equal "gexp-modules & ungexp"
   '((bar) (foo))
-- 
2.33.0





^ permalink raw reply related	[flat|nested] 25+ messages in thread

* bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB.
  2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
@ 2021-11-13 21:37         ` Ludovic Courtès
  2021-11-16 13:54           ` bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-13 21:37 UTC (permalink / raw)
  To: 24937

Files smaller than 4 KiB typically represent ~60% of the entries in
/gnu/store/.links but only contribute to ~2.5% of the space savings
afforded by deduplication.

Not considering these files for deduplication speeds up file insertion
in the store and, more importantly, leaves 'removeUnusedLinks' with
fewer entries to traverse, thereby speeding it up proportionally.

Partly fixes <https://issues.guix.gnu.org/24937>.

* config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK
definition.
* guix/store/deduplication.scm (%deduplication-minimum-size): New
variable.
(deduplicate)[loop]: Do not recurse when FILE's size is below
%DEDUPLICATION-MINIMUM-SIZE.
(dump-port): New procedure.
(dump-file/deduplicate)[hash]: Turn into...
[dump-and-compute-hash]: ... this thunk.
Call 'deduplicate' only when SIZE is greater than
%DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'.
* nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where
st.st_size < deduplicationMinSize.
* nix/libstore/local-store.hh (deduplicationMinSize): New declaration.
* nix/libstore/optimise-store.cc (deduplicationMinSize): New variable.
(LocalStore::optimisePath_): Return when PATH is a symlink or smaller
than 'deduplicationMinSize'.
* tests/derivations.scm ("identical files are deduplicated"): Produce
files bigger than %DEDUPLICATION-MINIMUM-SIZE.
* tests/nar.scm ("restore-file-set with directories (signed, valid)"):
Likewise.
* tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"):
New test.
("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than
%DEDUPLICATION-MINIMUM-SIZE.
* tests/store.scm ("substitute, deduplication"): Likewise.
---
 config-daemon.ac               | 11 -------
 guix/store/deduplication.scm   | 57 ++++++++++++++++++++++++++++------
 nix/libstore/gc.cc             |  4 ++-
 nix/libstore/local-store.hh    |  3 ++
 nix/libstore/optimise-store.cc | 15 +++++----
 tests/derivations.scm          | 14 ++++++---
 tests/nar.scm                  |  7 +++--
 tests/store-deduplication.scm  | 41 ++++++++++++++++++++----
 tests/store.scm                |  4 ++-
 9 files changed, 114 insertions(+), 42 deletions(-)

diff --git a/config-daemon.ac b/config-daemon.ac
index 5ddc740600..86306effe1 100644
--- a/config-daemon.ac
+++ b/config-daemon.ac
@@ -94,17 +94,6 @@ if test "x$guix_build_daemon" = "xyes"; then
   AC_CHECK_FUNCS([lutimes lchown posix_fallocate sched_setaffinity \
      statvfs nanosleep strsignal statx])
 
-  dnl Check whether the store optimiser can optimise symlinks.
-  AC_MSG_CHECKING([whether it is possible to create a link to a symlink])
-  ln -s bla tmp_link
-  if ln tmp_link tmp_link2 2> /dev/null; then
-      AC_MSG_RESULT(yes)
-      AC_DEFINE(CAN_LINK_SYMLINK, 1, [Whether link() works on symlinks.])
-  else
-      AC_MSG_RESULT(no)
-  fi
-  rm -f tmp_link tmp_link2
-
   dnl Check for <locale>.
   AC_LANG_PUSH(C++)
   AC_CHECK_HEADERS([locale])
diff --git a/guix/store/deduplication.scm b/guix/store/deduplication.scm
index cd9660174c..8a59adad39 100644
--- a/guix/store/deduplication.scm
+++ b/guix/store/deduplication.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2017 Caleb Ristvedt <caleb.ristvedt@cune.org>
-;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018-2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -22,12 +22,13 @@
 
 (define-module (guix store deduplication)
   #:use-module (gcrypt hash)
-  #:use-module (guix build utils)
+  #:use-module ((guix build utils) #:hide (dump-port))
   #:use-module (guix build syscalls)
   #:use-module (guix base32)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-34)
   #:use-module (srfi srfi-35)
+  #:use-module (rnrs bytevectors)
   #:use-module (rnrs io ports)
   #:use-module (ice-9 ftw)
   #:use-module (ice-9 match)
@@ -37,6 +38,31 @@ (define-module (guix store deduplication)
             dump-file/deduplicate
             copy-file/deduplicate))
 
+;; TODO: Remove once 'dump-port' in (guix build utils) has an optional 'len'
+;; parameter.
+(define* (dump-port in out
+                    #:optional len
+                    #:key (buffer-size 16384))
+  "Read LEN bytes from IN (or as much as possible if LEN is #f) and write it
+to OUT, using chunks of BUFFER-SIZE bytes."
+  (define buffer
+    (make-bytevector buffer-size))
+
+  (let loop ((total 0)
+             (bytes (get-bytevector-n! in buffer 0
+                                       (if len
+                                           (min len buffer-size)
+                                           buffer-size))))
+    (or (eof-object? bytes)
+        (and len (= total len))
+        (let ((total (+ total bytes)))
+          (put-bytevector out buffer 0 bytes)
+          (loop total
+                (get-bytevector-n! in buffer 0
+                                   (if len
+                                       (min (- len total) buffer-size)
+                                       buffer-size)))))))
+
 (define (nar-sha256 file)
   "Gives the sha256 hash of a file and the size of the file in nar form."
   (let-values (((port get-hash) (open-sha256-port)))
@@ -127,6 +153,12 @@ (define temp-link
           (unless (= EMLINK (system-error-errno args))
             (apply throw args)))))))
 
+(define %deduplication-minimum-size
+  ;; Size below which files are not deduplicated.  This avoids adding too many
+  ;; entries to '.links', which would slow down 'removeUnusedLinks' while
+  ;; saving little space.  Keep in sync with optimize-store.cc.
+  4096)
+
 (define* (deduplicate path hash #:key (store (%store-directory)))
   "Check if a store item with sha256 hash HASH already exists.  If so,
 replace PATH with a hardlink to the already-existing one.  If not, register
@@ -144,13 +176,16 @@ (define links-directory
                     ((file . properties)
                      (unless (member file '("." ".."))
                        (let* ((file (string-append path "/" file))
+                              (st   (lstat file))
                               (type (match (assoc-ref properties 'type)
                                       ((or 'unknown #f)
-                                       (stat:type (lstat file)))
+                                       (stat:type st))
                                       (type type))))
-                         (loop file type
-                               (and (not (eq? 'directory type))
-                                    (nar-sha256 file)))))))
+                         (unless (< (stat:size st)
+                                    %deduplication-minimum-size)
+                           (loop file type
+                                 (and (not (eq? 'directory type))
+                                      (nar-sha256 file))))))))
                   (scandir* path))
         (let ((link-file (string-append links-directory "/"
                                         (bytevector->nix-base32-string hash))))
@@ -222,9 +257,9 @@ (define* (dump-file/deduplicate file input size type
 
 This procedure is suitable as a #:dump-file argument to 'restore-file'.  When
 used that way, it deduplicates files on the fly as they are restored, thereby
-removing the need to a deduplication pass that would re-read all the files
+removing the need for a deduplication pass that would re-read all the files
 down the road."
-  (define hash
+  (define (dump-and-compute-hash)
     (call-with-output-file file
       (lambda (output)
         (let-values (((hash-port get-hash)
@@ -236,7 +271,11 @@ (define hash
           (close-port hash-port)
           (get-hash)))))
 
-  (deduplicate file hash #:store store))
+  (if (>= size %deduplication-minimum-size)
+      (deduplicate file (dump-and-compute-hash) #:store store)
+      (call-with-output-file file
+        (lambda (output)
+          (dump-port input output size)))))
 
 (define* (copy-file/deduplicate source target
                                 #:key (store (%store-directory)))
diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index e1d0765154..16519116e4 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -606,7 +606,9 @@ void LocalStore::removeUnusedLinks(const GCState & state)
             throw SysError(format("statting `%1%'") % path);
 #endif
 
-        if (st.st_nlink != 1) {
+	/* Drop links for files smaller than 'deduplicationMinSize', even if
+	   they have more than one hard link.  */
+        if (st.st_nlink != 1 && st.st_size >= deduplicationMinSize) {
             actualSize += st.st_size;
             unsharedSize += (st.st_nlink - 1) * st.st_size;
             continue;
diff --git a/nix/libstore/local-store.hh b/nix/libstore/local-store.hh
index 9ba37219da..20d3c3c893 100644
--- a/nix/libstore/local-store.hh
+++ b/nix/libstore/local-store.hh
@@ -292,4 +292,7 @@ void canonicaliseTimestampAndPermissions(const Path & path);
 
 MakeError(PathInUse, Error);
 
+/* Size below which a file is not considered for deduplication.  */
+extern const size_t deduplicationMinSize;
+
 }
diff --git a/nix/libstore/optimise-store.cc b/nix/libstore/optimise-store.cc
index eb303ab4c3..baca1a4890 100644
--- a/nix/libstore/optimise-store.cc
+++ b/nix/libstore/optimise-store.cc
@@ -15,6 +15,9 @@
 
 namespace nix {
 
+/* Any file smaller than this is not considered for deduplication.
+   Keep in sync with (guix store deduplication).  */
+const size_t deduplicationMinSize = 4096;
 
 static void makeWritable(const Path & path)
 {
@@ -105,12 +108,12 @@ void LocalStore::optimisePath_(OptimiseStats & stats, const Path & path, InodeHa
         return;
     }
 
-    /* We can hard link regular files and maybe symlinks. */
-    if (!S_ISREG(st.st_mode)
-#if CAN_LINK_SYMLINK
-        && !S_ISLNK(st.st_mode)
-#endif
-        ) return;
+    /* We can hard link regular files (and maybe symlinks), but do that only
+       for files larger than some threshold.  This avoids adding too many
+       entries to '.links', which would slow down 'removeUnusedLinks' while
+       saving little space.  */
+    if (!S_ISREG(st.st_mode) || ((size_t) st.st_size) < deduplicationMinSize)
+	return;
 
     /* Sometimes SNAFUs can cause files in the store to be
        modified, in particular when running programs as root under
diff --git a/tests/derivations.scm b/tests/derivations.scm
index cd165d1be6..4621098df3 100644
--- a/tests/derivations.scm
+++ b/tests/derivations.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2012-2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -170,11 +170,15 @@ (define prefix-len (string-length dir))
         #f))))
 
 (test-assert "identical files are deduplicated"
-  (let* ((build1  (add-text-to-store %store "one.sh"
-                                     "echo hello, world > \"$out\"\n"
+  ;; Note: DATA must be longer than %DEDUPLICATION-MINIMUM-SIZE.
+  (let* ((data    (make-string 4500 #\a))
+         (build1  (add-text-to-store %store "one.sh"
+                                     (string-append "echo -n " data
+                                                    " > \"$out\"\n")
                                      '()))
          (build2  (add-text-to-store %store "two.sh"
-                                     "# Hey!\necho hello, world > \"$out\"\n"
+                                     (string-append "# Hey!\necho -n "
+                                                    data " > \"$out\"\n")
                                      '()))
          (drv1    (derivation %store "foo"
                               %bash `(,build1)
@@ -187,7 +191,7 @@ (define prefix-len (string-length dir))
                (file2 (derivation->output-path drv2)))
            (and (valid-path? %store file1) (valid-path? %store file2)
                 (string=? (call-with-input-file file1 get-string-all)
-                          "hello, world\n")
+                          data)
                 (= (stat:ino (lstat file1))
                    (stat:ino (lstat file2))))))))
 
diff --git a/tests/nar.scm b/tests/nar.scm
index ba4881caaa..bd2bf6e6e0 100644
--- a/tests/nar.scm
+++ b/tests/nar.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2012-2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -486,8 +486,9 @@ (define-values (port get-bytevector)
   ;; their mtime and permissions were not reset.  Ensure that this bug is
   ;; gone.
   (with-store store
-    (let* ((text1 (random-text))
-           (text2 (random-text))
+    ;; Note: TEXT1 and TEXT2 must be longer than %DEDUPLICATION-MINIMUM-SIZE.
+    (let* ((text1 (string-concatenate (make-list 100 (random-text))))
+           (text2 (string-concatenate (make-list 100 (random-text))))
            (tree  `("tree" directory
                     ("a" regular (data ,text1))
                     ("b" directory
diff --git a/tests/store-deduplication.scm b/tests/store-deduplication.scm
index b1c2d93bbd..b2b7c36622 100644
--- a/tests/store-deduplication.scm
+++ b/tests/store-deduplication.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2018, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018, 2020-2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -30,13 +30,40 @@ (define-module (test-store-deduplication)
 
 (test-begin "store-deduplication")
 
+(test-equal "deduplicate, below %deduplication-minimum-size"
+  (list #t (make-list 5 1))
+
+  (call-with-temporary-directory
+   (lambda (store)
+     ;; Note: DATA must be longer than %DEDUPLICATION-MINIMUM-SIZE.
+     (let ((data      "Hello, world!")
+           (identical (map (lambda (n)
+                             (string-append store "/" (number->string n)
+                                            "/a/b/c"))
+                           (iota 5))))
+       (for-each (lambda (file)
+                   (mkdir-p (dirname file))
+                   (call-with-output-file file
+                     (lambda (port)
+                       (put-bytevector port (string->utf8 data)))))
+                 identical)
+
+       (deduplicate store (nar-sha256 store) #:store store)
+
+       ;; (system (string-append "ls -lRia " store))
+       (list (= (length (delete-duplicates
+                         (map (compose stat:ino stat) identical)))
+                (length identical))
+             (map (compose stat:nlink stat) identical))))))
+
 (test-equal "deduplicate"
   (cons* #t #f                                    ;inode comparisons
          2 (make-list 5 6))                       ;'nlink' values
 
   (call-with-temporary-directory
    (lambda (store)
-     (let ((data      (string->utf8 "Hello, world!"))
+     ;; Note: DATA must be longer than %DEDUPLICATION-MINIMUM-SIZE.
+     (let ((data      (string-concatenate (make-list 500 "Hello, world!")))
            (identical (map (lambda (n)
                              (string-append store "/" (number->string n)
                                             "/a/b/c"))
@@ -46,7 +73,7 @@ (define-module (test-store-deduplication)
                    (mkdir-p (dirname file))
                    (call-with-output-file file
                      (lambda (port)
-                       (put-bytevector port data))))
+                       (put-bytevector port (string->utf8 data)))))
                  identical)
        ;; Make the parent of IDENTICAL read-only.  This should not prevent
        ;; deduplication from inserting its hard link.
@@ -54,7 +81,7 @@ (define-module (test-store-deduplication)
 
        (call-with-output-file unique
          (lambda (port)
-           (put-bytevector port (string->utf8 "This is unique."))))
+           (put-bytevector port (string->utf8 (string-reverse data)))))
 
        (deduplicate store (nar-sha256 store) #:store store)
 
@@ -77,8 +104,10 @@ (define-module (test-store-deduplication)
    (lambda (store)
      (let ((true-link link)
            (links     0)
-           (data1     (string->utf8 "Hello, world!"))
-           (data2     (string->utf8 "Hi, world!"))
+           (data1     (string->utf8
+                       (string-concatenate (make-list 500 "Hello, world!"))))
+           (data2     (string->utf8
+                       (string-concatenate (make-list 500 "Hi, world!"))))
            (identical (map (lambda (n)
                              (string-append store "/" (number->string n)
                                             "/a/b/c"))
diff --git a/tests/store.scm b/tests/store.scm
index 2150a0048c..5089909362 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -759,7 +759,9 @@ (define lst
 
 (test-assert "substitute, deduplication"
   (with-store s
-    (let* ((c   (random-text))                     ; contents of the output
+    ;; Note: C must be longer than %DEDUPLICATION-MINIMUM-SIZE.
+    (let* ((c   (string-concatenate
+                 (make-list 100 (random-text))))  ; contents of the output
            (g   (package-derivation s %bootstrap-guile))
            (d1  (build-expression->derivation s "substitute-me"
                                               `(begin ,c (exit 1))
-- 
2.33.0





^ permalink raw reply related	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2021-11-13 16:56     ` Ludovic Courtès
  2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
@ 2021-11-13 21:45       ` Ludovic Courtès
  1 sibling, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-13 21:45 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 24937

[-- Attachment #1: Type: text/plain, Size: 267 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> In a nutshell:
>
>   • Files < 1KiB contribute to 0.3% of the space savings;
>
>   • Files < 4KiB contribute to 2.5% of the space savings;

I get similar results on bayfront.guix.gnu.org (with 3.2M entries):


[-- Attachment #2: bayfront-space-saving-contribution.png --]
[-- Type: image/png, Size: 22704 bytes --]

[-- Attachment #3: bayfront-size.png --]
[-- Type: image/png, Size: 21799 bytes --]

[-- Attachment #4: Type: text/plain, Size: 55 bytes --]


… and on guix.bordeaux.inria.fr (2.0M entries):


[-- Attachment #5: guix-hpc4-space-saving-contribution.png --]
[-- Type: image/png, Size: 23344 bytes --]

[-- Attachment #6: guix-hpc4-size.png --]
[-- Type: image/png, Size: 22621 bytes --]

[-- Attachment #7: Type: text/plain, Size: 175 bytes --]


Files < 4KiB represent between 60% and 75% of the /gnu/store/.links
entries here.

I’ve sent patches that implement a cutoff threshold at 4 KiB.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
@ 2021-11-16 13:54           ` Ludovic Courtès
  0 siblings, 0 replies; 25+ messages in thread
From: Ludovic Courtès @ 2021-11-16 13:54 UTC (permalink / raw)
  To: 24937

Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

> Files smaller than 4 KiB typically represent ~60% of the entries in
> /gnu/store/.links but only contribute to ~2.5% of the space savings
> afforded by deduplication.
>
> Not considering these files for deduplication speeds up file insertion
> in the store and, more importantly, leaves 'removeUnusedLinks' with
> fewer entries to traverse, thereby speeding it up proportionally.
>
> Partly fixes <https://issues.guix.gnu.org/24937>.

Pushed a variant of this as commit
472a0e82a52a3d5d841e1dfad6b13e26082a5750, with a threshold of 8 KiB.

Concretely, the number of .links entries shrinks by ~70%, from
2M to 700K on my laptop, and (presumably) from 64M to 19M on berlin.

I’ll deploy it within a few days on berlin.  I hope the speedup will
reduce pressure there, though obviously it’ll still be an expensive
operation (but fundamentally I think it’ll always be linear in the size
of the store.)

I’m preparing an update of the ‘guix’ package to make this readily
available.  When you deploy the new daemon, .links will be trimmed of
entries for files smaller than 8 KiB the first time you run ‘guix gc’.

Ludo’.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* bug#24937: "deleting unused links" GC phase is too slow
  2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
                   ` (2 preceding siblings ...)
  2021-11-09 14:44 ` Ludovic Courtès
@ 2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix
  3 siblings, 0 replies; 25+ messages in thread
From: John Kehayias via Bug reports for GNU Guix @ 2021-11-22  2:30 UTC (permalink / raw)
  To: 24937@debbugs.gnu.org

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

Hello,

A little late, but wanted to add my results here, from before the commit was made. I had reported some of this on IRC before and had some outlying results. Since then I finally did some generation deleting and gc-ing, though perhaps still have a bit of cruft. I've been on core-updates-frozen for a while, so keeping a lot from before making the branch switch, as well as lots of stuff piling up in trying to fix things or trying out changes in core-updates-frozen.

Anyway, attached are the plots from the above code. Running

ls -1A /gnu/store/.links | wc -l

showed 15776256 links at the time. Still quite a bit I think, but I've had 1.5-2x as much in the past, easily. (This had caused some earlier warnings on ext4 and enabling large_dir, which will make a system unbootable due to Grub not being up to speed on this old feature. I'm now on btrfs.)

John

[-- Attachment #2: nlink.png --]
[-- Type: image/png, Size: 19573 bytes --]

[-- Attachment #3: space-saving-contribution.png --]
[-- Type: image/png, Size: 22370 bytes --]

[-- Attachment #4: size-deduplicated.png --]
[-- Type: image/png, Size: 23091 bytes --]

[-- Attachment #5: nlink-small.png --]
[-- Type: image/png, Size: 20596 bytes --]

[-- Attachment #6: size-savings.png --]
[-- Type: image/png, Size: 19990 bytes --]

[-- Attachment #7: size.png --]
[-- Type: image/png, Size: 22996 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-11-22  2:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2016-12-09 22:43 ` Ludovic Courtès
2016-12-11 13:46 ` Ludovic Courtès
2016-12-11 14:23   ` Mark H Weaver
2016-12-11 18:02     ` Ludovic Courtès
2016-12-11 19:27       ` Mark H Weaver
2016-12-13  0:00         ` Ludovic Courtès
2016-12-13 12:48           ` Mark H Weaver
2016-12-13 17:02             ` Ludovic Courtès
2016-12-13 17:18               ` Ricardo Wurmus
2020-04-16 13:26                 ` Ricardo Wurmus
2020-04-16 14:27                   ` Ricardo Wurmus
2020-04-17  8:16                     ` Ludovic Courtès
2020-04-17  8:28                       ` Ricardo Wurmus
2016-12-13  4:09         ` Mark H Weaver
2016-12-15  1:19           ` Mark H Weaver
2021-11-09 14:44 ` Ludovic Courtès
2021-11-09 15:00   ` Ludovic Courtès
2021-11-11 20:59   ` Maxim Cournoyer
2021-11-13 16:56     ` Ludovic Courtès
2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
2021-11-16 13:54           ` bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2021-11-13 21:45       ` Ludovic Courtès
2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).