unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Ricardo Wurmus <rekado@elephly.net>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 24937@debbugs.gnu.org
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Thu, 16 Apr 2020 16:27:27 +0200	[thread overview]
Message-ID: <87eesnmrow.fsf@elephly.net> (raw)
In-Reply-To: <87ftd3muhp.fsf@elephly.net>

[-- Attachment #1: Type: text/plain, Size: 6110 bytes --]

Here are more benchmarks on one of the build nodes.  It doesn’t nearly
have as many used inodes as ci.guix.gnu.org, but I could fill it up if
necessary.

  root@hydra-guix-127 ~# df -i /gnu/
  Filesystem       Inodes   IUsed    IFree IUse% Mounted on
  /dev/sda3      28950528 2796829 26153699   10% /

  root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
  2017395

I tested all three modes with statx and with lstat.  The
links-traversal-statx.c is attached below.

* mode 1 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal-statx.c:53:8: warning: �stat_entries� defined but not used [-Wunused-function]
   53 |   void stat_entries (void)
      |        ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 9 seconds (including stat)

real    0m9.176s
user    0m0.801s
sys     0m4.236s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.556s
user    0m0.708s
sys     0m2.848s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.553s
user    0m0.599s
sys     0m2.954s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---


* mode 2 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 10 seconds (including stat)

real    0m9.598s
user    0m1.210s
sys     0m4.257s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.094s
user    0m0.988s
sys     0m3.107s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.095s
user    0m0.933s
sys     0m3.162s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---


* mode 3 + statx

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal^C
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 7 seconds
stat took 3 seconds

real    0m9.992s
user    0m1.411s
sys     0m4.221s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.265s
user    0m1.120s
sys     0m3.145s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.267s
user    0m1.072s
sys     0m3.195s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

Now with just lstat:

* mode 1 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal.c:49:8: warning: �stat_entries� defined but not used [-Wunused-function]
   49 |   void stat_entries (void)
      |        ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 9 seconds (including stat)

real    0m9.303s
user    0m0.748s
sys     0m4.397s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 4 seconds (including stat)

real    0m3.526s
user    0m0.540s
sys     0m2.987s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 3 seconds (including stat)

real    0m3.519s
user    0m0.600s
sys     0m2.919s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

* mode 2 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 9 seconds (including stat)

real    0m9.614s
user    0m1.205s
sys     0m4.250s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.060s
user    0m1.052s
sys     0m3.008s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
17377 dir_entries, 4 seconds (including stat)

real    0m4.057s
user    0m0.984s
sys     0m3.073s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

* mode 3 + lstat

--8<---------------cut here---------------start------------->8---
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 6 seconds
stat took 3 seconds

real    0m9.767s
user    0m1.270s
sys     0m4.339s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.234s
user    0m1.136s
sys     0m3.097s
root@hydra-guix-127 ~ [env]# time ./links-traversal 
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.222s
user    0m1.052s
sys     0m3.170s
root@hydra-guix-127 ~ [env]# 
--8<---------------cut here---------------end--------------->8---

They are all very close, so I think I need to work with a bigger store
to see a difference.

Or perhaps I did something silly because I don’t know C…  If so please
let me know.

--
Ricardo


[-- Attachment #2: links-traversal-statx.c --]
[-- Type: application/octet-stream, Size: 2327 bytes --]

#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>



#define STAT_INTERLEAVED 1
#define STAT_SEMI_INTERLEAVED 2
#define STAT_OPTIMAL 3

struct entry
{
  char *name;
  ino_t inode;
};

#define MAX_ENTRIES 135000000
static struct entry dir_entries[MAX_ENTRIES];

int
main ()
{
  struct timeval start, end;

  /* For useful timings, do:
     sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'  */
  gettimeofday (&start, NULL);
  DIR *links = opendir ("/gnu/store/.links");

  size_t count = 0;

#if MODE != STAT_INTERLEAVED
  void sort_entries (void)
  {
    int entry_lower (const void *a, const void *b)
    {
      return ((struct entry *)a)->inode < ((struct entry *)b)->inode;
    }

    qsort (dir_entries, count, sizeof (struct entry),
	   entry_lower);
  }
#endif

  void stat_entries (void)
  {
    for (size_t i = 0; i < count; i++)
      {
	struct statx st;
	statx(AT_FDCWD, dir_entries[i].name,
              AT_SYMLINK_NOFOLLOW | AT_STATX_DONT_SYNC,
              STATX_SIZE | STATX_NLINK, &st);
	//lstat (dir_entries[i].name, &st);
      }
  }

  for (struct dirent *entry = readdir (links);
       entry != NULL;
       entry = readdir (links))
    {
      assert (count < MAX_ENTRIES);
      dir_entries[count].name = strdup (entry->d_name);
      dir_entries[count].inode = entry->d_ino;
#if MODE == STAT_INTERLEAVED
      struct statx st;
      statx(AT_FDCWD, entry->d_name,
            AT_SYMLINK_NOFOLLOW | AT_STATX_DONT_SYNC, STATX_SIZE | STATX_NLINK, &st);

      //lstat (entry->d_name, &st);
#endif

#if MODE == STAT_SEMI_INTERLEAVED
      if (count++ >= 100000)
	{
	  sort_entries ();
	  stat_entries ();
	  count = 0;
	}
#else
      count++;
#endif
    }

#if MODE == STAT_SEMI_INTERLEAVED
  sort_entries ();
  stat_entries ();
#endif

  gettimeofday (&end, NULL);
  printf ("%zi dir_entries, %zi seconds"
#if MODE != STAT_OPTIMAL
	  " (including stat)"
#endif
	  "\n", count,
	  end.tv_sec - start.tv_sec);

#if MODE == STAT_OPTIMAL
  sort_entries ();
  gettimeofday (&start, NULL);
  stat_entries ();
  gettimeofday (&end, NULL);

  printf ("stat took %zi seconds\n", end.tv_sec - start.tv_sec);
#endif

  return EXIT_SUCCESS;
}

  reply	other threads:[~2020-04-16 14:28 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2016-12-09 22:43 ` Ludovic Courtès
2016-12-11 13:46 ` Ludovic Courtès
2016-12-11 14:23   ` Mark H Weaver
2016-12-11 18:02     ` Ludovic Courtès
2016-12-11 19:27       ` Mark H Weaver
2016-12-13  0:00         ` Ludovic Courtès
2016-12-13 12:48           ` Mark H Weaver
2016-12-13 17:02             ` Ludovic Courtès
2016-12-13 17:18               ` Ricardo Wurmus
2020-04-16 13:26                 ` Ricardo Wurmus
2020-04-16 14:27                   ` Ricardo Wurmus [this message]
2020-04-17  8:16                     ` Ludovic Courtès
2020-04-17  8:28                       ` Ricardo Wurmus
2016-12-13  4:09         ` Mark H Weaver
2016-12-15  1:19           ` Mark H Weaver
2021-11-09 14:44 ` Ludovic Courtès
2021-11-09 15:00   ` Ludovic Courtès
2021-11-11 20:59   ` Maxim Cournoyer
2021-11-13 16:56     ` Ludovic Courtès
2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
2021-11-16 13:54           ` bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2021-11-13 21:45       ` Ludovic Courtès
2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eesnmrow.fsf@elephly.net \
    --to=rekado@elephly.net \
    --cc=24937@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).