unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: 24937@debbugs.gnu.org
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Tue, 13 Dec 2016 18:02:18 +0100	[thread overview]
Message-ID: <87fulrsqxx.fsf@gnu.org> (raw)
In-Reply-To: <87wpf4yoz0.fsf@netris.org> (Mark H. Weaver's message of "Tue, 13 Dec 2016 07:48:19 -0500")

Hello Mark,

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.

Right.  That’s the only spinning-disk machine I could access without
problem.  :-/

Ricardo, Roel: would you be able to run that links-traversal.c from
<https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
on a machine with a big store, as described at
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?

>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full.  Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.

Good point.  I’m trying my best to get performance figures, there’s no
doubt we could do better!

> In my opinion, the reason Hydra performs so poorly is because efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it.  Unfortunately, I feel that my advice in this
> area is discarded more often than not.

Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that far from
discarding your advice, I very much value it.

I’m a maintainer though, so I’m trying to understand the problem better.
It’s not just about finding the “optimal” solution, but also about
finding a tradeoff between the benefits and the maintainability costs.

>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that.  Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes?  Can you explain why?

I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you?  :-)

It remains that, if invoking ‘sort’ appears to be preferable *both* from
performance and maintenance viewpoints, then it’s a good choice.  That
may be the case, but again, I prefer to have figures to back that.

>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use.  A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,

OK, good to know.

Thanks!

Ludo’.

  reply	other threads:[~2016-12-13 17:03 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-13 17:41 bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2016-12-09 22:43 ` Ludovic Courtès
2016-12-11 13:46 ` Ludovic Courtès
2016-12-11 14:23   ` Mark H Weaver
2016-12-11 18:02     ` Ludovic Courtès
2016-12-11 19:27       ` Mark H Weaver
2016-12-13  0:00         ` Ludovic Courtès
2016-12-13 12:48           ` Mark H Weaver
2016-12-13 17:02             ` Ludovic Courtès [this message]
2016-12-13 17:18               ` Ricardo Wurmus
2020-04-16 13:26                 ` Ricardo Wurmus
2020-04-16 14:27                   ` Ricardo Wurmus
2020-04-17  8:16                     ` Ludovic Courtès
2020-04-17  8:28                       ` Ricardo Wurmus
2016-12-13  4:09         ` Mark H Weaver
2016-12-15  1:19           ` Mark H Weaver
2021-11-09 14:44 ` Ludovic Courtès
2021-11-09 15:00   ` Ludovic Courtès
2021-11-11 20:59   ` Maxim Cournoyer
2021-11-13 16:56     ` Ludovic Courtès
2021-11-13 21:37       ` bug#24937: [PATCH 1/2] tests: Factorize 'file=?' Ludovic Courtès
2021-11-13 21:37         ` bug#24937: [PATCH 2/2] daemon: Do not deduplicate files smaller than 4 KiB Ludovic Courtès
2021-11-16 13:54           ` bug#24937: "deleting unused links" GC phase is too slow Ludovic Courtès
2021-11-13 21:45       ` Ludovic Courtès
2021-11-22  2:30 ` John Kehayias via Bug reports for GNU Guix

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fulrsqxx.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=24937@debbugs.gnu.org \
    --cc=mhw@netris.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).