From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Tue, 13 Dec 2016 18:02:18 +0100
Message-ID: <87fulrsqxx.fsf@gnu.org>
References: <87wpg7ffbm.fsf@gnu.org> <87lgvm4lzu.fsf@gnu.org>
 <87twaaa6j9.fsf@netris.org> <87twaa2vjx.fsf@gnu.org>
 <87lgvm9sgq.fsf@netris.org> <87d1gwvgu0.fsf@gnu.org>
 <87wpf4yoz0.fsf@netris.org>
In-Reply-To: <87wpf4yoz0.fsf@netris.org> (Mark H. Weaver's message of
 "Tue, 13 Dec 2016 07:48:19 -0500")
To: Mark H Weaver
Cc: 24937@debbugs.gnu.org

Hello Mark,

Mark H Weaver skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.

Right.  That’s the only spinning-disk machine I could access without
problem.  :-/

Ricardo, Roel: would you be able to run that links-traversal.c from
<…> on a machine with a big store, as described at <…>?
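For readers without the attachment, here is a minimal sketch of such a
traversal, assuming the benchmark simply reads the directory, sorts the
entries by inode number to approximate on-disk inode table order, and
then lstat()s each one; the actual links-traversal.c may differ:

/* Hypothetical sketch, not the actual links-traversal.c: scan
   /gnu/store/.links, sort entries by inode number, then lstat()
   each one.  On ext4 on spinning disks, stat'ing in inode order
   reduces seeks compared to stat'ing in readdir order.  */

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

struct entry { ino_t ino; char name[256]; };

static int
by_inode (const void *a, const void *b)
{
  ino_t x = ((const struct entry *) a)->ino;
  ino_t y = ((const struct entry *) b)->ino;
  return (x > y) - (x < y);
}

int
main (void)
{
  static const char directory[] = "/gnu/store/.links";
  DIR *dir = opendir (directory);
  if (dir == NULL)
    { perror ("opendir"); return EXIT_FAILURE; }

  size_t count = 0, capacity = 1024;
  struct entry *entries = malloc (capacity * sizeof *entries);
  struct dirent *ent;
  while ((ent = readdir (dir)) != NULL)
    {
      if (ent->d_name[0] == '.')
        continue;
      if (count == capacity)
        {
          capacity *= 2;
          entries = realloc (entries, capacity * sizeof *entries);
        }
      if (entries == NULL)
        abort ();
      entries[count].ino = ent->d_ino;
      snprintf (entries[count].name, sizeof entries[count].name,
                "%s", ent->d_name);
      count++;
    }
  closedir (dir);

  /* Sort by inode number so that the stat pass walks the inode
     table roughly sequentially.  */
  qsort (entries, count, sizeof *entries, by_inode);

  for (size_t i = 0; i < count; i++)
    {
      char path[sizeof directory + sizeof entries[i].name];
      struct stat st;
      snprintf (path, sizeof path, "%s/%s", directory, entries[i].name);
      if (lstat (path, &st) == -1)
        perror ("lstat");
    }

  printf ("stat'd %zu entries\n", count);
  free (entries);
  return EXIT_SUCCESS;
}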
>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full.  Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.

Good point.  I’m trying my best to get performance figures; there’s no
doubt we could do better!

> In my opinion, the reason Hydra performs so poorly is that efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it.  Unfortunately, I feel that my advice in this
> area is discarded more often than not.

Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that, far from
discarding your advice, I very much value it.

I’m a maintainer though, so I’m trying to understand the problem
better.  It’s not just about finding the “optimal” solution, but also
about finding a tradeoff between the benefits and the maintainability
costs.

>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that.  Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes?  Can you explain why?

I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you?  :-)

It remains that, if invoking ‘sort’ appears to be preferable *both*
from a performance and from a maintenance viewpoint, then it’s a good
choice.  That may be the case, but again, I prefer to have figures to
back that.

>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use.  A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes.

OK, good to know.  Thanks!

Ludo’.
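As a coda to the ‘sort’ exchange above: Mark's point is that sort(1)
can be spawned directly with fork/exec and a pair of pipes, so no
/bin/sh is involved.  A minimal, hypothetical sketch of that (not code
from the daemon), pushing a few numbers through ‘sort -n’:

/* Hypothetical sketch: run 'sort -n' as a child process with no
   shell involved, feeding it data over a pipe and reading the
   sorted result back.  */

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int
main (void)
{
  int in[2], out[2];   /* in: parent -> sort; out: sort -> parent */
  if (pipe (in) == -1 || pipe (out) == -1)
    { perror ("pipe"); return 1; }

  pid_t pid = fork ();
  if (pid == -1)
    { perror ("fork"); return 1; }

  if (pid == 0)
    {
      /* Child: wire the pipes to stdin/stdout and become 'sort'.  */
      dup2 (in[0], STDIN_FILENO);
      dup2 (out[1], STDOUT_FILENO);
      close (in[0]); close (in[1]);
      close (out[0]); close (out[1]);
      execlp ("sort", "sort", "-n", (char *) NULL);
      perror ("execlp");   /* reached only if exec failed */
      _exit (127);
    }

  /* Parent: write unsorted numbers, then read them back sorted.
     Writing everything before reading is safe here because 'sort'
     consumes all of its input before producing any output.  */
  close (in[0]); close (out[1]);
  static const char input[] = "42\n7\n1000\n5\n";
  if (write (in[1], input, strlen (input)) == -1)
    perror ("write");
  close (in[1]);   /* EOF on stdin tells 'sort' to emit its output */

  char buf[256];
  ssize_t n;
  while ((n = read (out[0], buf, sizeof buf)) > 0)
    fwrite (buf, 1, n, stdout);
  close (out[0]);
  waitpid (pid, NULL, 0);
  return 0;
}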