From mboxrd@z Thu Jan 1 00:00:00 1970
From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Sun, 11 Dec 2016 19:02:42 +0100
Message-ID: <87twaa2vjx.fsf@gnu.org>
References: <87wpg7ffbm.fsf@gnu.org> <87lgvm4lzu.fsf@gnu.org> <87twaaa6j9.fsf@netris.org>
In-Reply-To: <87twaaa6j9.fsf@netris.org> (Mark H. Weaver's message of "Sun, 11 Dec 2016 09:23:38 -0500")
To: Mark H Weaver
Cc: 24937@debbugs.gnu.org

Mark H Weaver skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Here’s a proposed patch that follows your suggestion, Mark, but places
>> an upper bound on the number of directory entries loaded in memory.
>>
>> On my laptop, which has ~500k entries in /gnu/store/.links, the result
>> is something like this (notice the inode numbers in ‘lstat’ calls):
>>
>> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
>> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
>> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
>> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
>> [...]
>> 13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9j= s2fcviv2rnc", {st_dev=3Dmakedev(8, 2), st_ino=3D47, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D88, s= t_size=3D41482, st_atime=3D2015/03/10-11:29:06, st_mtime=3D1970/01/01-01:00= :01, st_ctime=3D2015/11/25-11:29:20}) =3D 0 >> 13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz= 7n6abbws8px", {st_dev=3Dmakedev(8, 2), st_ino=3D65, st_mode=3DS_IFREG|0444,= st_nlink=3D9, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D8, st= _size=3D2313, st_atime=3D2016/12/08-21:02:26, st_mtime=3D1970/01/01-01:00:0= 1, st_ctime=3D2016/12/11-01:44:49}) =3D 0 >> 13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2= alpdyx3yqz2", {st_dev=3Dmakedev(8, 2), st_ino=3D68, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D32, s= t_size=3D13561, st_atime=3D2015/03/10-11:29:06, st_mtime=3D1970/01/01-01:00= :01, st_ctime=3D2015/11/25-11:29:20}) =3D 0 >> [...] >> 13738 getdents(8, {{d_ino=3D4257114, d_off=3D1734093409898198492, >> [...] >> 13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6= v6zgkln7f7m", {st_dev=3Dmakedev(8, 2), st_ino=3D19, st_mode=3DS_IFREG|0444,= st_nlink=3D4, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D8, st= _size=3D2553, st_atime=3D2016/12/08-21:02:54, st_mtime=3D1970/01/01-01:00:0= 1, st_ctime=3D2016/12/07-00:05:19}) =3D 0 >> 13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqri= d2r6kk1d4kb", {st_dev=3Dmakedev(8, 2), st_ino=3D30, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D8, st= _size=3D2090, st_atime=3D2015/03/10-11:29:06, st_mtime=3D1970/01/01-01:00:0= 1, st_ctime=3D2015/11/25-11:29:21}) =3D 0 >> 13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xr= gn9y91lfvc2", {st_dev=3Dmakedev(8, 2), st_ino=3D33, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D16, s= t_size=3D7958, st_atime=3D2015/11/04-18:55:32, st_mtime=3D1970/01/01-01:00:= 01, st_ctime=3D2016/01/05-11:35:49}) =3D 0 >> [...] >> 13738 getdents(8, {{d_ino=3D328672, >> [...] >> 13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fd= lwajqd73lnz", {st_dev=3Dmakedev(8, 2), st_ino=3D21, st_mode=3DS_IFREG|0444,= st_nlink=3D31, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D8, s= t_size=3D615, st_atime=3D2016/12/08-21:02:30, st_mtime=3D1970/01/01-01:00:0= 1, st_ctime=3D2016/12/11-01:44:47}) =3D 0 >> 13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkc= gkx0jirpxsg", {st_dev=3Dmakedev(8, 2), st_ino=3D48, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D360, = st_size=3D176750, st_atime=3D2015/03/10-11:29:06, st_mtime=3D1970/01/01-01:= 00:01, st_ctime=3D2015/11/25-11:29:20}) =3D 0 >> 13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa= 01zwp06d3f0", {st_dev=3Dmakedev(8, 2), st_ino=3D61, st_mode=3DS_IFREG|0444,= st_nlink=3D2, st_uid=3D0, st_gid=3D0, st_blksize=3D4096, st_blocks=3D8, st= _size=3D1440, st_atime=3D2016/11/03-20:37:50, st_mtime=3D1970/01/01-01:00:0= 1, st_ctime=3D2016/11/07-00:01:57}) =3D 0 >> >> I can=E2=80=99t tell exactly how this affects performance because my mac= hine has >> an SSD where this operation takes ~3 seconds on a cold cache. 
>> it has performance comparable to that of doing all the ‘readdir’ at once
>> followed by all the ‘lstat’.
>>
>> Mark, how does that sound?
>
> I think we should sort the entire directory using merge sort backed by
> disk files.  If we load chunks of the directory, sort them, and process
> them individually, I expect that this will increase the amount of I/O
> required by a non-trivial factor.  In each pass, we would load blocks of
> inodes from disk, almost all of which are likely to be present in the
> store and thus linked from the directory, but in this scheme we would
> process only a small number of them and drop the rest on the floor to be
> read again in the next pass.  Given that even my fairly optimal
> implementation takes about 35 minutes to run on Hydra, I'd prefer to
> avoid multiplying that by a non-trivial factor.

Sure, though it’s not obvious to me how much of a difference it makes;
my guess is that processing in large chunks is already a win, but we’d
have to measure.

> Why not just use GNU sort?  It already exists, and does exactly what we
> need.

Does ‘sort’ manage to avoid reading whole files into memory?  If not, I
think it wouldn’t buy us anything to use it.

> If you object to using an external program for some reason, I would
> prefer to re-implement a similar algorithm in the daemon.

Yeah, I’d rather avoid serializing the list of file name/inode number
pairs just to invoke ‘sort’ on that.

Also, what algorithm are you referring to?

Thanks for the quick feedback!

Ludo’.
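
P.S.  For concreteness, here is a rough, untested sketch of the chunked
approach described above, in the spirit of the daemon’s C++ code.  The
names (‘deleteUnusedLinksChunked’, ‘chunkSize’) and the exact deletion
criterion are made up for illustration and are not taken from the actual
patch:

/* Sketch: scan the .links directory in bounded chunks, sorting each
   chunk by inode number so that the subsequent lstat() calls walk the
   inode table mostly sequentially instead of in random (hash) order.  */

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

void deleteUnusedLinksChunked(const std::string &linksDir)
{
    const size_t chunkSize = 100000;  /* upper bound on entries in memory */

    DIR *dir = opendir(linksDir.c_str());
    if (dir == nullptr) return;

    bool exhausted = false;
    while (!exhausted) {
        /* Read at most ‘chunkSize’ (inode, name) pairs.  */
        std::vector<std::pair<ino_t, std::string>> chunk;
        struct dirent *ent = nullptr;
        while (chunk.size() < chunkSize
               && (ent = readdir(dir)) != nullptr) {
            std::string name = ent->d_name;
            if (name == "." || name == "..") continue;
            chunk.emplace_back(ent->d_ino, name);
        }
        if (ent == nullptr) exhausted = true;

        /* Sort by inode number: this is what makes the st_ino values in
           the ‘lstat’ trace above come out (mostly) in ascending order.  */
        std::sort(chunk.begin(), chunk.end());

        for (const auto &item : chunk) {
            std::string path = linksDir + "/" + item.second;
            struct stat st;
            if (lstat(path.c_str(), &st) == -1) continue;
            /* A link count of 1 would mean the .links entry is the only
               remaining reference, i.e., nothing in the store uses it.  */
            if (st.st_nlink == 1)
                unlink(path.c_str());
        }
    }

    closedir(dir);
}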