From: Ludovic Courtès
To: 65720@debbugs.gnu.org
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Mon, 04 Sep 2023 23:47:49 +0200
Message-ID: <87fs3tobju.fsf@gnu.org>
In-Reply-To: <87bkejc7go.fsf@inria.fr> ("Ludovic Courtès"'s message of "Sun, 03 Sep 2023 22:44:39 +0200")
References: <87bkejc7go.fsf@inria.fr>
List-Id: Bug reports for GNU Guix

Ludovic Courtès skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
>   $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
>   6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
>   $ du -hs .git
>   517M    .git

Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M    /tmp/checkout
--8<---------------cut here---------------end--------------->8---

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: .

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.

I can’t think of a good heuristic for (1).  Birth time could be one, but
we’d need statx(2):

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---

This strategy can be implemented like this:

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                       ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

Namely, a cached checkout is considered “expired” after 9 months,
counting from the mtime of ‘.git/config’, no matter how recently it was
used.  In my case, it gives this:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.
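
For anyone who wants to double-check the two numbers printed by ‘pk’ in the REPL session above, the arithmetic can be reproduced outside Guile.  What follows is only a sketch in Python, not Guix code: the name ‘checkout_expiration’ and its parameters are made up for illustration, the two constants are copied from the patch, and the mtimes are the ‘Modify’ timestamps from the ‘stat’ output above, truncated to whole seconds.

```python
import math
from datetime import datetime, timedelta, timezone

# Constants copied from the patch (in seconds).
TTL = 90 * 24 * 3600                         # 'ttl'
MAX_CHECKOUT_RETENTION = 9 * 30 * 24 * 3600  # 'max-checkout-retention'


def checkout_expiration(checkout_mtime, config_mtime=None):
    """Hypothetical helper mirroring the patched procedure.

    Return the earlier of "checkout mtime + TTL" and
    ".git/config mtime + maximum retention".  When the mtime of
    '.git/config' is unknown (None), only the TTL applies.
    """
    ttl_deadline = checkout_mtime + TTL
    if config_mtime is None:
        max_deadline = math.inf
    else:
        max_deadline = config_mtime + MAX_CHECKOUT_RETENTION
    return min(ttl_deadline, max_deadline)


# 'Modify' times from the 'stat' output, given in UTC+2.
CEST = timezone(timedelta(hours=2))
checkout_mtime = int(datetime(2023, 9, 4, 11, 34, 41, tzinfo=CEST).timestamp())
config_mtime = int(datetime(2021, 8, 9, 10, 50, 28, tzinfo=CEST).timestamp())

expiration = checkout_expiration(checkout_mtime, config_mtime)
print(expiration)
print(datetime.fromtimestamp(expiration, tz=timezone.utc).date())
```

Under these assumptions the result matches the ‘maxttl’ value in the session, and the corresponding date falls in May 2022, i.e. well before this message was written, so this particular checkout would be treated as expired right away.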