From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bug-guix-bounces+larch=yhetil.org@gnu.org>
Received: from mp10.migadu.com ([2001:41d0:306:2d92::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by ms9.migadu.com with LMTPS
	id oKJyHd5Q9mRtBQAAG6o9tA:P1
	(envelope-from <bug-guix-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Mon, 04 Sep 2023 23:49:18 +0200
Received: from aspmx1.migadu.com ([2001:41d0:306:2d92::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by mp10.migadu.com with LMTPS
	id oKJyHd5Q9mRtBQAAG6o9tA
	(envelope-from <bug-guix-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Mon, 04 Sep 2023 23:49:18 +0200
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by aspmx1.migadu.com (Postfix) with ESMTPS id 0C94D433BA
	for <larch@yhetil.org>; Mon,  4 Sep 2023 23:49:18 +0200 (CEST)
Authentication-Results: aspmx1.migadu.com;
	dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b="Ec4/3vhf";
	spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org";
	dmarc=pass (policy=none) header.from=gnu.org
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org;
	s=key1; t=1693864158;
	h=from:from:sender:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:mime-version:mime-version:
	 content-type:content-type:resent-cc:resent-from:resent-sender:
	 resent-message-id:in-reply-to:in-reply-to:references:references:
	 list-id:list-help:list-unsubscribe:list-subscribe:list-post:
	 dkim-signature; bh=BurYVm1rImaTmYgUS5+9Jz3dIfEvZ64VE5iDtpY6cmI=;
	b=Wtwghogq4vD0HYR+ATDUcGdg+gY/GfCldYRTLtwH2WPziOnlS8M8XVgmwnMUum/BPbTyKV
	7LMYjL84oCsgDtHM7ikYX8s5ZjUra22vndTmh4ACBxyU5b9sOy1TidHNnOzfO4zJukcy0U
	C8M7knWwpJg4O5xAtI+Qts1fLe70Fakf6R1YEUyeIEN+qvxHRFoyjPElCsopOjKFyeZMoI
	YFfqv2Gzu6EHX1zjZtFtMc2MbziCl1SmU+algQZzAGGuGenRI9TQBjnHzhdxjJxjXqRbHN
	KHah1E0q3fv1P9Wj0JWw0TRDFs/EYQjCRoaGYgR51soD8OafiiQ2LQmC+Avfyw==
ARC-Seal: i=1; s=key1; d=yhetil.org; t=1693864158; a=rsa-sha256; cv=none;
	b=uPgqqFkyUmlX9obz8MxqgIAtR5bW8u5odyAiT187Y+QWwYh4NO4ykf7/+NZwmdAQkGtqlG
	DeflQEMDATCf/5c/3tKGRfDhLDXtNJ8EtTePQTjnx7nKYaa83QKhg4DbxiVvg6881T3YrW
	QBTDaSftLxpCnpn8PHwc2h5NYxFXL6AU6OhS8t7VhozG6+gfVnQRCm21FuELb7DyGn3JwU
	b7RZRgKti4GIIXEVlE2IQW2APfGjc/cV3q2jC0H1pCncxO7NVqclJLoSTqzfvXKj4/QC65
	eMBf6xt01DSsCRjCw+XFMJ9zuhk9fTu7nNbcDb1Q8RKcxuCkV6Lsivd6SzSDFQ==
ARC-Authentication-Results: i=1;
	aspmx1.migadu.com;
	dkim=fail ("headers rsa verify failed") header.d=gnu.org header.s=fencepost-gnu-org header.b="Ec4/3vhf";
	spf=pass (aspmx1.migadu.com: domain of "bug-guix-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="bug-guix-bounces+larch=yhetil.org@gnu.org";
	dmarc=pass (policy=none) header.from=gnu.org
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <bug-guix-bounces@gnu.org>)
	id 1qdHRQ-0007za-MJ; Mon, 04 Sep 2023 17:49:04 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1qdHRO-0007zK-Po
 for bug-guix@gnu.org; Mon, 04 Sep 2023 17:49:02 -0400
Received: from debbugs.gnu.org ([2001:470:142:5::43])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1qdHRO-0006yb-HR
 for bug-guix@gnu.org; Mon, 04 Sep 2023 17:49:02 -0400
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1qdHRO-0003z5-Ai
 for bug-guix@gnu.org; Mon, 04 Sep 2023 17:49:02 -0400
X-Loop: help-debbugs@gnu.org
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= <ludo@gnu.org>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-guix@gnu.org
Resent-Date: Mon, 04 Sep 2023 21:49:02 +0000
Resent-Message-ID: <handler.65720.B65720.169386408315013@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 65720
X-GNU-PR-Package: guix
X-GNU-PR-Keywords: 
To: 65720@debbugs.gnu.org
Received: via spool by 65720-submit@debbugs.gnu.org id=B65720.169386408315013
 (code B ref 65720); Mon, 04 Sep 2023 21:49:02 +0000
Received: (at 65720) by debbugs.gnu.org; 4 Sep 2023 21:48:03 +0000
Received: from localhost ([127.0.0.1]:52859 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
 id 1qdHQQ-0003u2-Hy
 for submit@debbugs.gnu.org; Mon, 04 Sep 2023 17:48:03 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:43088)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@gnu.org>) id 1qdHQM-0003sn-7m
 for 65720@debbugs.gnu.org; Mon, 04 Sep 2023 17:48:01 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@gnu.org>) id 1qdHQG-0006vJ-Lm
 for 65720@debbugs.gnu.org; Mon, 04 Sep 2023 17:47:52 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=BurYVm1rImaTmYgUS5+9Jz3dIfEvZ64VE5iDtpY6cmI=; b=Ec4/3vhf5zNdNW4lKvpy
 4XoBmWZ7JEUZt0CvLm9yNWuCgOJxEBWGa3ratQojSKu6/QiCo0xWe2muZVgsAQCCTWfyVNtLqAHKF
 DAOJ/s5K4mWL5ZASq59Fa8Lq+8O8w72B7rdh4yMk1RbmjXQm80F3a/25EHfP/buVA5QMtN8s6oRye
 U3YnB3ZZ7e+NwO+ZS62u/3SJCaOV9xUvkoVHCoLIODb+GfJn+2nIktYkugnmVUBXREel7MpSCI4fq
 Sk/ncFUDJgI6LMCvtoV2iMcZzu0OJHGXg0By+5B2qk9criXQf+mLk1DFLvHKLk2NyuLBbcs59Q1yq
 kO4HjFO8jDdi6w==;
From: Ludovic =?UTF-8?Q?Court=C3=A8s?= <ludo@gnu.org>
References: <87bkejc7go.fsf@inria.fr>
Date: Mon, 04 Sep 2023 23:47:49 +0200
In-Reply-To: <87bkejc7go.fsf@inria.fr> ("Ludovic =?UTF-8?Q?Court=C3=A8s?="'s
 message of "Sun, 03 Sep 2023 22:44:39 +0200")
Message-ID: <87fs3tobju.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-BeenThere: bug-guix@gnu.org
List-Id: Bug reports for GNU Guix <bug-guix.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-guix>,
 <mailto:bug-guix-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-guix>
List-Post: <mailto:bug-guix@gnu.org>
List-Help: <mailto:bug-guix-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-guix>,
 <mailto:bug-guix-request@gnu.org?subject=subscribe>
Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org
Sender: bug-guix-bounces+larch=yhetil.org@gnu.org
X-Migadu-Country: US
X-Migadu-Flow: FLOW_IN
X-Migadu-Scanner: mx2.migadu.com
X-Spam-Score: -6.92
X-Migadu-Queue-Id: 0C94D433BA
X-Migadu-Spam-Score: -6.92
X-TUID: mtnBb/F+s+ln

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Ludovic Court=C3=A8s <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of =E2=80=98hpcguix-web=E2=
=80=99),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here=E2=80=99s the same =E2=80=98.git=E2=80=99 managed with Guil=
e-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27=
shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlay=
wmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj=
7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M    /tmp/checkout
--8<---------------cut here---------------end--------------->8---

> It would seem that libgit2 doesn=E2=80=99t do the equivalent of =E2=80=98=
git gc=E2=80=99.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running =E2=80=98git gc=E2=80=99 on=
 them if
=E2=80=98git=E2=80=99 is available.
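
For the second half of (2), the fallback could look roughly like the
sketch below.  It is an illustration only, not part of the patch: the
name 'gc-checkout' is made up, it uses plain Guile 'system*' rather
than anything in (guix git), and it assumes that a missing 'git'
binary shows up as a non-zero exit status or as an exception.

```scheme
;; Hypothetical helper, not part of Guix: run 'git gc' in DIRECTORY.
;; Return #t on success, #f if 'git' is missing or the command fails.
(define (gc-checkout directory)
  (eqv? 0
        (false-if-exception
         (status:exit-val
          (system* "git" "-C" directory "gc" "--quiet")))))
```

When it returns #f, the caller can still fall back to considering the
checkout expired.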

I can=E2=80=99t think of a good heuristic for (1).  Birth time could be one=
, but
we=E2=80=99d need statx(2):

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7=
apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at
=E2=80=98.git/config=E2=80=99:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7=
apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---

This strategy can be implemented like this:


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                     ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                          (lstat (in-vicinity file ".git/config")))
+                    (#f +inf.0)
+                    (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


Namely, a cached checkout is considered =E2=80=9Cexpired=E2=80=9D at most=
 9 months after its creation, as approximated by the mtime of
=E2=80=98.git/config=E2=80=99.  In my case, it gives this:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/chec=
kouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 =3D 1651827028
--8<---------------cut here---------------end--------------->8---
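
The two values printed by 'pk' can be checked by hand.  The sketch
below redoes the arithmetic outside of Guix, using the two mtimes
shown earlier converted to seconds since the Epoch.  The procedure
name 'expiration' and the hard-coded timestamps are for illustration
only.

```scheme
(define ttl (* 90 24 3600))                       ;90 days
(define max-checkout-retention (* 9 30 24 3600))  ;270 days

;; Earliest of the two deadlines, as in the patch minus the 'pk' calls.
(define (expiration directory-mtime config-mtime)
  (min (+ directory-mtime ttl)
       (+ config-mtime max-checkout-retention)))

;; Checkout mtime:      2023-09-04 11:34:41 +0200 -> 1693820081
;; '.git/config' mtime: 2021-08-09 10:50:28 +0200 -> 1628499028
(display (expiration 1693820081 1628499028))
(newline)
```

This yields 1651827028, the 'maxttl' value above.  That date falls in
May 2022, so this checkout already counts as expired.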

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I=E2=80=99m
specifically thinking about the Guix repo, which every user copies via
=E2=80=98guix pull=E2=80=99).

Thoughts?

Thanks,
Ludo=E2=80=99.

--=-=-=--