all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: 65720@debbugs.gnu.org
Subject: bug#65720: Guile-Git-managed checkouts grow way too much
Date: Mon, 04 Sep 2023 23:47:49 +0200	[thread overview]
Message-ID: <87fs3tobju.fsf@gnu.org> (raw)
In-Reply-To: <87bkejc7go.fsf@inria.fr> ("Ludovic Courtès"'s message of "Sun, 03 Sep 2023 22:44:39 +0200")

[-- Attachment #1: Type: text/plain, Size: 3006 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M    /tmp/checkout
--8<---------------cut here---------------end--------------->8---

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine is a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.

I can’t think of a good heuristic for (1).  Birth time could be one, but
we’d need statx(2):

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---

This strategy can be implemented like this:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 942 bytes --]

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                     ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                          (lstat (in-vicinity file ".git/config")))
+                    (#f +inf.0)
+                    (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

[-- Attachment #3: Type: text/plain, Size: 707 bytes --]


Namely, a cached checkout as considered as “expired” after 9 months.  In
my case, it gives this:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every users copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.

  reply	other threads:[~2023-09-04 21:49 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
2023-09-04 21:47 ` Ludovic Courtès [this message]
2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
2023-09-05 14:18     ` Ludovic Courtès
2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
2023-09-08 17:08         ` Ludovic Courtès
2023-09-11  7:00           ` Csepp
2023-09-11  8:42           ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier
2023-09-11 14:42           ` bug#65720: Guile-Git-managed checkouts grow way too much wolf
2023-09-13 18:10             ` Ludovic Courtès
2023-09-13 22:36               ` Simon Tournier
2023-09-07  0:41       ` Simon Tournier
2023-09-08 17:09         ` Ludovic Courtès
2023-09-09 10:31           ` Simon Tournier
2023-09-11  7:06             ` Csepp
2023-09-11 14:37       ` Ludovic Courtès
2023-10-20 16:15         ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
2023-10-23 10:08           ` Simon Tournier
2023-10-23 22:27             ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
2023-10-23 23:28               ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
2023-10-30 12:02           ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines
2023-11-14  9:19             ` Ludovic Courtès
2023-11-14  9:32               ` Simon Tournier
2023-11-16 12:12                 ` [bug#66650] " Ludovic Courtès
2023-11-16 13:24                   ` Simon Tournier
2023-11-22 11:17                     ` bug#65720: " Ludovic Courtès
2023-11-22 11:57                       ` [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
2023-11-22 11:57                         ` Simon Tournier
2023-11-22 16:00                         ` bug#66650: " Ludovic Courtès
2023-09-05  8:22   ` Jelle Licht
2023-09-05 14:20     ` Ludovic Courtès
2023-09-05 18:59   ` Simon Tournier
2023-09-05 14:11 ` Ludovic Courtès
2023-09-18 22:35 ` Ludovic Courtès
2023-09-19  7:19   ` Simon Tournier
2023-11-23 11:35 ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fs3tobju.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=65720@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.