unofficial mirror of bug-guix@gnu.org 
* bug#65720: Guile-Git-managed checkouts grow way too much
@ 2023-09-03 20:44 Ludovic Courtès
  2023-09-04 21:47 ` Ludovic Courtès
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-03 20:44 UTC (permalink / raw)
  To: 65720

Hello!

As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
checkouts managed by Guile-Git appear to grow beyond reason.  As an
example, here’s the same ‘.git’ managed with Guile-Git and with Git:

--8<---------------cut here---------------start------------->8---
$ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ du -hs .git
517M    .git
--8<---------------cut here---------------end--------------->8---

It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
@ 2023-09-04 21:47 ` Ludovic Courtès
  2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
                     ` (2 more replies)
  2023-09-05 14:11 ` Ludovic Courtès
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-04 21:47 UTC (permalink / raw)
  To: 65720


Ludovic Courtès <ludo@gnu.org> writes:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

Unsurprisingly, GC makes a big difference:

--8<---------------cut here---------------start------------->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M    /tmp/checkout
--8<---------------cut here---------------end--------------->8---
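
A rough way to spot a checkout that needs this treatment is to count its
pack files.  The sketch below assumes that the growth comes from packs
piling up under ‘.git/objects/pack’, which nothing above establishes; the
function names and the default threshold of 25 are made up for
illustration.

```shell
# Count the pack files of the Git repository whose work tree is $1.
# Prints 0 when the pack directory does not exist.
count_packs() {
  find "$1/.git/objects/pack" -type f -name '*.pack' 2>/dev/null \
    | wc -l | tr -d ' '
}

# Succeed if the checkout in $1 has more than $2 packs.
# The default threshold (25) is arbitrary.
needs_gc() {
  [ "$(count_packs "$1")" -gt "${2:-25}" ]
}
```

Used as ‘needs_gc "$checkout" && echo "gc needed"’, it only reads the
directory and never touches the repository.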

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine whether a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.

I can’t think of a good heuristic for (1).  Birth time could be one, but
we’d need statx(2):

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---------------cut here---------------end--------------->8---

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

--8<---------------cut here---------------start------------->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---------------cut here---------------end--------------->8---

This strategy can be implemented like this:


[-- Attachment #2: Type: text/x-patch, Size: 942 bytes --]

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                     ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                          (lstat (in-vicinity file ".git/config")))
+                    (#f +inf.0)
+                    (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.



Namely, a cached checkout is considered “expired” 9 months after its
creation, as approximated by the mtime of ‘.git/config’.  In my case,
this gives:

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
--8<---------------cut here---------------end--------------->8---
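
The same arithmetic can be replayed outside Guile.  This is a minimal
shell sketch, not the patch itself: ‘expiration’ is a hypothetical helper,
and the two arguments are the mtimes shown by ‘stat’ above (the checkout
directory and its ‘.git/config’) converted to seconds since the epoch.

```shell
# Print the expiration time of a checkout, in seconds since the epoch.
#   $1: mtime of the checkout directory
#   $2: mtime of its .git/config (approximates the creation time)
expiration() {
  ttl=$((90 * 24 * 3600))                 # 90 days after the last update
  max_retention=$((9 * 30 * 24 * 3600))   # 9 months after creation
  by_ttl=$(($1 + ttl))
  by_age=$(($2 + max_retention))
  # Whichever deadline comes first wins.
  if [ "$by_ttl" -lt "$by_age" ]; then
    echo "$by_ttl"
  else
    echo "$by_age"
  fi
}

expiration 1693820081 1628499028   # prints 1651827028
```

The result matches the ‘maxttl’ value above: the 9-month limit, not the
90-day TTL, is what expires this checkout.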

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.


* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-04 21:47 ` Ludovic Courtès
@ 2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
  2023-09-05 14:18     ` Ludovic Courtès
  2023-09-05  8:22   ` Jelle Licht
  2023-09-05 18:59   ` Simon Tournier
  2 siblings, 1 reply; 32+ messages in thread
From: Josselin Poiret via Bug reports for GNU Guix @ 2023-09-05  8:18 UTC (permalink / raw)
  To: Ludovic Courtès, 65720


Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine whether a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts as
> expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

I think using the git binary instead of libgit2 as a workaround is a
good idea.  We can consider building it directly as well, so that people
who don't have it in their profiles can still benefit from it.  We could
even consider using git commands in most places and using libgit2 only
where we really need the tight coupling.  IIUC, libgit2 is eternally
trying to catch up to git and often performs in a counter-intuitive way
(I expect the various bugs with stale deleted files in checkouts to be
caused by this).  Maybe it could also let us use a bare repository and
directly extract the refs we want without having to mess with checkouts?

Best,
-- 
Josselin Poiret



* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-04 21:47 ` Ludovic Courtès
  2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
@ 2023-09-05  8:22   ` Jelle Licht
  2023-09-05 14:20     ` Ludovic Courtès
  2023-09-05 18:59   ` Simon Tournier
  2 siblings, 1 reply; 32+ messages in thread
From: Jelle Licht @ 2023-09-05  8:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 65720


Hi Ludo,

> 
> On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote:
> 
> Of course having to re-clone entire repositories every 9 months is
> ridiculous, but storing gigabytes of packs is worse IMO (I’m
> specifically thinking about the Guix repo, which every user copies via
> ‘guix pull’).

Please ignore this if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, then move the fresh clone to the old location.

Kr, Jelle





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
  2023-09-04 21:47 ` Ludovic Courtès
@ 2023-09-05 14:11 ` Ludovic Courtès
  2023-09-18 22:35 ` Ludovic Courtès
  2023-11-23 11:35 ` Ludovic Courtès
  3 siblings, 0 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-05 14:11 UTC (permalink / raw)
  To: 65720

Ludovic Courtès <ludo@gnu.org> writes:

> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq

Another data point, with Cuirass instances:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
65G     /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@berlin ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
 Birth: 2022-07-30 23:15:45.582559879 +0200
--8<---------------cut here---------------end--------------->8---

… and:

--8<---------------cut here---------------start------------->8---
ludo@guix-hpc4 ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
86G     /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@guix-hpc4 ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
 Birth: 2021-06-01 11:48:48.854669310 +0200
--8<---------------cut here---------------end--------------->8---

So yeah, problem we have.

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
@ 2023-09-05 14:18     ` Ludovic Courtès
  2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-05 14:18 UTC (permalink / raw)
  To: Josselin Poiret; +Cc: 65720

Hi,

Josselin Poiret <dev@jpoiret.xyz> writes:

> I think using the git binary instead of libgit2 as a workaround is a
> good idea.  We can consider building it directly as well, so that people
> who don't have it in their profiles can still benefit from it.  We could
> even consider using git commands in most places and using libgit2 only
> where we really need the tight coupling.

Surely you’d agree that it would suck though: depending on two Git
implementations because one doesn’t have a proper API and the other one
lacks a bunch of features.

It would also be pretty bad for closure size:

--8<---------------cut here---------------start------------->8---
$ guix size guile-git | tail -1
total: 106.6 MiB
$ guix size guile-git git-minimal | tail -1
total: 169.8 MiB
--8<---------------cut here---------------end--------------->8---

It’s also not clear concretely how we’d add that dependency.  Try
invoking ‘git’ from $PATH and print a warning if it doesn’t work?
But then, what about applications like Cuirass and hpcguix-web?

Tricky, tricky.

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05  8:22   ` Jelle Licht
@ 2023-09-05 14:20     ` Ludovic Courtès
  0 siblings, 0 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-05 14:20 UTC (permalink / raw)
  To: Jelle Licht; +Cc: 65720

Hello,

Jelle Licht <jlicht@posteo.net> writes:

>> On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote:
>> 
>> Of course having to re-clone entire repositories every 9 months is
>> ridiculous, but storing gigabytes of packs is worse IMO (I’m
>> specifically thinking about the Guix repo, which every user copies via
>> ‘guix pull’).
>
> Please ignore this if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, then move the fresh clone to the old location.

Good question.

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> ,use(git)
scheme@(guix git)> (clone "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/" "/tmp/fresh-clone")
$7 = #<git-repository ba4240>
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone")
6.7G	/tmp/fresh-clone
$8 = 0
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone/.git")
6.6G	/tmp/fresh-clone/.git
$9 = 0
scheme@(guix git)> (system* "du" "-hs" "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/
$10 = 0
--8<---------------cut here---------------end--------------->8---

Conclusion: it makes no difference.

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-04 21:47 ` Ludovic Courtès
  2023-09-05  8:18   ` Josselin Poiret via Bug reports for GNU Guix
  2023-09-05  8:22   ` Jelle Licht
@ 2023-09-05 18:59   ` Simon Tournier
  2 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-09-05 18:59 UTC (permalink / raw)
  To: Ludovic Courtès, 65720

Hi,

On Mon, 04 Sep 2023 at 23:47, Ludovic Courtès <ludo@gnu.org> wrote:

>> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.
>
> Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

Ouch!

        The goals of the project haven't changed, and neither have the
        tradeoffs. If one were to rewrite git-gc on top of libgit2, the
        best-case scenario is ending up with what we already had.

        If you want to use regular maintenance on some repositories, use
        git gc, that's what it's there for.

        https://github.com/libgit2/libgit2/issues/3247#issuecomment-152508040

> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine whether a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts
> as expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

About (1), maybe we could add a “counter” and trigger (2) after X updates
of the checkout.  Well, I guess the amount of cruft is more or less
proportional to the number of checkout updates; that’s the heuristic I
would use.
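
Such a counter could be sketched as follows.  This is only an
illustration under assumptions: ‘record_update’ and the ‘.update-count’
file are hypothetical names, and nothing like them exists in Guix as far
as this thread shows.

```shell
# Record one update of the checkout in $1.  Print "gc" and reset the
# counter once it reaches the threshold $2; print "skip" otherwise.
record_update() {
  dir=$1
  threshold=$2
  file=$dir/.update-count                  # hypothetical counter file
  count=$(cat "$file" 2>/dev/null || echo 0)
  count=$((count + 1))
  if [ "$count" -ge "$threshold" ]; then
    echo 0 > "$file"
    echo gc
  else
    echo "$count" > "$file"
    echo skip
  fi
}
```

The caller would run the clean-up only when the function prints “gc”, so
a user who rarely updates the checkout rarely pays for it.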

The most annoying is (2).  Forcing a re-clone does not appear to me to
be a solution; I prefer to waste disk space (and probably run ‘git gc’
manually myself) rather than re-clone… Somehow this re-clone would always
happen when I am on a poor network.

Moreover, assuming this clean-up (2) only runs once in a while, we
could imagine invoking something like,

    guix shell -C git-minimal \
         -- git \
         -C ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
         gc

when the checkout is updated.  And maybe we could provide another “guix
pull” command-line option for turning this off and marking it as done
(resetting the “counter”).

Well, that’s a poor solution but we can assume that git-minimal is at
worst available using “guix shell git-minimal”.  Note that downloading
the closure of git-minimal costs far less than re-cloning the full Guix
repository.

Cheers,
simon





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05 14:18     ` Ludovic Courtès
@ 2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
  2023-09-08 17:08         ` Ludovic Courtès
  2023-09-07  0:41       ` Simon Tournier
  2023-09-11 14:37       ` Ludovic Courtès
  2 siblings, 1 reply; 32+ messages in thread
From: Josselin Poiret via Bug reports for GNU Guix @ 2023-09-06  8:04 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 65720


Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Surely you’d agree that it would suck though: depending on two Git
> implementations because one doesn’t have a proper API and the other one
> lacks a bunch of features.

Right, although I wouldn't necessarily say that the former doesn't have
a proper API, but rather that it has a Unix-oriented API.  That leads to
performance issues on e.g. Windows but on Linux I'm not sure there's
much of a difference.

> It would also be pretty bad for closure size:
>
> --8<---------------cut here---------------start------------->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---------------cut here---------------end--------------->8---
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?
>
> Tricky, tricky.

We could consider replacing the guile-git dependency with another
library built directly on top of git-minimal, and have this be a
dependency of Guix.  Not ideal though, and not really scalable either:
we can't just add every VCS as direct dependencies.

From what I've seen, people are now scaling back on their use of
libgit2 because of the impedance mismatch and are resorting more and
more to git plumbing.  From a pragmatic point of view, I'd prefer the
latter, since it is more stable and feature-complete.

Best,
-- 
Josselin Poiret



* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05 14:18     ` Ludovic Courtès
  2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
@ 2023-09-07  0:41       ` Simon Tournier
  2023-09-08 17:09         ` Ludovic Courtès
  2023-09-11 14:37       ` Ludovic Courtès
  2 siblings, 1 reply; 32+ messages in thread
From: Simon Tournier @ 2023-09-07  0:41 UTC (permalink / raw)
  To: Ludovic Courtès, Josselin Poiret; +Cc: 65720

Hi,

On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote:

> It would also be pretty bad for closure size:
>
> --8<---------------cut here---------------start------------->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---------------cut here---------------end--------------->8---
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?

I think we can rely on something like,

    guix shell -C git-minimal -- git gc

It would be invoked internally using the Scheme API for inferiors and
friends.  Doing so, it would add nothing to the closure size.

It appears to me safe to assume that this command can be run from any
Guix installation.  Since the Git GC would only be done once every X Git
fetches, the overhead would be much lower.
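
At the shell level, the choice of command could look like the sketch
below.  ‘gc_command’ is a hypothetical name, and the function only prints
the command instead of running it.  It prefers a ‘git’ found in $PATH and
falls back to ‘guix shell’ otherwise; it leaves out ‘-C’ on the
assumption that a container would need the checkout directory shared
explicitly.

```shell
# Print the command that would garbage-collect the checkout in $1.
# Nothing is executed; the directory name is printed unquoted, so this
# is for display only.
gc_command() {
  dir=$1
  if command -v git >/dev/null 2>&1; then
    printf 'git -C %s gc\n' "$dir"
  else
    # Fallback: let Guix provide git-minimal on demand.
    printf 'guix shell git-minimal -- git -C %s gc\n' "$dir"
  fi
}
```

Either way the printed command ends in ‘git -C DIR gc’; only the way
‘git’ is obtained differs.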

Hum, am I repeating myself [1]? :-)

And I would run this “git gc” via “guix gc”, not via “guix pull”.  Well,
I do not like all these automatic removals that are triggered by date
(last-expiry-cleanup) when running ordinary commands.  They always happen
when I do not want them to. ;-) Contrary to “guix gc”.  Bah, another
story. :-)

Cheers,
simon


1: bug#65720: Guile-Git-managed checkouts grow way too much
Simon Tournier <zimon.toutoune@gmail.com>
Tue, 05 Sep 2023 20:59:07 +0200
id:86edjcqwec.fsf@gmail.com
https://issues.guix.gnu.org//65720
https://issues.guix.gnu.org/msgid/86edjcqwec.fsf@gmail.com
https://yhetil.org/guix/86edjcqwec.fsf@gmail.com







* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
@ 2023-09-08 17:08         ` Ludovic Courtès
  2023-09-11  7:00           ` Csepp
                             ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-08 17:08 UTC (permalink / raw)
  To: Josselin Poiret; +Cc: 65720

Hello!

Josselin Poiret <dev@jpoiret.xyz> writes:

> Right, although I wouldn't necessarily say that the former doesn't have
> a proper API, but rather that it has a Unix-oriented API.  That leads to
> performance issues on e.g. Windows but on Linux I'm not sure there's
> much of a difference.

[...]

> We could consider replacing the guile-git dependency with another
> library built directly on top of git-minimal, and have this be a
> dependency of Guix.  Not ideal though, and not really scalable either:
> we can't just add every VCS as direct dependencies.

I cannot imagine a viable implementation of things like ‘commit-closure’
and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
I’m quite confident this would be slow and brittle.

It looks like there’s no option other than carrying the two
implementations.

~~~

Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
in pure Scheme.  That was on April 1st though, so people mistakenly
assumed it was a joke and the project was never carried out.

I digress, but I wonder: is there not even a viable Haskell or OCaml
implementation of Git?

Thanks,
Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-07  0:41       ` Simon Tournier
@ 2023-09-08 17:09         ` Ludovic Courtès
  2023-09-09 10:31           ` Simon Tournier
  0 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-08 17:09 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Josselin Poiret, 65720

Hi!

Simon Tournier <zimon.toutoune@gmail.com> writes:

> On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> It would also be pretty bad for closure size:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ guix size guile-git | tail -1
>> total: 106.6 MiB
>> $ guix size guile-git git-minimal | tail -1
>> total: 169.8 MiB
>> --8<---------------cut here---------------end--------------->8---
>>
>> It’s also not clear concretely how we’d add that dependency.  Try
>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>> But then, what about applications like Cuirass and hpcguix-web?
>
> I think we can rely on something like,
>
>     guix shell -C git-minimal -- git gc

We’re talking about the implementation of a cache (meant to speed up
operations), that would actually fill said cache plus do a whole bunch
of expensive operations?  Nah.  :-)

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-08 17:09         ` Ludovic Courtès
@ 2023-09-09 10:31           ` Simon Tournier
  2023-09-11  7:06             ` Csepp
  0 siblings, 1 reply; 32+ messages in thread
From: Simon Tournier @ 2023-09-09 10:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, 65720

Hi,

On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote:

>>> It would also be pretty bad for closure size:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> $ guix size guile-git | tail -1
>>> total: 106.6 MiB
>>> $ guix size guile-git git-minimal | tail -1
>>> total: 169.8 MiB
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> It’s also not clear concretely how we’d add that dependency.  Try
>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>> But then, what about applications like Cuirass and hpcguix-web?
>>
>> I think we can rely on something like,
>>
>>     guix shell -C git-minimal -- git gc
>
> We’re talking about the implementation of a cache (meant to speed up
> operations), that would actually fill said cache plus do a whole bunch
> of expensive operations?  Nah.  :-)

I do not think so.  If I understand correctly, we need to run “git gc” at
some point, therefore git-minimal needs to be around.  The question is
how and when.

Well, maybe I am missing what the bug is about.  For me, it is about
running ‘git gc’ for cleaning the Git checkout cache, no?


Solution #1.  Add git-minimal as an input.  It increases the closure and
the extra load (on average) is about the ratio between the rate of “guix
pull” and the rate of the git-minimal changes.

Assume that people run “guix pull” once per week and that, say, “git
gc” is run after 50 pulls.  (Both numbers are totally arbitrary and
based on my personal estimate.)

Data Service [1] tells:

        2023-07-07 15:45:22 2023-09-08 21:22:08
        2023-05-11 16:10:48 2023-07-07 14:21:45
        2023-05-01 16:40:08 2023-05-11 14:36:16
        2023-04-25 13:34:54 2023-05-01 15:19:55
        2023-04-25 13:34:54 2023-09-08 21:22:08        
        2023-03-06 17:22:28 2023-04-25 12:27:33
        2023-01-17 23:49:19 2023-03-06 16:48:43
        2022-11-08 13:06:42 2023-01-17 15:11:47
        2022-10-08 05:14:46 2022-11-08 09:56:31
        2022-09-06 15:00:08 2022-10-08 04:15:43
        2022-08-13 22:02:31 2022-09-06 12:58:52
        …

It means that a user will download git-minimal ~10 times for nothing.


Solution #2.  The one I am proposing. :-)  Download git-minimal only
when Guix needs it for running “git gc”.  Yeah, there is probably a
small overhead with some operations.  But I bet this overhead is much
smaller than that of solution #1.

Well, it depends on the number of times people are updating the cache vs
the rate of change of git-minimal.

For sure, if one updates the cache 100 times per week, having
git-minimal as an input is far better.  But I do not think that is the
regular usage on average. :-)

That’s why I am proposing to have an option for turning off this “git
gc” operation.

Well, we have lived for years without running ‘git gc’, so running it
once per year on average is probably enough to keep the cache size
reasonable.  And git-minimal is changing every month.


Maybe, there is some solution #3. ;-)

Cheers,
simon


1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-08 17:08         ` Ludovic Courtès
@ 2023-09-11  7:00           ` Csepp
  2023-09-11  8:42           ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier
  2023-09-11 14:42           ` bug#65720: Guile-Git-managed checkouts grow way too much wolf
  2 siblings, 0 replies; 32+ messages in thread
From: Csepp @ 2023-09-11  7:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: dev, 65720


Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> Josselin Poiret <dev@jpoiret.xyz> writes:
>
>> Right, although I wouldn't necessarily say that the former doesn't have
>> a proper API, but rather that it has a Unix-oriented API.  That leads to
>> performance issues on e.g. Windows but on Linux I'm not sure there's
>> much of a difference.
>
> [...]
>
>> We could consider replacing the guile-git dependency with another
>> library built directly on top of git-minimal, and have this be a
>> dependency of Guix.  Not ideal though, and not really scalable either:
>> we can't just add every VCS as direct dependencies.
>
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
> I’m quite confident this would be slow and brittle.
>
> It looks like there’s no option other than carrying the two
> implementations.
>
> ~~~
>
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
>
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
>
> Thanks,
> Ludo’.

For the sake of completeness:
There is an alternative implementation in C for Plan 9 that I've used and
is now mature enough that the 9front project switched to it from
Mercurial.
It might be possible to compile it with the plan9port compiler wrapper.

There is also a Git implementation in OCaml that some MirageOS
unikernels use to serve static content from a git repository.
Also the Irmin "database" is based on git and is written in OCaml.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-09 10:31           ` Simon Tournier
@ 2023-09-11  7:06             ` Csepp
  0 siblings, 0 replies; 32+ messages in thread
From: Csepp @ 2023-09-11  7:06 UTC (permalink / raw)
  To: Simon Tournier; +Cc: ludo, 65720, dev


Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>>> It would also be pretty bad for closure size:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> $ guix size guile-git | tail -1
>>>> total: 106.6 MiB
>>>> $ guix size guile-git git-minimal | tail -1
>>>> total: 169.8 MiB
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> It’s also not clear concretely how we’d add that dependency.  Try
>>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>>> But then, what about applications like Cuirass and hpcguix-web?
>>>
>>> I think we can rely on something like,
>>>
>>>     guix shell -C git-minimal -- git gc
>>
>> We’re talking about the implementation of a cache (meant to speed up
>> operations), that would actually fill said cache plus do a whole bunch
>> of expensive operations?  Nah.  :-)
>
> I do not think so.  If I understand correctly, we need to run “git gc” at
> some point, therefore git-minimal needs to be around.  The question is
> how and when.
>
> Well, maybe I am missing what the bug is about.  For me, it is about
> running ‘git gc’ for cleaning the Git checkout cache, no?
>
>
> Solution #1.  Add git-minimal as an input.  It increases the closure and
> the extra load (on average) is about the ratio between the rate of “guix
> pull” and the rate of the git-minimal changes.
>
> Assume that people run “guix pull” once per week and that, say, “git
> gc” is run after 50 pulls.  (Both numbers are totally arbitrary and
> based on my personal estimate.)
>
> Data Service [1] tells:
>
>         2023-07-07 15:45:22 2023-09-08 21:22:08
>         2023-05-11 16:10:48 2023-07-07 14:21:45
>         2023-05-01 16:40:08 2023-05-11 14:36:16
>         2023-04-25 13:34:54 2023-05-01 15:19:55
>         2023-04-25 13:34:54 2023-09-08 21:22:08        
>         2023-03-06 17:22:28 2023-04-25 12:27:33
>         2023-01-17 23:49:19 2023-03-06 16:48:43
>         2022-11-08 13:06:42 2023-01-17 15:11:47
>         2022-10-08 05:14:46 2022-11-08 09:56:31
>         2022-09-06 15:00:08 2022-10-08 04:15:43
>         2022-08-13 22:02:31 2022-09-06 12:58:52
>         …
>
> It means that a user will download git-minimal ~10 times for nothing.
>
>
> Solution #2.  The one I am proposing. :-)  Download git-minimal only
> when Guix needs it for running “git gc”.  Yeah, there is probably a
> small overhead with some operations.  But, I bet this overhead is much
> smaller than the one of solution #1.
>
> Well, it depends on the number of times people are updating the cache vs
> the rate of change of git-minimal.
>
> For sure, if one updates the cache 100 times per week, having
> git-minimal as an input is far better.  But I do not think that is the
> regular usage on average. :-)
>
> That’s why I am proposing to have an option for turning off this “git
> gc” operation.
>
> Well, we have lived for years without running ‘git gc’ so running it
> once per year on average is probably enough to keep the cache size
> reasonable.  And git-minimal is changing every month.
>
>
> Maybe, there is some solution #3. ;-)
>
> Cheers,
> simon
>
>
> 1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history

Please don't create another situation like with guix system roll-back,
where a crucial sysadmin operation doesn't work without network access.
Or at least make it configurable, so things that are likely to be needed
for future operations are pre-fetched.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much)
  2023-09-08 17:08         ` Ludovic Courtès
  2023-09-11  7:00           ` Csepp
@ 2023-09-11  8:42           ` Simon Tournier
  2023-09-11 14:42           ` bug#65720: Guile-Git-managed checkouts grow way too much wolf
  2 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-09-11  8:42 UTC (permalink / raw)
  To: Ludovic Courtès, Josselin Poiret; +Cc: 65720

Hi Ludo,

On Fri, 08 Sep 2023 at 19:08, Ludovic Courtès <ludo@gnu.org> wrote:

> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.

Well, that is a piece of work. :-)

Maybe there is some hope with git-std-lib.

    Subject: Proposal/Discussion: Turning parts of Git into libraries
    From: Emily Shaffer <nasamuffin@google.com>
    To: Git List <git@vger.kernel.org>
    Date: Fri, 17 Feb 2023 13:12:23 -0800	    
    https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com/

And some patches are starting to float around.
https://public-inbox.org/git/20230810163346.274132-1-calvinwan@google.com/


> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?

It depends on what “viable” means. :-)

https://github.com/mirage/ocaml-git
https://hackage.haskell.org/package/git

Irmin [1] is an OCaml library for building mergeable, branchable
distributed data stores – A Distributed Database Built on the Same
Principles as Git.  And irmin relies on ocaml-git.

1: https://github.com/mirage/irmin

Then there is a pure Go implementation and another using Java.

https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-go-git
https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-JGit

I do not know which of these are “viable”.  Well, I do not know if ‘git gc’ is
implemented.  And I do not know which plumbing is implemented and which
porcelain is available.

Last, SWH uses dulwich [2] which is a pure Python implementation of Git.

2: https://www.dulwich.io/

To my knowledge, there is no “dulwich gc” but they implement “dulwich
fsck” and “dulwich repack”.

Back at 10 Years of Guix or at UNESCO in February (I do not remember
exactly when), we were discussing implementations of Git.  And we
mentioned an implementation in Rust.  Maybe this one:

https://github.com/Byron/gitoxide

Cheers,
simon





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05 14:18     ` Ludovic Courtès
  2023-09-06  8:04       ` Josselin Poiret via Bug reports for GNU Guix
  2023-09-07  0:41       ` Simon Tournier
@ 2023-09-11 14:37       ` Ludovic Courtès
  2023-10-20 16:15         ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
  2 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-11 14:37 UTC (permalink / raw)
  To: Josselin Poiret; +Cc: 65720

Ludovic Courtès <ludo@gnu.org> skribis:

> It would also be pretty bad for closure size:
>
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?

A solution to this particular problem is coming:

  https://issues.guix.gnu.org/65866

Ludo’.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-08 17:08         ` Ludovic Courtès
  2023-09-11  7:00           ` Csepp
  2023-09-11  8:42           ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier
@ 2023-09-11 14:42           ` wolf
  2023-09-13 18:10             ` Ludovic Courtès
  2 siblings, 1 reply; 32+ messages in thread
From: wolf @ 2023-09-11 14:42 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, 65720


[-- Attachment #1.1: Type: text/plain, Size: 2814 bytes --]

On 2023-09-08 19:08:05 +0200, Ludovic Courtès wrote:
> Hello!
> 
> Josselin Poiret <dev@jpoiret.xyz> skribis:
> 
> > Right, although I wouldn't necessarily say that the former doesn't have
> > a proper API, but rather that it has a Unix-oriented API.  That leads to
> > performance issues on e.g. Windows but on Linux I'm not sure there's
> > much of a difference.
> 
> [...]
> 
> > We could consider replacing the guile-git dependency with another
> > library built directly on top of git-minimal, and have this be a
> > dependency of Guix.  Not ideal though, and not really scalable either:
> > we can't just add every VCS as direct dependencies.
> 
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.

I am sure I must be missing some part of the contract of the function, but at
least the commit-relation seems fairly straightforward:

    (define (shelling-commit-relation old new)
      (let ((h-old (oid->string (commit-id old)))
            (h-new (oid->string (commit-id new))))
        (cond ((eq? old new)
               'self)
              ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
               'ancestor)
              ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
               'descendant)
              (else
               'unrelated))))

I would argue it is even somewhat more readable than the current implementation.

> I’m quite confident this would be slow

My version is ~2000x faster compared to (guix git):

    Guix: 1048.620992ms
    Git:  0.532143ms

Again, I am sure I must have missed something, either in the implementation or in
the measurements, because it is pretty hard to believe there is so much room for
improvement.

The full script I used is attached to this email.

> and brittle.

In general, Git plumbing commands are designed to have a stable CLI in order
to be usable in scripts.  So I am not sure where the brittleness would come
from.
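
For illustration only, here is a minimal sketch of the stricter wrapper
hinted at in the test script's comment about ‘git-C’: it maps the exit code
of ‘git merge-base --is-ancestor’ to a boolean and raises on anything else
(Git documents exit code 0 for "is an ancestor" and 1 for "is not").  The
names ‘wait-status->boolean’ and ‘git-ancestor?’ are hypothetical, not part
of (guix git), and the sketch assumes ‘git’ is found in $PATH.

```scheme
;; Hypothetical helpers, for illustration; not part of (guix git).

(define (wait-status->boolean status)
  "Map STATUS, a wait status as returned by 'system*', to #t (exit code 0)
or #f (exit code 1).  Raise an error for any other exit code, or when the
process was terminated by a signal."
  (case (status:exit-val status)
    ((0) #t)
    ((1) #f)
    (else (error "command failed with wait status" status))))

(define (git-ancestor? repository old new)
  "Return #t if commit OLD is an ancestor of commit NEW in REPOSITORY, a
directory name.  OLD and NEW are commit IDs given as strings.  Assumes
'git' is available in $PATH."
  (wait-status->boolean
   (system* "git" "-C" repository
            "merge-base" "--is-ancestor" old new)))
```

With such a helper, a typo in a commit ID or a missing repository would
surface as an error instead of being silently reported as 'unrelated.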

> 
> It looks like there’s no option other than carrying the two
> implementations.

Assuming I made no mistake (hard to believe), it is probably worth exploring the
feasibility of just shelling out to the git binary some more.

> 
> ~~~
> 
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
> 
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
> 
> Thanks,
> Ludo’.
>

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #1.2: test.scm --]
[-- Type: text/plain, Size: 1986 bytes --]

#!/bin/sh
# -*-scheme-*-
exec guile -s "$0" "$@"
!#

(use-modules (git)
             (guix git))

(define %repo "/tmp/guix-fork")

(define h1 "72745172d155e489936f694d6b9013cb76272370")
(define h2 "6d60d7ccba5a8e06c17d55a1772fa7f4529b5eff")
(define h3 "c3db650680f995f0556d3ddce567cdc1c33e4603")

;;; r has to still be defined when the commit-relation is called.  There is *no*
;;; error, but it always returns 'unrelated.  Quite a footgun.
(define r (repository-open %repo))
(define c1 (commit-lookup r (string->oid h1)))
(define c2 (commit-lookup r (string->oid h2)))
(define c3 (commit-lookup r (string->oid h3)))

(define (git-C dir . args)
  (apply system* "git" "-C" dir args))

(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ;; In real code, git-C should probably return #t (for 0), #f (for 1)
          ;; or raise (for anything else).
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))

;;; Make sure it actually works.
(let ((tests `((,c1 . ,c1)
               (,c1 . ,c2)
               (,c2 . ,c1)
               (,c1 . ,c3))))
  (for-each (λ (c)
              (format #t "Guix: ~a\nGit:  ~a\n\n"
                      (commit-relation (car c) (cdr c))
                      (shelling-commit-relation (car c) (cdr c))))
            tests))

(define (time proc)
  (let* ((start (get-internal-run-time))
         (_ (proc))
         (end   (get-internal-run-time)))
    (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second)))))

(format #t "Guix: ~ams\nGit:  ~ams\n"
        (time (λ () (commit-relation c1 c2)))
        (time (λ () (shelling-commit-relation c1 c2))))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-11 14:42           ` bug#65720: Guile-Git-managed checkouts grow way too much wolf
@ 2023-09-13 18:10             ` Ludovic Courtès
  2023-09-13 22:36               ` Simon Tournier
  0 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-13 18:10 UTC (permalink / raw)
  To: wolf; +Cc: Josselin Poiret, 65720

Hi,

wolf <wolf@wolfsden.cz> skribis:

> (define (time proc)
>   (let* ((start (get-internal-run-time))
>          (_ (proc))
>          (end   (get-internal-run-time)))
>     (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second)))))
>
> (format #t "Guix: ~ams\nGit:  ~ams\n"
>         (time (λ () (commit-relation c1 c2)))
>         (time (λ () (shelling-commit-relation c1 c2))))

‘get-internal-run-time’ returns “units of processor time” used by the
current process (info "(guile) Time").  When shelling out, the process
calls waitpid(2) and does nothing, so naturally its processor time is
close to zero.

‘get-internal-real-time’ should give something closer to elapsed time.
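
As a rough sketch of that change (the name ‘elapsed-ms’ is made up for this
example), the timing helper could measure wall-clock time instead, so that
time spent waiting for a child process is counted:

```scheme
;; Hypothetical variant of the 'time' helper quoted above; it differs only
;; in using 'get-internal-real-time' instead of 'get-internal-run-time'.

(define (elapsed-ms thunk)
  "Call THUNK and return the elapsed (wall-clock) time in milliseconds."
  (let* ((start (get-internal-real-time))
         (_     (thunk))
         (end   (get-internal-real-time)))
    (exact->inexact
     (* 1000 (/ (- end start) internal-time-units-per-second)))))

;; Waiting for a child process uses almost no processor time in the
;; parent, but it does show up in elapsed time.  Assumes a 'sleep'
;; program is available in $PATH.
(format #t "sleep 1: ~ams\n"
        (elapsed-ms (lambda () (system* "sleep" "1"))))
```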

Ludo’.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-13 18:10             ` Ludovic Courtès
@ 2023-09-13 22:36               ` Simon Tournier
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-09-13 22:36 UTC (permalink / raw)
  To: Ludovic Courtès, wolf; +Cc: Josselin Poiret, 65720

Hi Ludo,

On Wed, 13 Sep 2023 at 20:10, Ludovic Courtès <ludo@gnu.org> wrote:

> ‘get-internal-run-time’ returns “units of processor time” used by the
> current process (info "(guile) Time").  When shelling out, the process
> calls waitpid(2) and does nothing, so naturally its processor time is
> close to zero.
>
> ‘get-internal-real-time’ should give something closer to elapsed time.

Well, let's avoid mixing unrelated discussions. :-)  For discussing that
specific part, I reported my timings on guix-devel using ,time.

        comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git
        Simon Tournier <zimon.toutoune@gmail.com>
        Tue, 12 Sep 2023 00:48:30 +0200
        id:865y4gz5q9.fsf@gmail.com
        https://lists.gnu.org/archive/html/guix-devel/2023-09
        https://yhetil.org/guix/865y4gz5q9.fsf@gmail.com

The result is still significantly less, and discussion is welcome
over there. :-)

Cheers,
simon




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
  2023-09-04 21:47 ` Ludovic Courtès
  2023-09-05 14:11 ` Ludovic Courtès
@ 2023-09-18 22:35 ` Ludovic Courtès
  2023-09-19  7:19   ` Simon Tournier
  2023-11-23 11:35 ` Ludovic Courtès
  3 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-09-18 22:35 UTC (permalink / raw)
  To: 65720

Ludovic Courtès <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

More data…  The biggest file in that repo is a pack that was created
when that repo was first cloned (Aug. 2021):

--8<---------------cut here---------------start------------->8---
$ du /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/* |sort -k1 -n| tail -3
44272	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-3c2f1857501b01c321bc67ba1f30704deb9e18e9.pack
47272	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-30d5b35ad14a8398464e49e224811b162f673d66.pack
191492	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -l /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
-r--r--r-- 1 ludo users 196079671 Aug  9  2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -ld /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config
-rw-r--r-- 1 ludo users 266 Aug  9  2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config
--8<---------------cut here---------------end--------------->8---

The pack starts with things from Aug. 2021:

--8<---------------cut here---------------start------------->8---
$ git show-index < pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.idx|sort -k1 -n|head -3
12 30289f4d4638452520f52c1a36240220d0d940ff (852d8cb3)
927 d7ffc535c52f49177a8e5553569cdb1e321b5bc6 (2007c5d0)
1800 0a379de3249d5e9ff66fb404f7e5aa8ce2cb3d24 (b1e69aa4)
$ git show 30289f4d4638452520f52c1a36240220d0d940ff
commit 30289f4d4638452520f52c1a36240220d0d940ff
Author: Milkey Mouse <milkeymouse@meme.institute>
Date:   Sun Aug 8 22:15:40 2021 -0700

[…]
--8<---------------cut here---------------end--------------->8---

… and at the bottom (large offsets) it contains very old blobs from the
Nix repo that somehow made it here.

I figured we still had a ‘nix’ branch from the early days, that contains
the history of Nix.  I’ve now removed it, which helps a bit:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(git)
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
$5 = #<git-repository 91a7b0>
;; 600.534529s real time, 435.260926s run time.  0.000000s spent in GC.
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
$6 = #<git-repository 4465a50>
;; 420.321511s real time, 398.772963s run time.  0.000000s spent in GC.
--8<---------------cut here---------------end--------------->8---

… and more importantly:

--8<---------------cut here---------------start------------->8---
$ du -hs /tmp/guix/.git
373M	/tmp/guix/.git
$ du -hs /tmp/guix-after-removing-nix-branch/.git
362M	/tmp/guix-after-removing-nix-branch/.git
--8<---------------cut here---------------end--------------->8---

Anyway, what seems to happen is that every pull (every call to
‘remote-fetch’) creates a new pack (see ‘git_fetch_download_pack’ in
libgit2), which becomes inefficient in the long run (lots of small
poorly-compressed packs).  That’s at least one possible explanation.
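
One way to check this hypothesis on a given checkout would be to count the
packs and add up their sizes.  The following is only a sketch; the
procedure names ‘pack-files’ and ‘pack-summary’ are made up for this
example and are not part of Guile-Git or (guix git):

```scheme
(use-modules (ice-9 ftw)      ;scandir
             (srfi srfi-1))   ;fold

(define (pack-directory checkout)
  "Return the name of the pack directory of CHECKOUT, a Git checkout."
  (string-append checkout "/.git/objects/pack"))

(define (pack-files checkout)
  "Return the sorted list of '.pack' file names in CHECKOUT's pack
directory, or the empty list if that directory cannot be read."
  (or (scandir (pack-directory checkout)
               (lambda (file) (string-suffix? ".pack" file)))
      '()))

(define (pack-summary checkout)
  "Return two values: the number of packs in CHECKOUT and their total size
in bytes."
  (let ((packs (pack-files checkout)))
    (values (length packs)
            (fold (lambda (file total)
                    (+ total
                       (stat:size
                        (stat (string-append (pack-directory checkout)
                                             "/" file)))))
                  0
                  packs))))
```

A checkout that has been fetched into many times would be expected to show
a large pack count relative to a fresh clone of the same repository.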

To be continued…

Ludo’.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-18 22:35 ` Ludovic Courtès
@ 2023-09-19  7:19   ` Simon Tournier
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-09-19  7:19 UTC (permalink / raw)
  To: Ludovic Courtès, 65720

Hi Ludo.

On Tue, 19 Sep 2023 at 00:35, Ludovic Courtès <ludo@gnu.org> wrote:

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> ,use(git)
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
> $5 = #<git-repository 91a7b0>
> ;; 600.534529s real time, 435.260926s run time.  0.000000s spent in GC.
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
> $6 = #<git-repository 4465a50>
> ;; 420.321511s real time, 398.772963s run time.  0.000000s spent in GC.
> --8<---------------cut here---------------end--------------->8---

[...]

> --8<---------------cut here---------------start------------->8---
> $ du -hs /tmp/guix/.git
> 373M	/tmp/guix/.git
> $ du -hs /tmp/guix-after-removing-nix-branch/.git
> 362M	/tmp/guix-after-removing-nix-branch/.git
> --8<---------------cut here---------------end--------------->8---

Just to also point out [1] that using a shallow clone, restricted to the
oldest commit reachable by time-machine, saves about 25% of the bits to
download, and similarly on disk.

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-guile")
$1 = #<git-repository df3710>
;; 383.186818s real time, 278.060733s run time.  0.000000s spent in GC.

$ time git clone https://git.savannah.gnu.org/git/guix.git guix-full
Receiving objects: 100% (693699/693699), 342.14 MiB | 2.87 MiB/s, done.
real    2m40,830s
user    3m4,683s
sys     0m8,189s

$ time git clone --shallow-since=2019-04-30 https://git.savannah.gnu.org/git/guix.git guix-oldest
Receiving objects: 100% (428646/428646), 259.41 MiB | 3.87 MiB/s, done.
real    1m45,604s
user    2m32,370s
sys     0m5,916s

$ du -sh guix-*/.git
362M    guix-full/.git
362M    guix-guile/.git
272M    guix-oldest/.git
--8<---------------cut here---------------end--------------->8---

Cheers,
simon


1: Re: hard dependency on Git? (was bug#65866: [PATCH 0/8] Add built-in builder for Git checkouts)
Simon Tournier <zimon.toutoune@gmail.com>
Mon, 11 Sep 2023 19:52:34 +0200
id:871qf4ha1p.fsf@gmail.com
https://lists.gnu.org/archive/html/guix-devel/2023-09
https://yhetil.org/guix/871qf4ha1p.fsf@gmail.com





^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-09-11 14:37       ` Ludovic Courtès
@ 2023-10-20 16:15         ` Ludovic Courtès
  2023-10-23 10:08           ` Simon Tournier
  2023-10-30 12:02           ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines
  0 siblings, 2 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-10-20 16:15 UTC (permalink / raw)
  To: guix-patches
  Cc: Ludovic Courtès, 65720, Josselin Poiret, Simon Tournier,
	Christopher Baines, Josselin Poiret, Ludovic Courtès,
	Mathieu Othacehe, Ricardo Wurmus, Simon Tournier,
	Tobias Geerinckx-Rice

Fixes <https://issues.guix.gnu.org/65720>.

This fixes a bug whereby libgit2-managed checkouts would keep growing as
we fetch.

* guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
procedures.
(update-cached-checkout): Use it.
---
 guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

Hi!

This is a radical fix/workaround for the unbounded Git checkout growth
problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
pack files around).

I thought we might be able to implement a ‘git gc’ approximation using
the libgit2 “packbuilder” interface, but I haven’t got around to doing
it: <https://libgit2.org/libgit2/#HEAD/search/pack>.

Once again, shelling out is not my favorite option, but it’s a bug we
should fix sooner rather than later, hence this compromise.

Thoughts?

Ludo’.

diff --git a/guix/git.scm b/guix/git.scm
index b7182305cf..d704b62333 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
-;;; Copyright © 2018-2022 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018-2023 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2021 Kyle Meyer <kyle@kyleam.com>
 ;;; Copyright © 2021 Marius Bakke <marius@gnu.org>
 ;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
@@ -29,15 +29,16 @@ (define-module (guix git)
   #:use-module (guix cache)
   #:use-module (gcrypt hash)
   #:use-module ((guix build utils)
-                #:select (mkdir-p delete-file-recursively))
+                #:select (mkdir-p delete-file-recursively invoke/quiet))
   #:use-module (guix store)
   #:use-module (guix utils)
   #:use-module (guix records)
   #:use-module (guix gexp)
   #:autoload   (guix git-download)
   (git-reference-url git-reference-commit git-reference-recursive?)
+  #:autoload   (guix config) (%git)
   #:use-module (guix sets)
-  #:use-module ((guix diagnostics) #:select (leave warning))
+  #:use-module ((guix diagnostics) #:select (leave warning info))
   #:use-module (guix progress)
   #:autoload   (guix swh) (swh-download commit-id?)
   #:use-module (rnrs bytevectors)
@@ -428,6 +429,35 @@ (define (delete-checkout directory)
     (rename-file directory trashed)
     (delete-file-recursively trashed)))
 
+(define (packs-in-git-repository directory)
+  "Return the number of pack files under DIRECTORY, a Git checkout."
+  (catch 'system-error
+    (lambda ()
+      (let ((directory (opendir (in-vicinity directory ".git/objects/pack"))))
+        (let loop ((count 0))
+          (match (readdir directory)
+            ((? eof-object?)
+             (closedir directory)
+             count)
+            (str
+             (loop (if (string-suffix? ".pack" str)
+                       (+ 1 count)
+                       count)))))))
+    (const 0)))
+
+(define (maybe-run-git-gc directory)
+  "Run 'git gc' in DIRECTORY if needed."
+  ;; XXX: As of libgit2 1.3.x (used by Guile-Git), there's no support for GC.
+  ;; Each time a checkout is pulled, a new pack is created, which eventually
+  ;; takes up a lot of space (lots of small, poorly-compressed packs).  As a
+  ;; workaround, shell out to 'git gc' when the number of packs in a
+  ;; repository has become "too large", potentially wasting a lot of space.
+  ;; See <https://issues.guix.gnu.org/65720>.
+  (when (> (packs-in-git-repository directory) 25)
+    (info (G_ "compressing cached Git repository at '~a'...~%")
+          directory)
+    (invoke/quiet %git "-C" directory "gc")))
+
 (define* (update-cached-checkout url
                                  #:key
                                  (ref '())
@@ -515,6 +545,9 @@ (define* (update-cached-checkout url
                    seconds seconds
                    nanoseconds nanoseconds))))
 
+       ;; Run 'git gc' if needed.
+       (maybe-run-git-gc cache-directory)
+
        ;; When CACHE-DIRECTORY is a sub-directory of the default cache
        ;; directory, remove expired checkouts that are next to it.
        (let ((parent (dirname cache-directory)))

base-commit: 6b0a32196982a0a2f4dbb59d35e55833a5545ac6
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 32+ messages in thread

* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-20 16:15         ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
@ 2023-10-23 10:08           ` Simon Tournier
  2023-10-23 22:27             ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
  2023-10-30 12:02           ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines
  1 sibling, 1 reply; 32+ messages in thread
From: Simon Tournier @ 2023-10-23 10:08 UTC (permalink / raw)
  To: Ludovic Courtès, guix-patches
  Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès,
	Tobias Geerinckx-Rice, Ricardo Wurmus, Christopher Baines

Hi Ludo,

On Fri, 20 Oct 2023 at 18:15, Ludovic Courtès <ludo@gnu.org> wrote:

> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
> procedures.
> (update-cached-checkout): Use it.
> ---
>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)

LGTM.  Just two colors for the bikeshed. :-)


> +  (when (> (packs-in-git-repository directory) 25)

Why 25?  And not 10 or 50 or 100?


>  (define* (update-cached-checkout url
>                                   #:key
>                                   (ref '())
> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url
>                     seconds seconds
>                     nanoseconds nanoseconds))))
>  
> +       ;; Run 'git gc' if needed.
> +       (maybe-run-git-gc cache-directory)

Why not trigger it by “guix gc”?

Well, I expect “guix gc” to take some time and I choose when.  However,
I want “guix pull” or “guix time-machine” to be as fast as possible and
here some extra time is added, and I cannot control exactly when.

Cheers,
simon




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-23 10:08           ` Simon Tournier
@ 2023-10-23 22:27             ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
  2023-10-23 23:28               ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
  0 siblings, 1 reply; 32+ messages in thread
From: Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2023-10-23 22:27 UTC (permalink / raw)
  To: Simon Tournier, Ludovic Courtès, guix-patches
  Cc: Ricardo Wurmus, Christopher Baines, Josselin Poiret, 65720,
	Mathieu Othacehe

>Why not trigger it by “guix gc”?

Unless there's a new option I missed, guix gc doesn't handle this.

>Well, I expect “guix gc” to take some time and I choose when.  However,
>I want “guix pull” or “guix time-machine” to be as fast as possible

I don't think that things should be pushed into guix gc merely because they are slow.

This is not a great post (I'd look at the git code if I were at a computer) but I remember git printing something like 'optimising repository in the background'.  Maybe something similar would be appropriate here, to better hide such housekeeping from the user.


Kind regards,

T G-R

Sent on the go.  Excuse or enjoy my brevity.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-10-23 22:27             ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
@ 2023-10-23 23:28               ` Simon Tournier
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-10-23 23:28 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice
  Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès,
	Ricardo Wurmus, Christopher Baines, guix-patches

Hi,

On Mon, 23 Oct 2023 at 22:27, Tobias Geerinckx-Rice <me@tobias.gr> wrote:

>>Why not trigger it by “guix gc”?
>
> Unless there's a new option I missed, guix gc doesn't handle this.

Maybe I missed something but “guix gc” handles what we implement, no? :-)

Well, I run “guix gc” when I need some space.  And this
“maybe-run-git-gc” does exactly that: reclaim some space when I need
it.

For me, they are part of “guix gc” and not part of some update.


Aside, re-thinking about other features, I am consistent with other
comments I made when introducing ‘maybe-remove-expired-cache-entries’;
see <https://issues.guix.gnu.org/45327#4>.  And consistent because most
probably I still think the same: cache cleanup should be handled by
“guix gc” and not by the commands themselves.  And maybe we are having
the same discussion. ;-)


>>Well, I expect “guix gc” to take some time and I choose when.  However,
>>I want “guix pull” or “guix time-machine” to be as fast as possible
>
> I don't think that things should be pushed into guix gc merely because
> they are slow.

Maybe I misread, somehow it appears to me that you miss the key part: I
choose when some extra work is done and I keep “guix pull” and “guix
time-machine” as fast as possible.


Cheers,
simon




^ permalink raw reply	[flat|nested] 32+ messages in thread

* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-20 16:15         ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
  2023-10-23 10:08           ` Simon Tournier
@ 2023-10-30 12:02           ` Christopher Baines
  2023-11-14  9:19             ` Ludovic Courtès
  1 sibling, 1 reply; 32+ messages in thread
From: Christopher Baines @ 2023-10-30 12:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 65720, 66650

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]


Ludovic Courtès <ludo@gnu.org> writes:

> Fixes <https://issues.guix.gnu.org/65720>.
>
> This fixes a bug whereby libgit2-managed checkouts would keep growing as
> we fetch.
>
> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
> procedures.
> (update-cached-checkout): Use it.
> ---
>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
>
> Hi!
>
> This is a radical fix/workaround for the unbounded Git checkout growth
> problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
> pack files around).
>
> I thought we might be able to implement a ‘git gc’ approximation using
> the libgit2 “packbuilder” interface, but I haven’t got around to doing
> it: <https://libgit2.org/libgit2/#HEAD/search/pack>.
>
> Once again, shelling out is not my favorite option, but it’s a bug we
> should fix sooner rather than later, hence this compromise.
>
> Thoughts?

This sounds good to me.  The Data Service also has this problem of
cached checkouts growing too large, and this sounds like it'll
address it.


* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-30 12:02           ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines
@ 2023-11-14  9:19             ` Ludovic Courtès
  2023-11-14  9:32               ` Simon Tournier
  0 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-11-14  9:19 UTC (permalink / raw)
  To: Christopher Baines; +Cc: Josselin Poiret, 65720, 66650

Hello,

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Fixes <https://issues.guix.gnu.org/65720>.
>>
>> This fixes a bug whereby libgit2-managed checkouts would keep growing as
>> we fetch.

[...]

> This sounds good to me.  The Data Service also has this problem of
> cached checkouts growing too large, and this sounds like it'll
> address it.

Thanks for your input, Chris.

Any other comments?  I’d like to push the patch within a few days if
there are no objections.

  https://issues.guix.gnu.org/66650

Ludo’.





* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-11-14  9:19             ` Ludovic Courtès
@ 2023-11-14  9:32               ` Simon Tournier
       [not found]                 ` <87h6ll28yh.fsf@gnu.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Simon Tournier @ 2023-11-14  9:32 UTC (permalink / raw)
  To: Ludovic Courtès, Christopher Baines; +Cc: Josselin Poiret, 65720, 66650

Hi,

On Tue, 14 Nov 2023 at 10:19, Ludovic Courtès <ludo@gnu.org> wrote:

> Any other comments?  I’d like to push the patch within a few days if
> there are no objections.

As mentioned in [1],

>> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
>> procedures.
>> (update-cached-checkout): Use it.
>> ---
>>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 36 insertions(+), 3 deletions(-)

LGTM.  Just two colors for the bikeshed. :-)


>> +  (when (> (packs-in-git-repository directory) 25)

Why 25?  And not 10 or 50 or 100?
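
For readers who want to experiment with the threshold, here is a rough
shell approximation of the heuristic in the quoted hunk.  It is a sketch
under assumptions, not the code from guix/git.scm: the function names
merely mirror the Scheme ones, the '.git/objects/pack' layout assumes a
non-bare checkout, and the threshold is made a parameter (defaulting to
the 25 from the patch) so other values can be tried.

```shell
# Sketch only: shell stand-ins for the Scheme procedures named in the patch.

# Print the number of '*.pack' files in DIRECTORY/.git/objects/pack,
# or 0 when that directory does not exist.
packs_in_git_repository() {
    pack_dir="$1/.git/objects/pack"
    [ -d "$pack_dir" ] || { echo 0; return; }
    find "$pack_dir" -maxdepth 1 -type f -name '*.pack' | wc -l | tr -d ' '
}

# Run 'git gc' in DIRECTORY when it holds more than THRESHOLD packs.
# THRESHOLD defaults to 25, the value used in the patch under review.
maybe_run_git_gc() {
    directory="$1"
    threshold="${2:-25}"
    if [ "$(packs_in_git_repository "$directory")" -gt "$threshold" ]; then
        git -C "$directory" gc --quiet
    fi
}
```

Calling 'maybe_run_git_gc DIRECTORY 50' on a cached checkout would then
only repack once more than 50 packs have accumulated.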


>>  (define* (update-cached-checkout url
>>                                   #:key
>>                                   (ref '())
>> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url
>>                     seconds seconds
>>                     nanoseconds nanoseconds))))
>>  
>> +       ;; Run 'git gc' if needed.
>> +       (maybe-run-git-gc cache-directory)

Why not trigger it by “guix gc”?

Well, I expect “guix gc” to take some time, and I choose when it runs.
However, I want “guix pull” or “guix time-machine” to be as fast as
possible; here some extra time is added, and I cannot control exactly when.


Cheers,
simon


1: bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
Simon Tournier <zimon.toutoune@gmail.com>
Mon, 23 Oct 2023 12:08:07 +0200
id:87il6xlkhk.fsf@gmail.com
https://issues.guix.gnu.org/65720
https://issues.guix.gnu.org/msgid/87il6xlkhk.fsf@gmail.com
https://yhetil.org/guix/87il6xlkhk.fsf@gmail.com





* bug#65720: [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
       [not found]                   ` <CAJ3okZ2-W_Me-Gao44+LeKGCm7dhb8VkLfC2doL4NE9VO88HYg@mail.gmail.com>
@ 2023-11-22 11:17                     ` Ludovic Courtès
  2023-11-22 11:57                       ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
  0 siblings, 1 reply; 32+ messages in thread
From: Ludovic Courtès @ 2023-11-22 11:17 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi,

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> > Well, I expect “guix gc” to take some time and I choose when.  However,
>> > I want “guix pull” or “guix time-machine” to be as fast as possible and
>> > here some extra time is added, and I cannot control exactly when.
>>
>> Yes, I see.  The thing is ‘maybe-run-git-gc’ is only called on the slow
>> path; so for example, it’s not called on a ‘time-machine’ cache hit, but
>> only on a cache miss, which is already expensive anyway.
>
> What you mean by "only called on the slow path" is each time
> 'update-cached-checkout' is called, right?

Yes, which usually indicates we’re on a cache miss (for example a cache
miss of ‘guix time-machine’) and thus are going to do potentially more
work (updating a Git repo, building things, etc.).  That’s why I think
it’s on the “slow path” and shouldn’t make much of a difference.  More
importantly, unless I’m mistaken, it’s rarely going to fire.
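
The fast-path/slow-path argument can be pictured with a toy shell model.
Everything in it is hypothetical: the function names only echo the
Scheme ones, the "cache" is a directory of marker files, and the GC
check is a stub that merely counts how often it was consulted.

```shell
# Toy model only; none of these functions are actual Guix code.

GC_CHECKS=0   # how many times the pack-count check was consulted

# Stub for the pack-count check: here it only records that it ran.
maybe_run_git_gc() {
    GC_CHECKS=$((GC_CHECKS + 1))
}

# Slow path: a real implementation would fetch from the remote first.
update_cached_checkout() {
    maybe_run_git_gc "$1"
}

# Look up COMMIT in CACHE (a directory holding one marker file per commit).
time_machine() {
    cache="$1"
    commit="$2"
    if [ -e "$cache/$commit" ]; then
        echo "hit"                       # fast path: no fetch, no GC check
    else
        update_cached_checkout "$cache"  # slow path: already expensive
        : > "$cache/$commit"             # remember the result for next time
        echo "miss"
    fi
}
```

In this model a second lookup of the same commit never reaches the GC
check, which is the sense in which the check stays off the fast path.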

> So, somehow, the moment when 'maybe-run-git-gc' is called appears
> "unpredictable" to me.  But anyway. :-)

Sure, but the way I see it, that’s the nature of caches.

> Let's move it elsewhere if I am really annoyed.

:-/

Ludo’.





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-11-22 11:17                     ` bug#65720: [bug#66650] " Ludovic Courtès
@ 2023-11-22 11:57                       ` Simon Tournier
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Tournier @ 2023-11-22 11:57 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi Ludo,

Thanks for explaining.

On Wed, 22 Nov 2023 at 12:17, Ludovic Courtès <ludo@gnu.org> wrote:

>                                   it’s rarely going to fire.

[...]

>> Let's move it elsewhere if I am really annoyed.
>
> :-/

Sorry, I poorly worded my last comment. :-)

What I meant to say: my view probably falls into the “Premature
optimization is the root of all evil” category.  In other words, I have
no objection, and I will revisit the issue when I am on fire, if I ever
am, or annoyed for real.

Cheers,
simon

PS:

Aside from this patch:

>> So, somehow, the moment when 'maybe-run-git-gc' is called appears
>> "unpredictable" to me.  But anyway. :-)
>
> Sure, but the way I see it, that’s the nature of caches.

What makes caches unpredictable is their current state.  However, this
does not imply that *all* the actions moving them from one state to
another must also be triggered at unpredictable moments.

For instance, I choose when I wash the family’s clothes; the washing
machine does not start by itself once the unpredictable stack of dirty
clothes is big enough.  Because maybe today it’s rainy so drying is
difficult, and tomorrow will be sunny so it will be a better moment. :-)

For me, “guix gc” should be the driver for cleaning all the various Guix
caches.  Anyway. :-D
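
Until something like that exists, a user-triggered cleanup can be
approximated by hand.  The sketch below is an assumption-laden
illustration, not a Guix command: 'cached_checkouts' is a made-up
helper, the default location follows the ~/.cache/guix/checkouts path
shown earlier in this thread, and honoring XDG_CACHE_HOME is an
assumption.

```shell
# Hypothetical helper, not part of Guix: print every directory under ROOT
# that contains a '.git' subdirectory, one per line.
cached_checkouts() {
    root="${1:-${XDG_CACHE_HOME:-$HOME/.cache}/guix/checkouts}"
    for checkout in "$root"/*; do
        if [ -d "$checkout/.git" ]; then
            printf '%s\n' "$checkout"
        fi
    done
}

# Usage, at a moment of your own choosing:
#
#   cached_checkouts | while IFS= read -r checkout; do
#       git -C "$checkout" gc --quiet
#   done
```

Listing the directories separately from running 'git gc' keeps the
selection step easy to inspect before any repository is touched.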





* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
                   ` (2 preceding siblings ...)
  2023-09-18 22:35 ` Ludovic Courtès
@ 2023-11-23 11:35 ` Ludovic Courtès
  3 siblings, 0 replies; 32+ messages in thread
From: Ludovic Courtès @ 2023-11-23 11:35 UTC (permalink / raw)
  To: 65720-done

Ludovic Courtès <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M    .git

Fixed by b150c546b04c9ebb09de9f2c39789221054f5eea.

We still need to update the ‘guix’ package so that tools that rely on
(guix git) such as the Data Service, hpcguix-web, and Cuirass, can
benefit from this change.

Ludo’.



