* bug#65720: Guile-Git-managed checkouts grow way too much @ 2023-09-03 20:44 Ludovic Courtès 2023-09-04 21:47 ` Ludovic Courtès ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-09-03 20:44 UTC (permalink / raw) To: 65720 Hello! As reported by Tobias on IRC (in the context of ‘hpcguix-web’), checkouts managed by Guile-Git appear to grow beyond reason. As an example, here’s the same ‘.git’ managed with Guile-Git and with Git: --8<---------------cut here---------------start------------->8--- $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq $ du -hs .git 517M .git --8<---------------cut here---------------end--------------->8--- It would seem that libgit2 doesn’t do the equivalent of ‘git gc’. Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès @ 2023-09-04 21:47 ` Ludovic Courtès 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix ` (2 more replies) 2023-09-05 14:11 ` Ludovic Courtès ` (2 subsequent siblings) 3 siblings, 3 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-09-04 21:47 UTC (permalink / raw) To: 65720 [-- Attachment #1: Type: text/plain, Size: 3006 bytes --] Ludovic Courtès <ludo@gnu.org> skribis: > As reported by Tobias on IRC (in the context of ‘hpcguix-web’), > checkouts managed by Guile-Git appear to grow beyond reason. As an > example, here’s the same ‘.git’ managed with Guile-Git and with Git: > > $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > $ du -hs .git > 517M .git Unsurprisingly, GC makes a big difference: --8<---------------cut here---------------start------------->8--- $ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout $ (cd /tmp/checkout/; git gc) Enumerating objects: 717785, done. Counting objects: 100% (717785/717785), done. Delta compression using up to 4 threads Compressing objects: 100% (154644/154644), done. Writing objects: 100% (717785/717785), done. Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0 Enumerating cruft objects: 103412, done. Traversing cruft objects: 81753, done. Counting objects: 100% (64171/64171), done. Delta compression using up to 4 threads Compressing objects: 100% (17379/17379), done. Writing objects: 100% (64171/64171), done. Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0 Expanding reachable commits in commit graph: 133730, done. 
$ du -hs /tmp/checkout 539M /tmp/checkout --8<---------------cut here---------------end--------------->8--- > It would seem that libgit2 doesn’t do the equivalent of ‘git gc’. Confirmed: <https://github.com/libgit2/libgit2/issues/3247>. My inclination for the short term would be to work around this limitation by (1) finding a heuristic to determine whether a checkout has likely accumulated too much cruft, and (2) considering such checkouts as expired (thereby forcing a re-clone) or running ‘git gc’ on them if ‘git’ is available. I can’t think of a good heuristic for (1). Birth time could be one, but we’d need statx(2): --8<---------------cut here---------------start------------->8--- $ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4 Access: 2023-09-04 23:13:54.668279105 +0200 Modify: 2023-09-04 11:34:41.665385000 +0200 Change: 2023-09-04 11:34:41.661629102 +0200 Birth: 2021-08-09 10:48:17.748722151 +0200 --8<---------------cut here---------------end--------------->8--- Lacking statx(2), we can approximate creation time by looking at ‘.git/config’: --8<---------------cut here---------------start------------->8--- $ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3 Modify: 2021-08-09 10:50:28.031760953 +0200 Change: 2021-08-09 10:50:28.031760953 +0200 Birth: 2021-08-09 10:50:28.031760953 +0200 --8<---------------cut here---------------end--------------->8--- This strategy can be implemented like this: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: text/x-patch, Size: 942 bytes --] diff --git a/guix/git.scm b/guix/git.scm index ebe2600209..ed3fa56bc8 100644 --- a/guix/git.scm +++ b/guix/git.scm @@ -405,7 +405,16 @@ (define cached-checkout-expiration ;; Use the mtime rather than the atime to cope with file systems mounted ;; with 'noatime'. 
- (file-expiration-time (* 90 24 3600) stat:mtime)) + (let ((ttl (* 90 24 3600)) + (max-checkout-retention (* 9 30 24 3600))) + (lambda (file) + (match (false-if-exception (lstat file)) + (#f 0) ;FILE may have been deleted in the meantime + (st (min (pk 'ttl (+ (stat:mtime st) ttl)) + (pk 'maxttl (match (false-if-exception + (lstat (in-vicinity file ".git/config"))) + (#f +inf.0) + (st (+ (stat:mtime st) max-checkout-retention)))))))))) (define %checkout-cache-cleanup-period ;; Period for the removal of expired cached checkouts. [-- Attachment #3: Type: text/plain, Size: 707 bytes --] Namely, a cached checkout is considered “expired” 9 months after its creation. In my case, it gives this: --8<---------------cut here---------------start------------->8--- scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/") ;;; (ttl 1701596081) ;;; (maxttl 1651827028) $6 = 1651827028 --8<---------------cut here---------------end--------------->8--- Of course having to re-clone entire repositories every 9 months is ridiculous, but storing gigabytes of packs is worse IMO (I’m specifically thinking about the Guix repo, which every user copies via ‘guix pull’). Thoughts? Thanks, Ludo’. ^ permalink raw reply related [flat|nested] 36+ messages in thread
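For readers who do not read Scheme, the expiration rule in the patch above can be sketched in shell. This is only an illustration, not part of the patch: the function name ‘checkout_expiration’ is made up, and it assumes GNU stat (‘stat -c %Y’ prints the mtime in seconds since the epoch).

```shell
# Sketch of the rule in the patch: a checkout expires at the earlier of
# (last modification + TTL) and (mtime of .git/config + maximum retention).
# 'checkout_expiration' is a hypothetical name; GNU stat is assumed.
checkout_expiration() {
    dir=$1
    ttl=$((90 * 24 * 3600))                 # 90 days since last modification
    max_retention=$((9 * 30 * 24 * 3600))   # about 9 months since creation

    # The directory may have been deleted in the meantime: expire at once.
    mtime=$(stat -c %Y "$dir" 2>/dev/null) || { echo 0; return; }
    by_ttl=$((mtime + ttl))

    # Without .git/config, only the TTL applies (the patch uses +inf.0).
    created=$(stat -c %Y "$dir/.git/config" 2>/dev/null) \
        || { echo "$by_ttl"; return; }
    by_age=$((created + max_retention))

    # Return the smaller of the two dates, like 'min' in the patch.
    if [ "$by_age" -lt "$by_ttl" ]; then echo "$by_age"; else echo "$by_ttl"; fi
}
```

With the timestamps shown in the ‘stat’ output above (directory mtime 1693820081, ‘.git/config’ mtime 1628499028), this yields 1651827028, the same value as ‘maxttl’ in the REPL session.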
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-04 21:47 ` Ludovic Courtès @ 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-05 14:18 ` Ludovic Courtès 2023-09-05 8:22 ` Jelle Licht 2023-09-05 18:59 ` Simon Tournier 2 siblings, 1 reply; 36+ messages in thread From: Josselin Poiret via Bug reports for GNU Guix @ 2023-09-05 8:18 UTC (permalink / raw) To: Ludovic Courtès, 65720 [-- Attachment #1: Type: text/plain, Size: 1053 bytes --] Hi Ludo, Ludovic Courtès <ludo@gnu.org> writes: > My inclination for the short term would be to work around this > limitation by (1) finding a heuristic to determine is a checkout has > likely accumulated too much cruft, and (2) considering such checkouts as > expired (thereby forcing a re-clone) or running ‘git gc’ on them if > ‘git’ is available. I think using the git binary instead of libgit2 as a workaround is a good idea. We can consider building it directly as well, so that people who don't have it in their profiles can still benefit from it. We could even consider using git commands in most places and using libgit2 only where we really need the tight coupling. IIUC, libgit2 is eternally trying to catch up to git and often performs in a counter-intuitive way (I expect the various bugs with stale deleted files in checkouts to be caused by this). Maybe it could also let us use bare repository and directly extract the refs we want without having to mess with checkouts? Best, -- Josselin Poiret [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 682 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix @ 2023-09-05 14:18 ` Ludovic Courtès 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-09-05 14:18 UTC (permalink / raw) To: Josselin Poiret; +Cc: 65720 Hi, Josselin Poiret <dev@jpoiret.xyz> skribis: > I think using the git binary instead of libgit2 as a workaround is a > good idea. We can consider building it directly as well, so that people > who don't have it in their profiles can still benefit from it. We could > even consider using git commands in most places and using libgit2 only > where we really need the tight coupling. Surely you’d agree that it would suck though: depending on two Git implementations because one doesn’t have a proper API and the other one lacks a bunch of features. It would also be pretty bad for closure size: --8<---------------cut here---------------start------------->8--- $ guix size guile-git | tail -1 total: 106.6 MiB $ guix size guile-git git-minimal | tail -1 total: 169.8 MiB --8<---------------cut here---------------end--------------->8--- It’s also not clear concretely how we’d add that dependency. Try invoking ‘git’ from $PATH and print a warning if it doesn’t work? But then, what about applications like Cuirass and hpcguix-web? Tricky, tricky. Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-05 14:18 ` Ludovic Courtès @ 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-08 17:08 ` Ludovic Courtès 2023-09-07 0:41 ` Simon Tournier 2023-09-11 14:37 ` Ludovic Courtès 2 siblings, 1 reply; 36+ messages in thread From: Josselin Poiret via Bug reports for GNU Guix @ 2023-09-06 8:04 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 65720 [-- Attachment #1: Type: text/plain, Size: 1585 bytes --] Hi Ludo, Ludovic Courtès <ludo@gnu.org> writes: > Surely you’d agree that it would suck though: depending on two Git > implementations because one doesn’t have a proper API and the other one > lacks a bunch of features. Right, although I wouldn't necessarily say that the former doesn't have a proper API, but rather that it has a Unix-oriented API. That leads to performance issues on e.g. Windows but on Linux I'm not sure there's much of a difference. > It would also be pretty bad for closure size: > > --8<---------------cut here---------------start------------->8--- > $ guix size guile-git | tail -1 > total: 106.6 MiB > $ guix size guile-git git-minimal | tail -1 > total: 169.8 MiB > --8<---------------cut here---------------end--------------->8--- > > It’s also not clear concretely how we’d add that dependency. Try > invoking ‘git’ from $PATH and print a warning if it doesn’t work? > But then, what about applications like Cuirass and hpcguix-web? > > Tricky, tricky. We could consider replacing the guile-git dependency with another library built directly on top of git-minimal, and have this be a dependency of Guix. Not ideal though, and not really scalable either: we can't just add every VCS as a direct dependency. From what I've seen, people are now scaling back on their use of libgit2 because of the impedance mismatch and are resorting more and more to git plumbing. From a pragmatic point of view, I'd prefer the latter, since it is more stable and feature-complete. 
Best, -- Josselin Poiret [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 682 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix @ 2023-09-08 17:08 ` Ludovic Courtès 2023-09-11 7:00 ` Csepp ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-09-08 17:08 UTC (permalink / raw) To: Josselin Poiret; +Cc: 65720 Hello! Josselin Poiret <dev@jpoiret.xyz> skribis: > Right, although I wouldn't necessarily say that the former doesn't have > a proper API, but rather that it has a Unix-oriented API. That leads to > performance issues on e.g. Windows but on Linux I'm not sure there's > much of a difference. [...] > We could consider replacing the guile-git dependency with another > library built directly on top of git-minimal, and have this be a > dependency of Guix. Not ideal though, and not really scalable either: > we can't just add every VCS as direct dependencies. I cannot imagine a viable implementation of things like ‘commit-closure’ and ‘commit-relation’ from (guix git) done by shelling out to ‘git’. I’m quite confident this would be slow and brittle. It looks like there’s no option other than carrying the two implementations. ~~~ Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git in pure Scheme. That was on April 1st though, so people mistakenly assumed it was a joke and the project was never carried out. I digress, but I wonder: is there not even a viable Haskell or OCaml implementation of Git? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-08 17:08 ` Ludovic Courtès @ 2023-09-11 7:00 ` Csepp 2023-09-11 8:42 ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier 2023-09-11 14:42 ` bug#65720: Guile-Git-managed checkouts grow way too much wolf 2 siblings, 0 replies; 36+ messages in thread From: Csepp @ 2023-09-11 7:00 UTC (permalink / raw) To: Ludovic Courtès; +Cc: dev, 65720 Ludovic Courtès <ludo@gnu.org> writes: > Hello! > > Josselin Poiret <dev@jpoiret.xyz> skribis: > >> Right, although I wouldn't necessarily say that the former doesn't have >> a proper API, but rather that it has a Unix-oriented API. That leads to >> performance issues on e.g. Windows but on Linux I'm not sure there's >> much of a difference. > > [...] > >> We could consider replacing the guile-git dependency with another >> library built directly on top of git-minimal, and have this be a >> dependency of Guix. Not ideal though, and not really scalable either: >> we can't just add every VCS as direct dependencies. > > I cannot imagine a viable implementation of things like ‘commit-closure’ > and ‘commit-relation’ from (guix git) done by shelling out to ‘git’. > I’m quite confident this would be slow and brittle. > > It looks like there’s no option other than carrying the two > implementations. > > ~~~ > > Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git > in pure Scheme. That was on April 1st though, so people mistakenly > assumed it was a joke and the project was never carried out. > > I digress, but I wonder: is there not even a viable Haskell or OCaml > implementation of Git? > > Thanks, > Ludo’. For the sake of completeness: there is an alternative implementation in C for Plan 9 that I've used and that is now mature enough that the 9front project switched to it from Mercurial. It might be possible to compile it with the plan9port compiler wrapper. 
There is also a Git implementation in OCaml that some MirageOS unikernels use to serve static content from a git repository. Also the Irmin "database" is based on git and is written in OCaml. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) 2023-09-08 17:08 ` Ludovic Courtès 2023-09-11 7:00 ` Csepp @ 2023-09-11 8:42 ` Simon Tournier 2023-09-11 14:42 ` bug#65720: Guile-Git-managed checkouts grow way too much wolf 2 siblings, 0 replies; 36+ messages in thread From: Simon Tournier @ 2023-09-11 8:42 UTC (permalink / raw) To: Ludovic Courtès, Josselin Poiret; +Cc: 65720 Hi Ludo, On Fri, 08 Sep 2023 at 19:08, Ludovic Courtès <ludo@gnu.org> wrote: > Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git > in pure Scheme. That was on April 1st though, so people mistakenly > assumed it was a joke and the project was never carried out. Well, that is a piece of work. :-) Maybe there is hope with git-std-lib. Subject: Proposal/Discussion: Turning parts of Git into libraries From: Emily Shaffer <nasamuffin@google.com> To: Git List <git@vger.kernel.org> Date: Fri, 17 Feb 2023 13:12:23 -0800 https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com/ And some patches are starting to float around. https://public-inbox.org/git/20230810163346.274132-1-calvinwan@google.com/ > I digress, but I wonder: is there not even a viable Haskell or OCaml > implementation of Git? It depends on what “viable” means. :-) https://github.com/mirage/ocaml-git https://hackage.haskell.org/package/git Irmin [1] is an OCaml library for building mergeable, branchable distributed data stores – A Distributed Database Built on the Same Principles as Git. And irmin relies on ocaml-git. 1: https://github.com/mirage/irmin Then there is a pure Go implementation and another using Java. https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-go-git https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-JGit I do not know whether all of them are “viable”. Well, I do not know if ‘git gc’ is implemented. 
And I do not know which plumbing is implemented and which porcelain is available. Last, SWH uses dulwich [2] which is a pure Python implementation of Git. 2: https://www.dulwich.io/ To my knowledge, there is no “dulwich gc” but they implement “dulwich fsck” and “dulwich repack”. Back at 10 Years of Guix or at UNESCO in February – I do not remember exactly when – we were discussing implementations of Git. And we mentioned an implementation in Rust. Maybe this one: https://github.com/Byron/gitoxide Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-08 17:08 ` Ludovic Courtès 2023-09-11 7:00 ` Csepp 2023-09-11 8:42 ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier @ 2023-09-11 14:42 ` wolf 2023-09-13 18:10 ` Ludovic Courtès 2 siblings, 1 reply; 36+ messages in thread From: wolf @ 2023-09-11 14:42 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Josselin Poiret, 65720 [-- Attachment #1.1: Type: text/plain, Size: 2814 bytes --] On 2023-09-08 19:08:05 +0200, Ludovic Courtès wrote: > Hello! > > Josselin Poiret <dev@jpoiret.xyz> skribis: > > > Right, although I wouldn't necessarily say that the former doesn't have > > a proper API, but rather that it has a Unix-oriented API. That leads to > > performance issues on e.g. Windows but on Linux I'm not sure there's > > much of a difference. > > [...] > > > We could consider replacing the guile-git dependency with another > > library built directly on top of git-minimal, and have this be a > > dependency of Guix. Not ideal though, and not really scalable either: > > we can't just add every VCS as direct dependencies. > > I cannot imagine a viable implementation of things like ‘commit-closure’ > and ‘commit-relation’ from (guix git) done by shelling out to ‘git’. I am sure I must be missing some part of the contract of the function, but at least the commit-relation seems fairly straightforward: (define (shelling-commit-relation old new) (let ((h-old (oid->string (commit-id old))) (h-new (oid->string (commit-id new)))) (cond ((eq? old new) 'self) ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new)) 'ancestor) ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old)) 'descendant) (else 'unrelated)))) I would argue it is even somewhat more readable than the current implementation. 
> I’m quite confident this would be slow My version is ~2000x faster compared to (guix git): Guix: 1048.620992ms Git: 0.532143ms Again, I am sure I must have missed something, either in the implementation or in the measurements, because it is pretty hard to believe there is so much room for improvement. The full script I used is attached to this email. > and brittle. In general, git plumbing commands are designed to have a stable CLI so that they can be used in scripts. So I am not sure where the brittleness would come from. > > It looks like there’s no option other than carrying the two > implementations. Assuming I made no mistake (hard to believe), it is probably worth exploring the feasibility of just shelling out to the git binary some more. > > ~~~ > > Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git > in pure Scheme. That was on April 1st though, so people mistakenly > assumed it was a joke and the project was never carried out. > > I digress, but I wonder: is there not even a viable Haskell or OCaml > implementation of Git? > > Thanks, > Ludo’. > W. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #1.2: test.scm --] [-- Type: text/plain, Size: 1986 bytes --] #!/bin/sh # -*-scheme-*- exec guile -s "$0" "$@" !# (use-modules (git) (guix git)) (define %repo "/tmp/guix-fork") (define h1 "72745172d155e489936f694d6b9013cb76272370") (define h2 "6d60d7ccba5a8e06c17d55a1772fa7f4529b5eff") (define h3 "c3db650680f995f0556d3ddce567cdc1c33e4603") ;;; r has to still be defined when the commit-relation is called. There is *no* ;;; error, but it always returns 'unrelated. Quite a footgun. (define r (repository-open %repo)) (define c1 (commit-lookup r (string->oid h1))) (define c2 (commit-lookup r (string->oid h2))) (define c3 (commit-lookup r (string->oid h3))) (define (git-C dir . 
args) (apply system* "git" "-C" dir args)) (define (shelling-commit-relation old new) (let ((h-old (oid->string (commit-id old))) (h-new (oid->string (commit-id new)))) (cond ((eq? old new) 'self) ;; In real code, git-C should probably return #t (for 0), #f (for 1) ;; or raise (for anything else). ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new)) 'ancestor) ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old)) 'descendant) (else 'unrelated)))) ;;; Make sure it actually works. (let ((tests `((,c1 . ,c1) (,c1 . ,c2) (,c2 . ,c1) (,c1 . ,c3)))) (for-each (λ (c) (format #t "Guix: ~a\nGit: ~a\n\n" (commit-relation (car c) (cdr c)) (shelling-commit-relation (car c) (cdr c)))) tests)) (define (time proc) (let* ((start (get-internal-run-time)) (_ (proc)) (end (get-internal-run-time))) (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second))))) (format #t "Guix: ~ams\nGit: ~ams\n" (time (λ () (commit-relation c1 c2))) (time (λ () (shelling-commit-relation c1 c2)))) [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-11 14:42 ` bug#65720: Guile-Git-managed checkouts grow way too much wolf @ 2023-09-13 18:10 ` Ludovic Courtès 2023-09-13 22:36 ` Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-09-13 18:10 UTC (permalink / raw) To: wolf; +Cc: Josselin Poiret, 65720 Hi, wolf <wolf@wolfsden.cz> skribis: > (define (time proc) > (let* ((start (get-internal-run-time)) > (_ (proc)) > (end (get-internal-run-time))) > (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second))))) > > (format #t "Guix: ~ams\nGit: ~ams\n" > (time (λ () (commit-relation c1 c2))) > (time (λ () (shelling-commit-relation c1 c2)))) ‘get-internal-run-time’ returns “units of processor time” used by the current process (info "(guile) Time"). When shelling out, the process calls waitpid(2) and does nothing, so naturally its processor time is close to zero. ‘get-internal-real-time’ should give something closer to elapsed time. Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
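To illustrate the point above about processor time versus elapsed time when shelling out, here is a minimal shell sketch. It is not taken from the thread: the helper name ‘wall_ms’ is made up, and it assumes GNU date, whose ‘%N’ format prints nanoseconds. It measures wall-clock time around a child process, which is the shell counterpart of what ‘get-internal-real-time’ would measure in the Scheme script.

```shell
# Print the wall-clock time, in milliseconds, taken by a command.
# 'wall_ms' is a hypothetical helper; GNU date ('%N' = nanoseconds) is assumed.
wall_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1    # the child does the work; the caller only waits
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

For example, ‘wall_ms sleep 1’ reports at least 1000 ms, even though the calling process itself uses next to no processor time while it waits. The same helper could wrap a ‘git merge-base --is-ancestor’ invocation to get a figure comparable to elapsed time.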
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-13 18:10 ` Ludovic Courtès @ 2023-09-13 22:36 ` Simon Tournier 0 siblings, 0 replies; 36+ messages in thread From: Simon Tournier @ 2023-09-13 22:36 UTC (permalink / raw) To: Ludovic Courtès, wolf; +Cc: Josselin Poiret, 65720 Hi Ludo, On Wed, 13 Sep 2023 at 20:10, Ludovic Courtès <ludo@gnu.org> wrote: > ‘get-internal-run-time’ returns “units of processor time” used by the > current process (info "(guile) Time"). When shelling out, the process > calls waitpid(2) and does nothing, so naturally its processor time is > close to zero. > > ‘get-internal-real-time’ should give something closer to elapsed time. Well, let us avoid mixing unrelated discussions. :-) For discussing that specific part, I reported my timings, measured with ,time, on guix-devel. comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git Simon Tournier <zimon.toutoune@gmail.com> Tue, 12 Sep 2023 00:48:30 +0200 id:865y4gz5q9.fsf@gmail.com https://lists.gnu.org/archive/html/guix-devel/2023-09 https://yhetil.org/guix/865y4gz5q9.fsf@gmail.com The result is still significantly less, and discussion is welcome over there. :-) Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-05 14:18 ` Ludovic Courtès 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix @ 2023-09-07 0:41 ` Simon Tournier 2023-09-08 17:09 ` Ludovic Courtès 2023-09-11 14:37 ` Ludovic Courtès 2 siblings, 1 reply; 36+ messages in thread From: Simon Tournier @ 2023-09-07 0:41 UTC (permalink / raw) To: Ludovic Courtès, Josselin Poiret; +Cc: 65720 Hi, On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote: > It would also be pretty bad for closure size: > > --8<---------------cut here---------------start------------->8--- > $ guix size guile-git | tail -1 > total: 106.6 MiB > $ guix size guile-git git-minimal | tail -1 > total: 169.8 MiB > --8<---------------cut here---------------end--------------->8--- > > It’s also not clear concretely how we’d add that dependency. Try > invoking ‘git’ from $PATH and print a warning if it doesn’t work? > But then, what about applications like Cuirass and hpcguix-web? I think we can rely on something like, guix shell -C git-minimal -- git gc It would be invoked internally using the Scheme API for inferiors and friends. Doing so, it would add nothing to the closure size. It appears to me safe to assume that this command can be run from any Guix installation. Since the Git GC would only be done once every X Git fetches, the overhead would be much lower. Hum, am I repeating myself [1]? :-) And I would run this “git gc” via “guix gc”, not via “guix pull”. Well, I do not like all these automatic removals happening based on date (last-expiry-cleanup) with some usual commands. It always happens when I do not want. ;-) Contrary to “guix gc”. Bah, another story. 
:-) Cheers, simon 1: bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier <zimon.toutoune@gmail.com> Tue, 05 Sep 2023 20:59:07 +0200 id:86edjcqwec.fsf@gmail.com https://issues.guix.gnu.org//65720 https://issues.guix.gnu.org/msgid/86edjcqwec.fsf@gmail.com https://yhetil.org/guix/86edjcqwec.fsf@gmail.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-07 0:41 ` Simon Tournier @ 2023-09-08 17:09 ` Ludovic Courtès 2023-09-09 10:31 ` Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-09-08 17:09 UTC (permalink / raw) To: Simon Tournier; +Cc: Josselin Poiret, 65720 Hi! Simon Tournier <zimon.toutoune@gmail.com> skribis: > On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote: > >> It would also be pretty bad for closure size: >> >> --8<---------------cut here---------------start------------->8--- >> $ guix size guile-git | tail -1 >> total: 106.6 MiB >> $ guix size guile-git git-minimal | tail -1 >> total: 169.8 MiB >> --8<---------------cut here---------------end--------------->8--- >> >> It’s also not clear concretely how we’d add that dependency. Try >> invoking ‘git’ from $PATH and print a warning if it doesn’t work? >> But then, what about applications like Cuirass and hpcguix-web? > > I think we can rely on something like, > > guix shell -C git-minimal -- git gc We’re talking about the implementation of a cache (meant to speed up operations), that would actually fill said cache plus do a whole bunch of expensive operations? Nah. :-) Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-08 17:09 ` Ludovic Courtès @ 2023-09-09 10:31 ` Simon Tournier 2023-09-11 7:06 ` Csepp 0 siblings, 1 reply; 36+ messages in thread From: Simon Tournier @ 2023-09-09 10:31 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Josselin Poiret, 65720 Hi, On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote: >>> It would also be pretty bad for closure size: >>> >>> --8<---------------cut here---------------start------------->8--- >>> $ guix size guile-git | tail -1 >>> total: 106.6 MiB >>> $ guix size guile-git git-minimal | tail -1 >>> total: 169.8 MiB >>> --8<---------------cut here---------------end--------------->8--- >>> >>> It’s also not clear concretely how we’d add that dependency. Try >>> invoking ‘git’ from $PATH and print a warning if it doesn’t work? >>> But then, what about applications like Cuirass and hpcguix-web? >> >> I think we can rely on something like, >> >> guix shell -C git-minimal -- git gc > > We’re talking about the implementation of a cache (meant to speed up > operations), that would actually fill said cache plus do a whole bunch > of expensive operations? Nah. :-) I do not think so. If I understand correctly, we need to run “git gc” at some point, therefore git-minimal needs to be around. The question is how and when. Well, maybe I am missing what the bug is about. For me, it is about running ‘git gc’ for cleaning the Git checkout cache, no? Solution #1. Add git-minimal as an input. It increases the closure and the extra load (on average) is about the ratio between the rate of “guix pull” and the rate of the git-minimal changes. Assume that people run “guix pull” once per week and that, say, “git gc” is run after 50 pulls. (Both numbers are totally arbitrary and based on my personal estimate). 
Data Service [1] tells: 2023-07-07 15:45:22 2023-09-08 21:22:08 2023-05-11 16:10:48 2023-07-07 14:21:45 2023-05-01 16:40:08 2023-05-11 14:36:16 2023-04-25 13:34:54 2023-05-01 15:19:55 2023-04-25 13:34:54 2023-09-08 21:22:08 2023-03-06 17:22:28 2023-04-25 12:27:33 2023-01-17 23:49:19 2023-03-06 16:48:43 2022-11-08 13:06:42 2023-01-17 15:11:47 2022-10-08 05:14:46 2022-11-08 09:56:31 2022-09-06 15:00:08 2022-10-08 04:15:43 2022-08-13 22:02:31 2022-09-06 12:58:52 … It means that a user will download git-minimal ~10 times for nothing. Solution #2. The one I am proposing. :-) Download git-minimal only when Guix needs it for running “git gc”. Yeah, there is probably a small overhead with some operations. But, I bet this overhead is much smaller than the one of solution #1. Well, it depends on the number of times people are updating the cache vs the rate of change of git-minimal. For sure, if one updates the cache 100 times per week, having git-minimal as an input is far better. But I do not think that is the regular usage on average. :-) That’s why I am proposing to have an option for turning off this “git gc” operation. Well, we have lived for years without running ‘git gc’, so running it once per year on average is probably enough to keep the cache size reasonable. And git-minimal is changing every month. Maybe, there is some solution #3. ;-) Cheers, simon 1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-09 10:31 ` Simon Tournier @ 2023-09-11 7:06 ` Csepp 0 siblings, 0 replies; 36+ messages in thread From: Csepp @ 2023-09-11 7:06 UTC (permalink / raw) To: Simon Tournier; +Cc: ludo, 65720, dev Simon Tournier <zimon.toutoune@gmail.com> writes: > Hi, > > On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote: > >>>> It would also be pretty bad for closure size: >>>> >>>> --8<---------------cut here---------------start------------->8--- >>>> $ guix size guile-git | tail -1 >>>> total: 106.6 MiB >>>> $ guix size guile-git git-minimal | tail -1 >>>> total: 169.8 MiB >>>> --8<---------------cut here---------------end--------------->8--- >>>> >>>> It’s also not clear concretely how we’d add that dependency. Try >>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work? >>>> But then, what about applications like Cuirass and hpcguix-web? >>> >>> I think we can rely on something like, >>> >>> guix shell -C git-minimal -- git gc >> >> We’re talking about the implementation of a cache (meant to speed up >> operations), that would actually fill said cache plus do a whole bunch >> of expensive operations? Nah. :-) > > I do not think. If I understand correctly, we need to run “git gc” at > some point, therefore git-minimal needs to me around. The question is > how and when. > > Well, maybe I am missing what the bug is about. For me, it is about > running ‘git gc’ for cleaning the Git checkout cache, no? > > > Solution #1. Add git-minimal as inputs. It increases the closure and > the extra load (on average) is about the ratio between the rate of “guix > pull” and the rate of the git-minimal changes. > > Assuming, that people are running “guix pull” once per week and say “git > gc” is run after 50 pulls. (These both number are totally arbitrary and > based on my personal estimate). 
> > Data Service [1] tells: > > 2023-07-07 15:45:22 2023-09-08 21:22:08 > 2023-05-11 16:10:48 2023-07-07 14:21:45 > 2023-05-01 16:40:08 2023-05-11 14:36:16 > 2023-04-25 13:34:54 2023-05-01 15:19:55 > 2023-04-25 13:34:54 2023-09-08 21:22:08 > 2023-03-06 17:22:28 2023-04-25 12:27:33 > 2023-01-17 23:49:19 2023-03-06 16:48:43 > 2022-11-08 13:06:42 2023-01-17 15:11:47 > 2022-10-08 05:14:46 2022-11-08 09:56:31 > 2022-09-06 15:00:08 2022-10-08 04:15:43 > 2022-08-13 22:02:31 2022-09-06 12:58:52 > … > > It means that an user will download ~10 times git-minimal for nothing. > > > Solution #2. The one I am proposing. :-) Download git-minimal only > when Guix needs it for running “git gc”. Yeah, there is probably a > small overload with some operations. But, I bet this overload is much > smaller than the one of solution #1. > > Well, it depends on the number of times people are updating the cache vs > the rate of change of git-minimal. > > For sure, if one updates 100 times per week the cache, having > git-minimal as inputs is far better. But I do not think that the > regular usage on average. :-) > > That’s why I am proposing to have an option for turning off this “git > gc“ operation. > > Well, we have lived since years without running ‘git gc’ so running it > once per year on average is probably enough to keep the cache size > reasonable. And git-minimal is changing every month. > > > Maybe, there is some solution #3. ;-) > > Cheers, > simon > > > 1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history Please don't create another situation like with guix system roll-back, where a crucial sysadmin operation doesn't work without network access. Or at least make it configurable, so things that are likely to be needed for future operations are pre-fetched. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-05 14:18 ` Ludovic Courtès 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-07 0:41 ` Simon Tournier @ 2023-09-11 14:37 ` Ludovic Courtès 2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès 2 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-09-11 14:37 UTC (permalink / raw) To: Josselin Poiret; +Cc: 65720 Ludovic Courtès <ludo@gnu.org> skribis: > It would also be pretty bad for closure size: > > $ guix size guile-git | tail -1 > total: 106.6 MiB > $ guix size guile-git git-minimal | tail -1 > total: 169.8 MiB > > It’s also not clear concretely how we’d add that dependency. Try > invoking ‘git’ from $PATH and print a warning if it doesn’t work? A solution to this particular problem is coming: https://issues.guix.gnu.org/65866 Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-09-11 14:37 ` Ludovic Courtès @ 2023-10-20 16:15 ` Ludovic Courtès 2023-10-23 10:08 ` Simon Tournier 2023-10-30 12:02 ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines 0 siblings, 2 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-10-20 16:15 UTC (permalink / raw) To: guix-patches Cc: Ludovic Courtès, 65720, Josselin Poiret, Simon Tournier, Christopher Baines, Josselin Poiret, Ludovic Courtès, Mathieu Othacehe, Ricardo Wurmus, Simon Tournier, Tobias Geerinckx-Rice Fixes <https://issues.guix.gnu.org/65720>. This fixes a bug whereby libgit2-managed checkouts would keep growing as we fetch. * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New procedures. (update-cached-checkout): Use it. --- guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) Hi! This is a radical fix/workaround for the unbounded Git checkout growth problem, shelling out to ‘git gc’ when it’s likely needed (“too many” pack files around). I thought we might be able to implement a ‘git gc’ approximation using the libgit2 “packbuilder” interface, but I haven’t got around to doing it: <https://libgit2.org/libgit2/#HEAD/search/pack>. Once again, shelling out is not my favorite option, but it’s a bug we should fix sooner rather than later, hence this compromise. Thoughts? Ludo’. 
diff --git a/guix/git.scm b/guix/git.scm
index b7182305cf..d704b62333 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
-;;; Copyright © 2018-2022 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018-2023 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2021 Kyle Meyer <kyle@kyleam.com>
 ;;; Copyright © 2021 Marius Bakke <marius@gnu.org>
 ;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
@@ -29,15 +29,16 @@ (define-module (guix git)
   #:use-module (guix cache)
   #:use-module (gcrypt hash)
   #:use-module ((guix build utils)
-                #:select (mkdir-p delete-file-recursively))
+                #:select (mkdir-p delete-file-recursively invoke/quiet))
   #:use-module (guix store)
   #:use-module (guix utils)
   #:use-module (guix records)
   #:use-module (guix gexp)
   #:autoload   (guix git-download)
   (git-reference-url git-reference-commit git-reference-recursive?)
+  #:autoload   (guix config) (%git)
   #:use-module (guix sets)
-  #:use-module ((guix diagnostics) #:select (leave warning))
+  #:use-module ((guix diagnostics) #:select (leave warning info))
   #:use-module (guix progress)
   #:autoload   (guix swh) (swh-download commit-id?)
   #:use-module (rnrs bytevectors)
@@ -428,6 +429,35 @@ (define (delete-checkout directory)
     (rename-file directory trashed)
     (delete-file-recursively trashed)))
 
+(define (packs-in-git-repository directory)
+  "Return the number of pack files under DIRECTORY, a Git checkout."
+  (catch 'system-error
+    (lambda ()
+      (let ((directory (opendir (in-vicinity directory ".git/objects/pack"))))
+        (let loop ((count 0))
+          (match (readdir directory)
+            ((? eof-object?)
+             (closedir directory)
+             count)
+            (str
+             (loop (if (string-suffix? ".pack" str)
+                       (+ 1 count)
+                       count)))))))
+    (const 0)))
+
+(define (maybe-run-git-gc directory)
+  "Run 'git gc' in DIRECTORY if needed."
+  ;; XXX: As of libgit2 1.3.x (used by Guile-Git), there's no support for GC.
+  ;; Each time a checkout is pulled, a new pack is created, which eventually
+  ;; takes up a lot of space (lots of small, poorly-compressed packs).  As a
+  ;; workaround, shell out to 'git gc' when the number of packs in a
+  ;; repository has become "too large", potentially wasting a lot of space.
+  ;; See <https://issues.guix.gnu.org/65720>.
+  (when (> (packs-in-git-repository directory) 25)
+    (info (G_ "compressing cached Git repository at '~a'...~%")
+          directory)
+    (invoke/quiet %git "-C" directory "gc")))
+
 (define* (update-cached-checkout url
                                  #:key
                                  (ref '())
@@ -515,6 +545,9 @@ (define* (update-cached-checkout url
                                         seconds seconds
                                         nanoseconds nanoseconds))))
 
+       ;; Run 'git gc' if needed.
+       (maybe-run-git-gc cache-directory)
+
        ;; When CACHE-DIRECTORY is a sub-directory of the default cache
        ;; directory, remove expired checkouts that are next to it.
        (let ((parent (dirname cache-directory)))

base-commit: 6b0a32196982a0a2f4dbb59d35e55833a5545ac6
-- 
2.41.0

^ permalink raw reply related [flat|nested] 36+ messages in thread
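For anyone wanting to check a checkout by hand, the heuristic in the patch above can be approximated in shell. This is only a sketch, not part of the patch: the function names `count_packs` and `maybe_gc` and the `THRESHOLD` variable are invented for illustration, while the `.git/objects/pack` location and the limit of 25 are taken from the patch itself.

```shell
#!/bin/sh
# Sketch only: mirrors the logic of 'packs-in-git-repository' and
# 'maybe-run-git-gc' from the patch.  All names here are illustrative.

THRESHOLD=25   # same arbitrary limit as in the patch

# Print the number of *.pack files in the checkout given as $1
# (0 when the pack directory does not exist).
count_packs () {
    pack_dir="$1/.git/objects/pack"
    if [ -d "$pack_dir" ]; then
        find "$pack_dir" -maxdepth 1 -type f -name '*.pack' | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Run 'git gc' in checkout $1 only when it holds more than $THRESHOLD packs.
maybe_gc () {
    if [ "$(count_packs "$1")" -gt "$THRESHOLD" ]; then
        echo "compressing cached Git repository at '$1'..."
        git -C "$1" gc --quiet
    fi
}
```

Calling `maybe_gc` on one of the directories under ~/.cache/guix/checkouts would then repack it only when the pack count exceeds the threshold, which is the behaviour the patch aims for.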
* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès @ 2023-10-23 10:08 ` Simon Tournier 2023-10-23 22:27 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix 2023-10-30 12:02 ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines 1 sibling, 1 reply; 36+ messages in thread From: Simon Tournier @ 2023-10-23 10:08 UTC (permalink / raw) To: Ludovic Courtès, guix-patches Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès, Tobias Geerinckx-Rice, Ricardo Wurmus, Christopher Baines Hi Ludo, On Fri, 20 Oct 2023 at 18:15, Ludovic Courtès <ludo@gnu.org> wrote: > * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New > procedures. > (update-cached-checkout): Use it. > --- > guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++--- > 1 file changed, 36 insertions(+), 3 deletions(-) LGTM. Just two colors for the bikeshed. :-) > + (when (> (packs-in-git-repository directory) 25) Why 25? And not 10 or 50 or 100? > (define* (update-cached-checkout url > #:key > (ref '()) > @@ -515,6 +545,9 @@ (define* (update-cached-checkout url > seconds seconds > nanoseconds nanoseconds)))) > > + ;; Run 'git gc' if needed. > + (maybe-run-git-gc cache-directory) Why not trigger it by “guix gc”? Well, I expect “guix gc” to take some time and I choose when. However, I want “guix pull” or “guix time-machine” to be as fast as possible and here some extra time is added, and I cannot control exactly when. Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-10-23 10:08 ` Simon Tournier @ 2023-10-23 22:27 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix 2023-10-23 23:28 ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2023-10-23 22:27 UTC (permalink / raw) To: Simon Tournier, Ludovic Courtès, guix-patches Cc: Ricardo Wurmus, Christopher Baines, Josselin Poiret, 65720, Mathieu Othacehe >Why not trigger it by “guix gc”? Unless there's a new option I missed, guix gc doesn't handle this. >Well, I expect “guix gc” to take some time and I choose when. However, >I want “guix pull” or “guix time-machine” to be as fast as possible I don't think that things should be pushed into guix gc merely because they are slow. This is not a great post (I'd look at the git code if I were at a computer) but I remember git printing something like 'optimising repository in the background'. Maybe something similar would be appropriate here, to better hide such housekeeping from the user. Kind regards, T G-R Sent on the go. Excuse or enjoy my brevity. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-10-23 22:27 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2023-10-23 23:28 ` Simon Tournier 0 siblings, 0 replies; 36+ messages in thread From: Simon Tournier @ 2023-10-23 23:28 UTC (permalink / raw) To: Tobias Geerinckx-Rice Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès, Ricardo Wurmus, Christopher Baines, guix-patches Hi, On Mon, 23 Oct 2023 at 22:27, Tobias Geerinckx-Rice <me@tobias.gr> wrote: >>Why not trigger it by “guix gc”? > > Unless there's a new option I missed, guix gc doesn't handle this. Maybe I missed something but “guix gc” handles what we implement, no? :-) Well, I run “guix gc” when I need some space. And this “maybe-run-git-gc” does exactly that: collect some spaces when I need them. For me, they are part of “guix gc” and not part of some update. Aside, re-thinking about other features, I am consistent with other comments I made when introducing ’maybe-remove-expired-cache-entries’; see <https://issues.guix.gnu.org/45327#4>. And consistent because most probably I still think the same: cache cleanup should be handled by “guix gc” and not by the commands themselves. And maybe we are having the same discussion. ;-) >>Well, I expect “guix gc” to take some time and I choose when. However, >>I want “guix pull” or “guix time-machine” to be as fast as possible > > I don't think that things should be pushed into guix gc merely because > they are slow. Maybe I misread, somehow it appears to me that you miss the key part: I choose when some extra work is done and I keep “guix pull” and “guix time-machine” as fast as possible. Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès 2023-10-23 10:08 ` Simon Tournier @ 2023-10-30 12:02 ` Christopher Baines 2023-11-14 9:19 ` Ludovic Courtès 1 sibling, 1 reply; 36+ messages in thread From: Christopher Baines @ 2023-10-30 12:02 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 65720, 66650 [-- Attachment #1: Type: text/plain, Size: 1150 bytes --] Ludovic Courtès <ludo@gnu.org> writes: > Fixes <https://issues.guix.gnu.org/65720>. > > This fixes a bug whereby libgit2-managed checkouts would keep growing as > we fetch. > > * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New > procedures. > (update-cached-checkout): Use it. > --- > guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++--- > 1 file changed, 36 insertions(+), 3 deletions(-) > > Hi! > > This is a radical fix/workaround for the unbounded Git checkout growth > problem, shelling out to ‘git gc’ when it’s likely needed (“too many” > pack files around). > > I thought we might be able to implement a ‘git gc’ approximation using > the libgit2 “packbuilder” interface, but I haven’t got around to doing > it: <https://libgit2.org/libgit2/#HEAD/search/pack>. > > Once again, shelling out is not my favorite option, but it’s a bug we > should fix sooner rather than later, hence this compromise. > > Thoughts? This sounds good to me, the data service has this problem as well of cached checkouts that grow to be too large and this sounds like it'll address it. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 987 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-10-30 12:02 ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines @ 2023-11-14 9:19 ` Ludovic Courtès 2023-11-14 9:32 ` Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-11-14 9:19 UTC (permalink / raw) To: Christopher Baines; +Cc: Josselin Poiret, 65720, 66650 Hello, Christopher Baines <mail@cbaines.net> skribis: > Ludovic Courtès <ludo@gnu.org> writes: > >> Fixes <https://issues.guix.gnu.org/65720>. >> >> This fixes a bug whereby libgit2-managed checkouts would keep growing as >> we fetch. [...] > This sounds good to me, the data service has this problem as well of > cached checkouts that grow to be too large and this sounds like it'll > address it. Thanks for your input, Chris. Any other comments? I’d like to push the patch within a few days if there are no objections. https://issues.guix.gnu.org/66650 Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-11-14 9:19 ` Ludovic Courtès @ 2023-11-14 9:32 ` Simon Tournier 2023-11-16 12:12 ` [bug#66650] " Ludovic Courtès 0 siblings, 1 reply; 36+ messages in thread From: Simon Tournier @ 2023-11-14 9:32 UTC (permalink / raw) To: Ludovic Courtès, Christopher Baines; +Cc: Josselin Poiret, 65720, 66650 Hi, On Tue, 14 Nov 2023 at 10:19, Ludovic Courtès <ludo@gnu.org> wrote: > Any other comments? I’d like to push the patch within a few days if > there are no objections. As mentioned in [1], >> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New >> procedures. >> (update-cached-checkout): Use it. >> --- >> guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 36 insertions(+), 3 deletions(-) LGTM. Just two colors for the bikeshed. :-) >> + (when (> (packs-in-git-repository directory) 25) Why 25? And not 10 or 50 or 100? >> (define* (update-cached-checkout url >> #:key >> (ref '()) >> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url >> seconds seconds >> nanoseconds nanoseconds)))) >> >> + ;; Run 'git gc' if needed. >> + (maybe-run-git-gc cache-directory) Why not trigger it by “guix gc”? Well, I expect “guix gc” to take some time and I choose when. However, I want “guix pull” or “guix time-machine” to be as fast as possible and here some extra time is added, and I cannot control exactly when. Cheers, simon 1: bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary. Simon Tournier <zimon.toutoune@gmail.com> Mon, 23 Oct 2023 12:08:07 +0200 id:87il6xlkhk.fsf@gmail.com https://issues.guix.gnu.org/65720 https://issues.guix.gnu.org/msgid/87il6xlkhk.fsf@gmail.com https://yhetil.org/guix/87il6xlkhk.fsf@gmail.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-11-14 9:32 ` Simon Tournier @ 2023-11-16 12:12 ` Ludovic Courtès 2023-11-16 13:24 ` Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-11-16 12:12 UTC (permalink / raw) To: Simon Tournier; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650 Hi, Simon Tournier <zimon.toutoune@gmail.com> skribis: >>> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New >>> procedures. >>> (update-cached-checkout): Use it. >>> --- >>> guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++--- >>> 1 file changed, 36 insertions(+), 3 deletions(-) > > LGTM. Thanks! > Just two colors for the bikeshed. :-) > > >>> + (when (> (packs-in-git-repository directory) 25) > > Why 25? And not 10 or 50 or 100? Totally arbitrary. :-) I sampled the checkouts I had on my laptop and that seems like a reasonable heuristic. In particular, it seems that Git-managed checkouts never have this many packs; only libgit2-managed checkouts do, precisely because libgit2 doesn’t repack/GC. >>> + ;; Run 'git gc' if needed. >>> + (maybe-run-git-gc cache-directory) > > Why not trigger it by “guix gc”? Because so far the idea is that ~/.cache/guix/checkouts is automatically managed without user intervention; it’s really a cache in that sense. > Well, I expect “guix gc” to take some time and I choose when. However, > I want “guix pull” or “guix time-machine” to be as fast as possible and > here some extra time is added, and I cannot control exactly when. Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow path; so for example, it’s not called on a ‘time-machine’ cache hit, but only on a cache miss, which is already expensive anyway. Does that make sense? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
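The kind of sampling mentioned above (looking at how many packs each local checkout has) can be done with a short shell loop. This is a sketch under assumptions: `list_pack_counts` is a made-up name, and the default directory is simply the ~/.cache/guix/checkouts location discussed in this thread.

```shell
# Print "<pack count> <checkout>" for every checkout found under $1
# (default: ~/.cache/guix/checkouts), sorted by increasing pack count.
list_pack_counts () {
    root="${1:-$HOME/.cache/guix/checkouts}"
    for checkout in "$root"/*/; do
        [ -d "$checkout" ] || continue      # glob did not match anything
        checkout="${checkout%/}"
        n=0
        for pack in "$checkout"/.git/objects/pack/*.pack; do
            # An unmatched glob stays literal and fails the -e test.
            [ -e "$pack" ] && n=$((n + 1))
        done
        printf '%s %s\n' "$n" "$checkout"
    done | sort -n
}
```

Checkouts at the bottom of the output are the ones that would cross a threshold such as 25 first.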
* [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-11-16 12:12 ` [bug#66650] " Ludovic Courtès @ 2023-11-16 13:24 ` Simon Tournier 2023-11-22 11:17 ` bug#65720: " Ludovic Courtès 0 siblings, 1 reply; 36+ messages in thread From: Simon Tournier @ 2023-11-16 13:24 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650 Hi, On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo@gnu.org> wrote: > > Well, I expect “guix gc” to take some time and I choose when. However, > > I want “guix pull” or “guix time-machine” to be as fast as possible and > > here some extra time is added, and I cannot control exactly when. > > Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow > path; so for example, it’s not called on a ‘time-machine’ cache hit, but > only on a cache miss, which is already expensive anyway. What you mean as "only called on the slow path" is each time 'update-cached-checkout' is called, right? So, somehow when 'maybe-run-git-gc' is called appears to me "unpredictable". But anyway. :-) Let move it elsewhere if I am really annoyed. Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary. 2023-11-16 13:24 ` Simon Tournier @ 2023-11-22 11:17 ` Ludovic Courtès 2023-11-22 11:57 ` Simon Tournier 0 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-11-22 11:17 UTC (permalink / raw) To: Simon Tournier; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650 Hi, Simon Tournier <zimon.toutoune@gmail.com> skribis: > On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo@gnu.org> wrote: > >> > Well, I expect “guix gc” to take some time and I choose when. However, >> > I want “guix pull” or “guix time-machine” to be as fast as possible and >> > here some extra time is added, and I cannot control exactly when. >> >> Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow >> path; so for example, it’s not called on a ‘time-machine’ cache hit, but >> only on a cache miss, which is already expensive anyway. > > What you mean as "only called on the slow path" is each time > 'update-cached-checkout' is called, right? Yes, which usually indicates we’re on a cache miss (for example a cache miss of ‘guix time-machine’) and thus are going to do potentially more work (updating a Git repo, building things, etc.). That’s why I think it’s on the “slow path” and shouldn’t make much of a difference. More importantly, unless I’m mistaken, it’s rarely going to fire. > So, somehow when 'maybe-run-git-gc' is called appears to me > "unpredictable". But anyway. :-) Sure, but the way I see it, that’s the nature of caches. > Let move it elsewhere if I am really annoyed. :-/ Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much
  2023-11-22 11:17 ` bug#65720: " Ludovic Courtès
@ 2023-11-22 11:57 ` Simon Tournier
  0 siblings, 0 replies; 36+ messages in thread
From: Simon Tournier @ 2023-11-22 11:57 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi Ludo,

Thanks for explaining.

On Wed, 22 Nov 2023 at 12:17, Ludovic Courtès <ludo@gnu.org> wrote:

> it’s rarely going to fire.

[...]

>> Let move it elsewhere if I am really annoyed.
>
> :-/

Sorry, I poorly worded my last comment. :-)

Somehow I was expressing: my view probably falls into the “Premature
optimization is the root of all evil” category.  In other words, I have
no objection and I will revisit the issue when I am on fire, if I am, or
annoyed for real.

Cheers,
simon

PS: Aside from this patch:

>> So, somehow when 'maybe-run-git-gc' is called appears to me
>> "unpredictable".  But anyway. :-)
>
> Sure, but the way I see it, that’s the nature of caches.

What makes caches unpredictable is their current state.  However, this
does not imply that *all* the actions moving from one state to another
must also be triggered at unpredictable moments.

For instance, I choose when I wash the family’s clothes and the washing
machine does not start by itself when the unpredictable stack of the
family’s dirty clothes is big enough.  Because, maybe today it’s rainy so
drying is difficult and tomorrow will be sunny so it will be a better
moment. :-)

For me, “guix gc” should be the driver for cleaning all the various Guix
caches.  Anyway. :-D

^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#66650: bug#65720: Guile-Git-managed checkouts grow way too much 2023-11-22 11:57 ` Simon Tournier (?) @ 2023-11-22 16:00 ` Ludovic Courtès -1 siblings, 0 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-11-22 16:00 UTC (permalink / raw) To: Simon Tournier Cc: Josselin Poiret, Christopher Baines, 65720-done, 66650-done Hi, Simon Tournier <zimon.toutoune@gmail.com> skribis: > Somehow I was expressing: my view probably falls into the “Premature > optimization is the root of all evil” category. Other said, I have no > objection and I will revisit the issue when I will be on fire, if I am, > or annoyed for real. Alright! Pushed as b150c546b04c9ebb09de9f2c39789221054f5eea. Let’s see how it behaves and if there are problems we had overlooked… Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-04 21:47 ` Ludovic Courtès 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix @ 2023-09-05 8:22 ` Jelle Licht 2023-09-05 14:20 ` Ludovic Courtès 2023-09-05 18:59 ` Simon Tournier 2 siblings, 1 reply; 36+ messages in thread From: Jelle Licht @ 2023-09-05 8:22 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 65720 Hi Ludo, > > On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote: > > Of course having to re-clone entire repositories every 9 months is > ridiculous, but storing gigabytes of packs is worse IMO (I’m > specifically thinking about the Guix repo, which every users copies via > ‘guix pull’). Please ignore if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, move over fresh clone to old location. Kr, Jelle ^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-05  8:22 ` Jelle Licht
@ 2023-09-05 14:20 ` Ludovic Courtès
  0 siblings, 0 replies; 36+ messages in thread
From: Ludovic Courtès @ 2023-09-05 14:20 UTC (permalink / raw)
  To: Jelle Licht; +Cc: 65720

Hello,

Jelle Licht <jlicht@posteo.net> skribis:

>> On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>> Of course having to re-clone entire repositories every 9 months is
>> ridiculous, but storing gigabytes of packs is worse IMO (I’m
>> specifically thinking about the Guix repo, which every users copies via
>> ‘guix pull’).
>
> Please ignore if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, move over fresh clone to old location.

Good question.

--8<---------------cut here---------------start------------->8---
scheme@(guix git)> ,use(git)
scheme@(guix git)> (clone "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/" "/tmp/fresh-clone")
$7 = #<git-repository ba4240>
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone")
6.7G	/tmp/fresh-clone
$8 = 0
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone/.git")
6.6G	/tmp/fresh-clone/.git
$9 = 0
scheme@(guix git)> (system* "du" "-hs" "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/
$10 = 0
--8<---------------cut here---------------end--------------->8---

Conclusion: it makes no difference.

Ludo’.

^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-04 21:47 ` Ludovic Courtès 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-05 8:22 ` Jelle Licht @ 2023-09-05 18:59 ` Simon Tournier 2 siblings, 0 replies; 36+ messages in thread From: Simon Tournier @ 2023-09-05 18:59 UTC (permalink / raw) To: Ludovic Courtès, 65720 Hi, On Mon, 04 Sep 2023 at 23:47, Ludovic Courtès <ludo@gnu.org> wrote: >> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’. > > Confirmed: <https://github.com/libgit2/libgit2/issues/3247>. Ouch! The goals of the project haven't changed, and neither have the tradeoffs. If one were to rewrite git-gc on top of libgit2, the best-case scenario is ending up with what we already had. If you want to use regular maintenance on some repostories, use git gc, that's what it's there for. https://github.com/libgit2/libgit2/issues/3247#issuecomment-152508040 > My inclination for the short term would be to work around this > limitation by (1) finding a heuristic to determine is a checkout has > likely accumulated too much cruft, and (2) considering such checkouts > as expired (thereby forcing a re-clone) or running ‘git gc’ on them if > ‘git’ is available. About (1) maybe we could add a “counter” and teach after X updates of the checkout then let run (2). Well, I guess the number of crufts is more or less proportional with the number of checkout updates; that’s the heuristic I would use. The most annoying is (2). Because forcing a re-clone does not appear to me a solution; I prefer to waste disk space (and probably run myself and manually ‘git gc’) than re-clone… Somehow this re-clone would always happen when I am using a poor network. Moreover, assuming this clean-up (2) would be run once every while, we could imagine to invoke something like, guix shell -C git-minimal -- git -C ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq gc when the checkout is updated. 
And maybe we could provide another “guix pull” command-line option for turning off this and mark it as done (reset the “counter”). Well, that’s a poor solution but we can assume that git-minimal is at worse available using “guix shell git-minimal”. Note that the closure of git-minimal is far less than re-cloning the full Guix repository. Cheers, simon ^ permalink raw reply [flat|nested] 36+ messages in thread
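The “counter” idea from the message above could look roughly like the following shell sketch. Everything here is hypothetical: the counter file name `.gc-counter`, the function name `bump_update_counter`, and the threshold variable are invented for illustration and do not exist in Guix; the value 50 is just the figure floated elsewhere in this thread.

```shell
# Hypothetical sketch of a per-checkout update counter.
COUNTER_FILE=".gc-counter"   # invented name, not an actual Guix file
UPDATES_BEFORE_GC=50         # arbitrary figure borrowed from the thread

# Increment the counter stored in checkout $1.  Return 0 (and reset the
# counter) when the threshold is reached, 1 otherwise.
bump_update_counter () {
    file="$1/$COUNTER_FILE"
    count=0
    [ -f "$file" ] && count=$(cat "$file")
    count=$((count + 1))
    if [ "$count" -ge "$UPDATES_BEFORE_GC" ]; then
        echo 0 > "$file"     # "mark it as done": reset the counter
        return 0
    fi
    echo "$count" > "$file"
    return 1
}
```

A caller would run the clean-up only when the function succeeds, for instance `bump_update_counter "$checkout" && git -C "$checkout" gc`. Compared with counting pack files, this needs extra state in the checkout, which is one reason a stateless heuristic may be simpler.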
* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès
  2023-09-04 21:47 ` Ludovic Courtès
@ 2023-09-05 14:11 ` Ludovic Courtès
  2023-09-18 22:35 ` Ludovic Courtès
  2023-11-23 11:35 ` Ludovic Courtès
  3 siblings, 0 replies; 36+ messages in thread
From: Ludovic Courtès @ 2023-09-05 14:11 UTC (permalink / raw)
  To: 65720

Ludovic Courtès <ludo@gnu.org> skribis:

> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G    /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq

Another data point, with Cuirass instances:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
65G	/var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@berlin ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
 Birth: 2022-07-30 23:15:45.582559879 +0200
--8<---------------cut here---------------end--------------->8---

… and:

--8<---------------cut here---------------start------------->8---
ludo@guix-hpc4 ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
86G	/var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@guix-hpc4 ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
 Créé : 2021-06-01 11:48:48.854669310 +0200
--8<---------------cut here---------------end--------------->8---

So yeah, problem we have.

Ludo’.

^ permalink raw reply [flat|nested] 36+ messages in thread
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès 2023-09-04 21:47 ` Ludovic Courtès 2023-09-05 14:11 ` Ludovic Courtès @ 2023-09-18 22:35 ` Ludovic Courtès 2023-09-19 7:19 ` Simon Tournier 2023-11-23 11:35 ` Ludovic Courtès 3 siblings, 1 reply; 36+ messages in thread From: Ludovic Courtès @ 2023-09-18 22:35 UTC (permalink / raw) To: 65720 Ludovic Courtès <ludo@gnu.org> skribis: > As reported by Tobias on IRC (in the context of ‘hpcguix-web’), > checkouts managed by Guile-Git appear to grow beyond reason. As an > example, here’s the same ‘.git’ managed with Guile-Git and with Git: > > $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > $ du -hs .git > 517M .git More data… The biggest file in that repo is a pack that was created when that repo was first cloned (Aug. 
2021): --8<---------------cut here---------------start------------->8--- $ du /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/* |sort -k1 -n| tail -3 44272 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-3c2f1857501b01c321bc67ba1f30704deb9e18e9.pack 47272 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-30d5b35ad14a8398464e49e224811b162f673d66.pack 191492 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack $ ls -l /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack -r--r--r-- 1 ludo users 196079671 Aug 9 2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack $ ls -ld /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config -rw-r--r-- 1 ludo users 266 Aug 9 2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config --8<---------------cut here---------------end--------------->8--- The pack starts with things from Aug. 
2021): --8<---------------cut here---------------start------------->8--- $ git show-index < pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.idx|sort -k1 -n|head -3 12 30289f4d4638452520f52c1a36240220d0d940ff (852d8cb3) 927 d7ffc535c52f49177a8e5553569cdb1e321b5bc6 (2007c5d0) 1800 0a379de3249d5e9ff66fb404f7e5aa8ce2cb3d24 (b1e69aa4) $ git show 30289f4d4638452520f52c1a36240220d0d940ff commit 30289f4d4638452520f52c1a36240220d0d940ff Author: Milkey Mouse <milkeymouse@meme.institute> Date: Sun Aug 8 22:15:40 2021 -0700 […] --8<---------------cut here---------------end--------------->8--- … and at the bottom (large offsets) it contains very old blobs from the Nix repo that somehow made it here. I figured we still had a ‘nix’ branch from the early days, that contains the history of Nix. I’ve now removed it, which helps a bit: --8<---------------cut here---------------start------------->8--- scheme@(guile-user)> ,use(git) scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix") $5 = #<git-repository 91a7b0> ;; 600.534529s real time, 435.260926s run time. 0.000000s spent in GC. scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch") $6 = #<git-repository 4465a50> ;; 420.321511s real time, 398.772963s run time. 0.000000s spent in GC. --8<---------------cut here---------------end--------------->8--- … and more importantly: --8<---------------cut here---------------start------------->8--- $ du -hs /tmp/guix/.git 373M /tmp/guix/.git $ du -hs /tmp/guix-after-removing-nix-branch/.git 362M /tmp/guix-after-removing-nix-branch/.git --8<---------------cut here---------------end--------------->8--- Anyway, what seems to happen is that every pull (every call to ‘remote-fetch’) creates a new pack (see ‘git_fetch_download_pack’ in libgit2), which becomes inefficient in the long run (lots of small poorly-compressed packs). That’s at least one possible explanation. To be continued… Ludo’. 
^ permalink raw reply [flat|nested] 36+ messages in thread
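A quick way to check the “one pack per fetch” hypothesis on a given checkout is to count the packs and add up their sizes. The sketch below is only an illustration: ‘pack_stats’ is a hypothetical helper name, and its argument is assumed to be the path of a ‘.git’ directory.

```shell
# pack_stats GIT_DIR: print "<number of packs> <total pack size in bytes>".
# Hypothetical helper; GIT_DIR is the '.git' directory of a checkout.
pack_stats() {
    pack_dir="$1/objects/pack"
    # Count the *.pack files found in the pack directory.
    count=$(find "$pack_dir" -type f -name '*.pack' | wc -l)
    # Sum their sizes in bytes; this yields 0 when there is no pack at all.
    bytes=$(find "$pack_dir" -type f -name '*.pack' -exec cat {} + | wc -c)
    echo "$((count)) $((bytes))"
}
```

Running ‘pack_stats’ on the cached checkout before and after a “guix pull” should show whether the pack count grows with each fetch.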
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-18 22:35 ` Ludovic Courtès @ 2023-09-19 7:19 ` Simon Tournier 0 siblings, 0 replies; 36+ messages in thread From: Simon Tournier @ 2023-09-19 7:19 UTC (permalink / raw) To: Ludovic Courtès, 65720 Hi Ludo. On Tue, 19 Sep 2023 at 00:35, Ludovic Courtès <ludo@gnu.org> wrote: > --8<---------------cut here---------------start------------->8--- > scheme@(guile-user)> ,use(git) > scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix") > $5 = #<git-repository 91a7b0> > ;; 600.534529s real time, 435.260926s run time. 0.000000s spent in GC. > scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch") > $6 = #<git-repository 4465a50> > ;; 420.321511s real time, 398.772963s run time. 0.000000s spent in GC. > --8<---------------cut here---------------end--------------->8--- [...] > --8<---------------cut here---------------start------------->8--- > $ du -hs /tmp/guix/.git > 373M /tmp/guix/.git > $ du -hs /tmp/guix-after-removing-nix-branch/.git > 362M /tmp/guix-after-removing-nix-branch/.git > --8<---------------cut here---------------end--------------->8--- Just to point out [1] that a shallow clone restricted to the oldest commit reachable by time-machine saves about 25% of the data to download, and similarly on disk. --8<---------------cut here---------------start------------->8--- scheme@(guix-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-guile") $1 = #<git-repository df3710> ;; 383.186818s real time, 278.060733s run time. 0.000000s spent in GC. $ time git clone https://git.savannah.gnu.org/git/guix.git guix-full Receiving objects: 100% (693699/693699), 342.14 MiB | 2.87 MiB/s, done. real 2m40,830s user 3m4,683s sys 0m8,189s $ time git clone --shallow-since=2019-04-30 https://git.savannah.gnu.org/git/guix.git guix-oldest Receiving objects: 100% (428646/428646), 259.41 MiB | 3.87 MiB/s, done. 
real 1m45,604s user 2m32,370s sys 0m5,916s $ du -sh guix-*/.git 362M guix-full/.git 362M guix-guile/.git 272M guix-oldest/.git --8<---------------cut here---------------end--------------->8--- Cheers, simon 1: Re: hard dependency on Git? (was bug#65866: [PATCH 0/8] Add built-in builder for Git checkouts) Simon Tournier <zimon.toutoune@gmail.com> Mon, 11 Sep 2023 19:52:34 +0200 id:871qf4ha1p.fsf@gmail.com https://lists.gnu.org/archive/html/guix-devel/2023-09 https://yhetil.org/guix/871qf4ha1p.fsf@gmail.com ^ permalink raw reply [flat|nested] 36+ messages in thread
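The “25%” figure can be checked against the numbers quoted in the message above. This is just a small arithmetic sketch; ‘saving_percent’ is a hypothetical helper, and the inputs are the download sizes in MiB and the ‘du’ sizes in MB reported for the full and shallow clones.

```shell
# saving_percent FULL SHALLOW: percentage saved by SHALLOW relative to FULL.
# LC_ALL=C makes awk use '.' as the decimal separator whatever the locale.
saving_percent() {
    LC_ALL=C awk -v full="$1" -v shallow="$2" \
        'BEGIN { printf "%.1f\n", (full - shallow) / full * 100 }'
}

saving_percent 342.14 259.41   # download: prints 24.2
saving_percent 362 272         # on disk:  prints 24.9
```

Both ratios come out just under 25%, which matches the estimate given for the download and for the on-disk size.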
* bug#65720: Guile-Git-managed checkouts grow way too much 2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès ` (2 preceding siblings ...) 2023-09-18 22:35 ` Ludovic Courtès @ 2023-11-23 11:35 ` Ludovic Courtès 3 siblings, 0 replies; 36+ messages in thread From: Ludovic Courtès @ 2023-11-23 11:35 UTC (permalink / raw) To: 65720-done Ludovic Courtès <ludo@gnu.org> skribis: > As reported by Tobias on IRC (in the context of ‘hpcguix-web’), > checkouts managed by Guile-Git appear to grow beyond reason. As an > example, here’s the same ‘.git’ managed with Guile-Git and with Git: > > $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq > $ du -hs .git > 517M .git Fixed by b150c546b04c9ebb09de9f2c39789221054f5eea. We still need to update the ‘guix’ package so that tools that rely on (guix git) such as the Data Service, hpcguix-web, and Cuirass, can benefit from this change. Ludo’. ^ permalink raw reply [flat|nested] 36+ messages in thread
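For readers who want the general idea without reading the commit: the approach discussed in this thread is to shell out to ‘git gc’ when the checkout needs it. The sketch below is not the code from that commit. It is a minimal shell illustration that assumes a pack-count heuristic; the helper names and the default threshold of 100 packs are hypothetical.

```shell
# needs_gc GIT_DIR [THRESHOLD]: succeed when GIT_DIR holds more than
# THRESHOLD packs.  The default of 100 is an arbitrary, hypothetical value.
needs_gc() {
    packs=$(find "$1/objects/pack" -type f -name '*.pack' | wc -l)
    [ "$((packs))" -gt "${2:-100}" ]
}

# maybe_gc GIT_DIR [THRESHOLD]: shell out to 'git gc' only when needed.
maybe_gc() {
    if needs_gc "$1" "${2:-100}"; then
        git --git-dir="$1" gc
    fi
}
```

For instance, ‘maybe_gc ~/.cache/guix/checkouts/…/.git’ would repack that cached checkout only once it has accumulated more than 100 packs, so most pulls would not pay the cost of a garbage collection.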
end of thread, other threads:[~2023-11-23 11:36 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-03 20:44 bug#65720: Guile-Git-managed checkouts grow way too much Ludovic Courtès 2023-09-04 21:47 ` Ludovic Courtès 2023-09-05 8:18 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-05 14:18 ` Ludovic Courtès 2023-09-06 8:04 ` Josselin Poiret via Bug reports for GNU Guix 2023-09-08 17:08 ` Ludovic Courtès 2023-09-11 7:00 ` Csepp 2023-09-11 8:42 ` bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much) Simon Tournier 2023-09-11 14:42 ` bug#65720: Guile-Git-managed checkouts grow way too much wolf 2023-09-13 18:10 ` Ludovic Courtès 2023-09-13 22:36 ` Simon Tournier 2023-09-07 0:41 ` Simon Tournier 2023-09-08 17:09 ` Ludovic Courtès 2023-09-09 10:31 ` Simon Tournier 2023-09-11 7:06 ` Csepp 2023-09-11 14:37 ` Ludovic Courtès 2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès 2023-10-23 10:08 ` Simon Tournier 2023-10-23 22:27 ` Tobias Geerinckx-Rice via Bug reports for GNU Guix 2023-10-23 23:28 ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier 2023-10-30 12:02 ` bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Christopher Baines 2023-11-14 9:19 ` Ludovic Courtès 2023-11-14 9:32 ` Simon Tournier 2023-11-16 12:12 ` [bug#66650] " Ludovic Courtès 2023-11-16 13:24 ` Simon Tournier 2023-11-22 11:17 ` bug#65720: " Ludovic Courtès 2023-11-22 11:57 ` [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier 2023-11-22 11:57 ` Simon Tournier 2023-11-22 16:00 ` bug#66650: " Ludovic Courtès 2023-09-05 8:22 ` Jelle Licht 2023-09-05 14:20 ` Ludovic Courtès 2023-09-05 18:59 ` Simon Tournier 2023-09-05 14:11 ` Ludovic Courtès 2023-09-18 22:35 ` Ludovic Courtès 2023-09-19 7:19 ` Simon Tournier 2023-11-23 11:35 ` Ludovic 
Courtès