unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
       [not found] <87jzswsrlt.fsf@gnu.org>
@ 2023-10-20 16:15 ` Ludovic Courtès
  2023-10-23 10:08   ` Simon Tournier
       [not found]   ` <87sf5swc3j.fsf@cbaines.net>
  0 siblings, 2 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-20 16:15 UTC (permalink / raw)
  To: guix-patches
  Cc: Ludovic Courtès, 65720, Josselin Poiret, Simon Tournier,
	Christopher Baines, Josselin Poiret, Ludovic Courtès,
	Mathieu Othacehe, Ricardo Wurmus, Simon Tournier,
	Tobias Geerinckx-Rice

Fixes <https://issues.guix.gnu.org/65720>.

This fixes a bug whereby libgit2-managed checkouts would keep growing as
we fetch.

* guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
procedures.
(update-cached-checkout): Use it.
---
 guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

Hi!

This is a radical fix/workaround for the unbounded Git checkout growth
problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
pack files around).

I thought we might be able to implement a ‘git gc’ approximation using
the libgit2 “packbuilder” interface, but I haven’t got around to doing
it: <https://libgit2.org/libgit2/#HEAD/search/pack>.

Once again, shelling out is not my favorite option, but it’s a bug we
should fix sooner rather than later, hence this compromise.

Thoughts?

Ludo’.

diff --git a/guix/git.scm b/guix/git.scm
index b7182305cf..d704b62333 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
-;;; Copyright © 2018-2022 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018-2023 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2021 Kyle Meyer <kyle@kyleam.com>
 ;;; Copyright © 2021 Marius Bakke <marius@gnu.org>
 ;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
@@ -29,15 +29,16 @@ (define-module (guix git)
   #:use-module (guix cache)
   #:use-module (gcrypt hash)
   #:use-module ((guix build utils)
-                #:select (mkdir-p delete-file-recursively))
+                #:select (mkdir-p delete-file-recursively invoke/quiet))
   #:use-module (guix store)
   #:use-module (guix utils)
   #:use-module (guix records)
   #:use-module (guix gexp)
   #:autoload   (guix git-download)
   (git-reference-url git-reference-commit git-reference-recursive?)
+  #:autoload   (guix config) (%git)
   #:use-module (guix sets)
-  #:use-module ((guix diagnostics) #:select (leave warning))
+  #:use-module ((guix diagnostics) #:select (leave warning info))
   #:use-module (guix progress)
   #:autoload   (guix swh) (swh-download commit-id?)
   #:use-module (rnrs bytevectors)
@@ -428,6 +429,35 @@ (define (delete-checkout directory)
     (rename-file directory trashed)
     (delete-file-recursively trashed)))
 
+(define (packs-in-git-repository directory)
+  "Return the number of pack files under DIRECTORY, a Git checkout."
+  (catch 'system-error
+    (lambda ()
+      (let ((directory (opendir (in-vicinity directory ".git/objects/pack"))))
+        (let loop ((count 0))
+          (match (readdir directory)
+            ((? eof-object?)
+             (closedir directory)
+             count)
+            (str
+             (loop (if (string-suffix? ".pack" str)
+                       (+ 1 count)
+                       count)))))))
+    (const 0)))
+
+(define (maybe-run-git-gc directory)
+  "Run 'git gc' in DIRECTORY if needed."
+  ;; XXX: As of libgit2 1.3.x (used by Guile-Git), there's no support for GC.
+  ;; Each time a checkout is pulled, a new pack is created, which eventually
+  ;; takes up a lot of space (lots of small, poorly-compressed packs).  As a
+  ;; workaround, shell out to 'git gc' when the number of packs in a
+  ;; repository has become "too large", potentially wasting a lot of space.
+  ;; See <https://issues.guix.gnu.org/65720>.
+  (when (> (packs-in-git-repository directory) 25)
+    (info (G_ "compressing cached Git repository at '~a'...~%")
+          directory)
+    (invoke/quiet %git "-C" directory "gc")))
+
 (define* (update-cached-checkout url
                                  #:key
                                  (ref '())
@@ -515,6 +545,9 @@ (define* (update-cached-checkout url
                    seconds seconds
                    nanoseconds nanoseconds))))
 
+       ;; Run 'git gc' if needed.
+       (maybe-run-git-gc cache-directory)
+
        ;; When CACHE-DIRECTORY is a sub-directory of the default cache
        ;; directory, remove expired checkouts that are next to it.
        (let ((parent (dirname cache-directory)))

base-commit: 6b0a32196982a0a2f4dbb59d35e55833a5545ac6
-- 
2.41.0





^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
@ 2023-10-23 10:08   ` Simon Tournier
  2023-10-23 22:27     ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
       [not found]   ` <87sf5swc3j.fsf@cbaines.net>
  1 sibling, 1 reply; 8+ messages in thread
From: Simon Tournier @ 2023-10-23 10:08 UTC (permalink / raw)
  To: Ludovic Courtès, guix-patches
  Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès,
	Tobias Geerinckx-Rice, Ricardo Wurmus, Christopher Baines

Hi Ludo,

On Fri, 20 Oct 2023 at 18:15, Ludovic Courtès <ludo@gnu.org> wrote:

> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
> procedures.
> (update-cached-checkout): Use it.
> ---
>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)

LGTM.  Just two colors for the bikeshed. :-)


> +  (when (> (packs-in-git-repository directory) 25)

Why 25?  And not 10 or 50 or 100?


>  (define* (update-cached-checkout url
>                                   #:key
>                                   (ref '())
> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url
>                     seconds seconds
>                     nanoseconds nanoseconds))))
>  
> +       ;; Run 'git gc' if needed.
> +       (maybe-run-git-gc cache-directory)

Why not trigger it by “guix gc”?

Well, I expect “guix gc” to take some time and I choose when.  However,
I want “guix pull” or “guix time-machine” to be as fast as possible and
here some extra time is added, and I cannot control exactly when.

Cheers,
simon




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-10-23 10:08   ` Simon Tournier
@ 2023-10-23 22:27     ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
  2023-10-23 23:28       ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
  0 siblings, 1 reply; 8+ messages in thread
From: Tobias Geerinckx-Rice via Bug reports for GNU Guix @ 2023-10-23 22:27 UTC (permalink / raw)
  To: Simon Tournier, Ludovic Courtès, guix-patches
  Cc: Ricardo Wurmus, Christopher Baines, Josselin Poiret, 65720,
	Mathieu Othacehe

>Why not trigger it by “guix gc”?

Unless there's a new option I missed, guix gc doesn't handle this.

>Well, I expect “guix gc” to take some time and I choose when.  However,
>I want “guix pull” or “guix time-machine” to be as fast as possible

I don't think that things should be pushed into guix gc merely because they are slow.

This is not a great post (I'd look at the git code if I were at a computer) but I remember git printing something like 'optimising repository in the background'.  Maybe something similar would be appropriate here, to better hide such housekeeping from the user.


Kind regards,

T G-R

Sent on the go.  Excuse or enjoy my brevity.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#65720: Guile-Git-managed checkouts grow way too much
  2023-10-23 22:27     ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
@ 2023-10-23 23:28       ` Simon Tournier
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Tournier @ 2023-10-23 23:28 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice
  Cc: Josselin Poiret, 65720, Mathieu Othacehe, Ludovic Courtès,
	Ricardo Wurmus, Christopher Baines, guix-patches

Hi,

On Mon, 23 Oct 2023 at 22:27, Tobias Geerinckx-Rice <me@tobias.gr> wrote:

>>Why not trigger it by “guix gc”?
>
> Unless there's a new option I missed, guix gc doesn't handle this.

Maybe I missed something but “guix gc” handles what we implement, no? :-)

Well, I run “guix gc” when I need some space.  And this
“maybe-run-git-gc” does exactly that: collect some spaces when I need
them.

For me, they are part of “guix gc” and not part of some update.


Aside, re-thinking about other features, I am consistent with other
comments I made when introducing ’maybe-remove-expired-cache-entries’;
see <https://issues.guix.gnu.org/45327#4>.  And consistent because most
probably I still think the same: cache cleanup should be handled by
“guix gc” and not by the commands themselves.  And maybe we are having
the same discussion. ;-)


>>Well, I expect “guix gc” to take some time and I choose when.  However,
>>I want “guix pull” or “guix time-machine” to be as fast as possible
>
> I don't think that things should be pushed into guix gc merely because
> they are slow.

Maybe I misread, somehow it appears to me that you miss the key part: I
choose when some extra work is done and I keep “guix pull” and “guix
time-machine” as fast as possible.


Cheers,
simon




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
       [not found]       ` <87v8a4el3a.fsf@gmail.com>
@ 2023-11-16 12:12         ` Ludovic Courtès
  2023-11-16 13:24           ` Simon Tournier
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2023-11-16 12:12 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi,

Simon Tournier <zimon.toutoune@gmail.com> skribis:

>>> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
>>> procedures.
>>> (update-cached-checkout): Use it.
>>> ---
>>>  guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 36 insertions(+), 3 deletions(-)
>
> LGTM.

Thanks!

> Just two colors for the bikeshed. :-)
>
>
>>> +  (when (> (packs-in-git-repository directory) 25)
>
> Why 25?  And not 10 or 50 or 100?

Totally arbitrary.  :-)  I sampled the checkouts I had on my laptop and
that seems like a reasonable heuristic.  In particular, it seems that
Git-managed checkouts never have this many packs; only libgit2-managed
checkouts do, precisely because libgit2 doesn’t repack/GC.

>>> +       ;; Run 'git gc' if needed.
>>> +       (maybe-run-git-gc cache-directory)
>
> Why not trigger it by “guix gc”?

Because so far the idea is that ~/.cache/guix/checkouts is automatically
managed without user intervention; it’s really a cache in that sense.

> Well, I expect “guix gc” to take some time and I choose when.  However,
> I want “guix pull” or “guix time-machine” to be as fast as possible and
> here some extra time is added, and I cannot control exactly when.

Yes, I see.  The thing is ‘maybe-run-git-gc’ is only called on the slow
path; so for example, it’s not called on a ‘time-machine’ cache hit, but
only on a cache miss, which is already expensive anyway.

Does that make sense?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary.
  2023-11-16 12:12         ` [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
@ 2023-11-16 13:24           ` Simon Tournier
       [not found]             ` <874jhem3z0.fsf@gnu.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Tournier @ 2023-11-16 13:24 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi,

On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo@gnu.org> wrote:

> > Well, I expect “guix gc” to take some time and I choose when.  However,
> > I want “guix pull” or “guix time-machine” to be as fast as possible and
> > here some extra time is added, and I cannot control exactly when.
>
> Yes, I see.  The thing is ‘maybe-run-git-gc’ is only called on the slow
> path; so for example, it’s not called on a ‘time-machine’ cache hit, but
> only on a cache miss, which is already expensive anyway.

What you mean as "only called on the slow path" is each time
'update-cached-checkout' is called, right?  So, somehow when
'maybe-run-git-gc' is called appears to me "unpredictable".  But
anyway. :-)

Let move it elsewhere if I am really annoyed.

Cheers,
simon




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much
       [not found]             ` <874jhem3z0.fsf@gnu.org>
@ 2023-11-22 11:57               ` Simon Tournier
  2023-11-22 16:00                 ` bug#66650: " Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Tournier @ 2023-11-22 11:57 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Josselin Poiret, Christopher Baines, 65720, 66650

Hi Ludo,

Thanks for explaining.

On Wed, 22 Nov 2023 at 12:17, Ludovic Courtès <ludo@gnu.org> wrote:

>                                   it’s rarely going to fire.

[...]

>> Let move it elsewhere if I am really annoyed.
>
> :-/

Sorry, I poorly worded my last comment. :-)

Somehow I was expressing: my view probably falls into the “Premature
optimization is the root of all evil” category.  Other said, I have no
objection and I will revisit the issue when I will be on fire, if I am,
or annoyed for real.

Cheers,
simon

PS:

Aside this patch:

>> So, somehow when 'maybe-run-git-gc' is called appears to me
>> "unpredictable".  But anyway. :-)
>
> Sure, but the way I see it, that’s the nature of caches.

What makes cache unpredictable is their current state.  However, this
does not imply that *all* the actions modifying from one state to
another must also be triggered in unpredictable moment.

For instance, I choose when I wash family’s clothes and the wash-machine
does not start by itself when the unpredictable stack of family’s dirty
clothes is enough.  Because, maybe today it’s rainy so drying is
difficult and tomorrow will be sunny so it will be a better moment. :-)

For me, “guix gc” should be the driver for cleaning all the various Guix
caches.  Anyway. :-D




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66650: bug#65720: Guile-Git-managed checkouts grow way too much
  2023-11-22 11:57               ` [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
@ 2023-11-22 16:00                 ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-11-22 16:00 UTC (permalink / raw)
  To: Simon Tournier
  Cc: Josselin Poiret, Christopher Baines, 65720-done, 66650-done

Hi,

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> Somehow I was expressing: my view probably falls into the “Premature
> optimization is the root of all evil” category.  Other said, I have no
> objection and I will revisit the issue when I will be on fire, if I am,
> or annoyed for real.

Alright!

Pushed as b150c546b04c9ebb09de9f2c39789221054f5eea.

Let’s see how it behaves and if there are problems we had overlooked…

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-11-22 16:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <87jzswsrlt.fsf@gnu.org>
2023-10-20 16:15 ` bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
2023-10-23 10:08   ` Simon Tournier
2023-10-23 22:27     ` Tobias Geerinckx-Rice via Bug reports for GNU Guix
2023-10-23 23:28       ` bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
     [not found]   ` <87sf5swc3j.fsf@cbaines.net>
     [not found]     ` <87o7fwae0q.fsf@gnu.org>
     [not found]       ` <87v8a4el3a.fsf@gmail.com>
2023-11-16 12:12         ` [bug#66650] bug#65720: [bug#66650] [PATCH] git: Shell out to ‘git gc’ when necessary Ludovic Courtès
2023-11-16 13:24           ` Simon Tournier
     [not found]             ` <874jhem3z0.fsf@gnu.org>
2023-11-22 11:57               ` [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much Simon Tournier
2023-11-22 16:00                 ` bug#66650: " Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).