all messages for Guix-related lists mirrored at yhetil.org
* RFC: libgit2 is slow/inefficient; switch to git command?
@ 2022-11-22  2:21 Maxim Cournoyer
  2022-11-22 15:39 ` zimoun
  2022-11-23 22:16 ` Ludovic Courtès
  0 siblings, 2 replies; 8+ messages in thread
From: Maxim Cournoyer @ 2022-11-22  2:21 UTC
  To: guix-devel

Hi,

While attempting to bisect against the Linux kernel tree, the
performance of libgit2 quickly became problematic, to the point where
simply cloning the repo became a multi-hour affair, using upwards of
3 GiB of RAM for the clone and indexing of the objects (!)

Given that:

* the git CLI doesn't suffer from such poor performance;
* this kind of performance problem has been known for years in libgit2
  [0], with no fix in sight;
* other projects such as Cargo support using the git CLI, and some
  projects use it for that very reason [1];

Would it make sense to switch to using the git command directly instead
of calling into the libgit2 C library, which ends up being slower?  It
would provide a hefty speed-up when using 'guix refresh', when building
new packages fetched from git without substitutes, when using
'git-checkout', etc.

What do you think?

[0] https://github.com/libgit2/libgit2/issues/4674
[1] https://github.com/artichoke/playground/pull/700

-- 
Thanks,
Maxim


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22  2:21 RFC: libgit2 is slow/inefficient; switch to git command? Maxim Cournoyer
@ 2022-11-22 15:39 ` zimoun
  2022-11-22 16:49   ` Philip McGrath
  2022-11-23 22:16 ` Ludovic Courtès
  1 sibling, 1 reply; 8+ messages in thread
From: zimoun @ 2022-11-22 15:39 UTC
  To: Maxim Cournoyer, guix-devel

Hi,

On Mon, 21 Nov 2022 at 21:21, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

> Given that:
>
> * the git CLI doesn't suffer from such poor performance;
> * This kind of performance problem has been known for years in libgit2
>   [0] with no fix in sight;
> * other projects such as Cargo support using the git CLI and that
>   projects are using it for that reason [1];

And I would add the lack of «Support for shallow repositories» [1].

1: <https://github.com/libgit2/libgit2/issues/3058>


> Would it make sense to switch to use the git command directly instead of
> calling into this libgit2 C library that ends up being slower?  It would
> provide a hefty speed-up when using 'guix refresh' or building new
> packages fetched from git without substitutes, or using 'git-checkout',
> etc.

Well, the question is about the closure and the bootstrap.

For instance,

--8<---------------cut here---------------start------------->8---
$ guix size guix | grep 'total:'
total: 629.5 MiB

$ guix size guix git-minimal | grep 'total:'
total: 671.0 MiB
--8<---------------cut here---------------end--------------->8---

which is not nothing, but not that much worse either.  However, it would
require careful scrutiny of what would be added as dependencies.
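
To put the two 'guix size' totals above in perspective, here is a small
arithmetic sketch in Python.  The helper name `closure_overhead` is made
up for illustration; the only inputs are the two totals reported above.

```python
def closure_overhead(base_mib: float, combined_mib: float) -> tuple[float, float]:
    """Return (absolute increase in MiB, relative increase in percent)."""
    delta = combined_mib - base_mib
    return delta, 100.0 * delta / base_mib


# Totals copied from the 'guix size' output above.
delta, percent = closure_overhead(629.5, 671.0)
print(f"git-minimal adds {delta:.1f} MiB ({percent:.1f}% of the guix closure)")
```

That is an increase of 41.5 MiB, or roughly 6.6% on top of the existing
closure.  The number says nothing about the bootstrap side of the
question.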

The proposal is to fully drop ’guile-git’ and instead run ’(invoke "git"
<thing>)’ as in the module ’(guix build git)’, right?
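
For readers who want to see what "shelling out" amounts to, here is a
minimal sketch, written in Python rather than Guile purely for
illustration.  The function names are hypothetical; the only git
features assumed are the standard `git clone --depth` option and the
`--` separator.  Building the argument vector separately from running
it keeps the command easy to inspect.

```python
import subprocess


def shallow_clone_command(url: str, directory: str, depth: int = 1) -> list[str]:
    """Build the argv for a shallow clone; nothing is executed here."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # '--' stops option parsing, so a URL starting with '-' is not
    # mistaken for an option.
    return ["git", "clone", "--depth", str(depth), "--", url, directory]


def shallow_clone(url: str, directory: str, depth: int = 1) -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_command(url, directory, depth), check=True)
```

Note that the only feedback the caller gets this way is an exit status
and whatever git prints, which is part of the integration trade-off
under discussion.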

Cheers,
simon

PS: For the record, Software Heritage, which ingests *a lot* of Git
repositories, relies on Dulwich [2] (pure Python implementation), IIUC.

2: <https://www.dulwich.io/>



* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22 15:39 ` zimoun
@ 2022-11-22 16:49   ` Philip McGrath
  2022-11-22 17:51     ` Wojtek Kosior via Development of GNU Guix and the GNU System distribution.
  0 siblings, 1 reply; 8+ messages in thread
From: Philip McGrath @ 2022-11-22 16:49 UTC
  To: Brian Cully

Hi,

On Tue, Nov 22, 2022, at 10:39 AM, zimoun wrote:
> Hi,
>
> On Mon, 21 Nov 2022 at 21:21, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:
>
>> Given that:
>>
>> * the git CLI doesn't suffer from such poor performance;
>> * This kind of performance problem has been known for years in libgit2
>>   [0] with no fix in sight;
>> * other projects such as Cargo support using the git CLI and that
>>   projects are using it for that reason [1];
>
> And I would add the lack of «Support for shallow repositories» [1].
>
> 1: <https://github.com/libgit2/libgit2/issues/3058>
>

>
> PS: For the record, Software Heritage, which ingests *a lot* of Git
> repositories, relies on Dulwich [2] (pure Python implementation), IIUC.
>
> 2: <https://www.dulwich.io/>

Along those lines, there’s an implementation of clone/checkout in pure Racket (for the package manager) that could probably be ported to Guile relatively easily. I’d expect libgit2 to be faster for the things that it supports, but the Racket implementation does support shallow checkout, so it might pay off if that skips a lot of work.

Code: https://github.com/racket/racket/blob/master/racket/collects/net/git-checkout.rkt
Docs: https://docs.racket-lang.org/net/git-checkout.html

(More broadly, I haven’t investigated performance issues, but my basic inclination would be toward improving libgit2 over running the git executable.)

-Philip



* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22 16:49   ` Philip McGrath
@ 2022-11-22 17:51     ` Wojtek Kosior via Development of GNU Guix and the GNU System distribution.
  2022-11-22 21:15       ` Phil
  0 siblings, 1 reply; 8+ messages in thread
From: Wojtek Kosior via Development of GNU Guix and the GNU System distribution. @ 2022-11-22 17:51 UTC
  To: Philip McGrath; +Cc: Brian Cully


Hi,

I just want to add my 2 cents :)

Another issue with libgit2 is that it gives a very misleading error
report when trying to http-clone a repo that only supports the old
"dumb" git protocol (as opposed to the newer "smart" one). More details
here[1].

Missing support for that dumb protocol is probably not that big an issue
in practice. The misleading error Guix presents to users trying to
access some poor git repo is, though :/

Wojtek

[1] https://issues.guix.gnu.org/58555
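
As background on the two protocols: per Git's HTTP protocol
documentation, a client asks for `info/refs?service=git-upload-pack`,
and a "smart" server answers with the content type
`application/x-git-upload-pack-advertisement`; anything else is treated
as a "dumb" server.  The sketch below (Python, hypothetical function
names) shows how a client could tell the two apart and word its error
accordingly.  It is not how libgit2 or Guix currently do it.

```python
from typing import Optional

SMART_UPLOAD_PACK = "application/x-git-upload-pack-advertisement"


def info_refs_url(repo_url: str) -> str:
    """URL a client requests to discover refs for a fetch or clone."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_http(content_type: Optional[str]) -> bool:
    """True if the Content-Type header announces the smart protocol."""
    if content_type is None:
        return False
    # Ignore parameters such as '; charset=utf-8' and letter case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == SMART_UPLOAD_PACK


def describe_remote(content_type: Optional[str]) -> str:
    """Message a client could show instead of a generic failure."""
    if is_smart_http(content_type):
        return "smart HTTP"
    return "dumb HTTP or not a Git repository"
```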


-- (sig_start)
website: https://koszko.org/koszko.html
PGP: https://koszko.org/key.gpg
fingerprint: E972 7060 E3C5 637C 8A4F  4B42 4BC5 221C 5A79 FD1A

Meet Kraków saints!           #18: blessed Józef Kowalski
Poznaj świętych krakowskich!  #18: błogosławiony Józef Kowalski
https://pl.wikipedia.org/wiki/Józef_Kowalski_(duchowny)
-- (sig_end)





* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22 17:51     ` Wojtek Kosior via Development of GNU Guix and the GNU System distribution.
@ 2022-11-22 21:15       ` Phil
  2022-11-23  9:57         ` zimoun
  2022-11-23 22:04         ` Ludovic Courtès
  0 siblings, 2 replies; 8+ messages in thread
From: Phil @ 2022-11-22 21:15 UTC
  To: Wojtek Kosior; +Cc: Philip McGrath, guix-devel

Hi,

Wojtek Kosior via Development of GNU Guix and the GNU System distribution. writes:

> Hi,
>
> I just want to add my 2 cents :)

Just to add mine too - libgit2 behaves differently to command-line git
in ways which can make guix do unexpected things when caching clones in
certain cases.  This has resulted in some hard-to-diagnose issues with
using guix to build PRs, for example.

In particular we were forced to make this change to our local guix build
to ensure that guix behaved in line with git:
https://github.com/guix-mirror/guix/commit/473954dd92bbb84693b6fa3f007752eb53c804db

An explanation of why was raised with libgit2:
https://github.com/libgit2/libgit2/issues/6183

The original guix-devel discussion is here:
https://lists.gnu.org/archive/html/guix-devel/2022-03/msg00021.html

This particular issue is somewhat niche - but it demonstrates well the
danger of assuming that libgit2 and git behave in the same way!

This makes me a bit wary of using libgit2 now.

Cheers,
Phil.



* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22 21:15       ` Phil
@ 2022-11-23  9:57         ` zimoun
  2022-11-23 22:04         ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: zimoun @ 2022-11-23  9:57 UTC
  To: Phil, Wojtek Kosior; +Cc: Philip McGrath, guix-devel

Hi,

On Tue, 22 Nov 2022 at 21:15, Phil <phil@beadling.co.uk> wrote:

> Just to add mine too - libgit2 behaves differently to command-line git
> in ways which can make guix do unexpected things when caching clones in
> certain cases.  This has resulted in some hard to diagnose issues with
> using guix to build PRs for example.

[...]

> An explanation of why, was raised with libgit2:
> https://github.com/libgit2/libgit2/issues/6183

To spare people from following various links: this issue #6183 is another
manifestation of issue #3361 (still open since Aug 6, 2015).  Quoting:

        Git has recently changed the refspec matching rules to allow
        refspecs such as

            refs/heads/o*:refs/remotes/heads/i*

        We still perform checks using the old rules by which the glob
        must always match a full path element.

        https://github.com/libgit2/libgit2/issues/3361

However, please note that only 3 issues with the tag ’git compatibility’
are currently open [1], and 2 of those 3 are related.

1: <https://github.com/libgit2/libgit2/labels/git%20compatibility>
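
To make the difference between the two matching rules concrete, here is
a simplified model in Python.  It is only a sketch: the function names
are hypothetical, it handles a single `*` per side, and it is neither
Git's nor libgit2's actual implementation.

```python
from typing import Optional


def glob_is_full_component(pattern: str) -> bool:
    """Old rule: the '*' must stand alone as a whole path element."""
    return "*" in pattern.split("/")


def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Newer rule: '*' may match part of a path element.

    Returns the destination ref name, or None if REF does not match
    the source side of REFSPEC.
    """
    src, dst = refspec.lstrip("+").split(":", 1)
    prefix, star, suffix = src.partition("*")
    if not star:
        return dst if ref == src else None
    long_enough = len(ref) >= len(prefix) + len(suffix)
    if long_enough and ref.startswith(prefix) and ref.endswith(suffix):
        matched = ref[len(prefix):len(ref) - len(suffix)]
        return dst.replace("*", matched, 1)
    return None


# The refspec quoted above is rejected under the old rule...
print(glob_is_full_component("refs/heads/o*"))
# ...but maps branch names starting with 'o' under the newer one.
print(map_ref("refs/heads/o*:refs/remotes/heads/i*", "refs/heads/opt"))
```

An implementation that still enforces the old rule rejects such a
refspec outright, while one that follows the newer rule accepts it and
rewrites the matched part, so the two can disagree on the same input.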

Cheers,
simon



* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22 21:15       ` Phil
  2022-11-23  9:57         ` zimoun
@ 2022-11-23 22:04         ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2022-11-23 22:04 UTC
  To: Phil; +Cc: Wojtek Kosior, Philip McGrath, guix-devel

Hi,

Phil <phil@beadling.co.uk> skribis:

> In particular we were forced to make this change to our local guix build
> to ensure that guix behaved inline with git:
> https://github.com/guix-mirror/guix/commit/473954dd92bbb84693b6fa3f007752eb53c804db
>
> An explanation of why, was raised with libgit2:
> https://github.com/libgit2/libgit2/issues/6183
>
> The original guix-devel discussion here:
> https://lists.gnu.org/archive/html/guix-devel/2022-03/msg00021.html

I didn’t follow that but perhaps there’s something we should do in Guix
proper?  Maybe not exactly this change though, because it might be a
performance hit, but it’s worth discussing.

> This particular issue is somewhat niche - but it demonstrates well the
> danger of assuming the libgit2 and git behave in the same way!

Whichever implementation we use is going to behave differently from Git
in some cases (unless we use Git, that is).  But I think that’s okay, we
can smooth out issues.

Thanks,
Ludo’.



* Re: RFC: libgit2 is slow/inefficient; switch to git command?
  2022-11-22  2:21 RFC: libgit2 is slow/inefficient; switch to git command? Maxim Cournoyer
  2022-11-22 15:39 ` zimoun
@ 2022-11-23 22:16 ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2022-11-23 22:16 UTC
  To: Maxim Cournoyer; +Cc: guix-devel

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> While attempting to bisect against the Linux kernel tree, the
> performance of libgit2 quickly became problematic, to the point where
> simply cloning the repo became a multiple hours affair, using upward to
> 3 GiB of RAM for the clone and indexing of the objects (!)

Did you confirm with a pure Guile-Git snippet that calls ‘clone’ that
this is the behavior observed?

> Given that:
>
> * the git CLI doesn't suffer from such poor performance;
> * This kind of performance problem has been known for years in libgit2
>   [0] with no fix in sight;

This report talks about 5x wall-clock time, which is obviously not
great, but it doesn’t talk about memory usage, does it?

It talks about SHAttered though; that’s a key consideration to make sure
we’re doing an apples-to-apples comparison.
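
One way to get comparable numbers for both wall-clock time and memory
is to run each candidate as a child process under the same small
harness.  The following is a minimal sketch in Python for Unix-like
systems; `measure` is a hypothetical helper and the command shown is
only a stand-in for the clone under test.

```python
import resource
import subprocess
import sys
import time


def measure(argv: list[str]) -> tuple[float, int]:
    """Run ARGV; return (wall-clock seconds, peak RSS of children).

    ru_maxrss is in KiB on Linux (bytes on macOS), and it is the
    maximum over all children waited for so far, so measure one
    command per harness process.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak


if __name__ == "__main__":
    # Stand-in command; replace with the clone to be measured.
    seconds, peak = measure([sys.executable, "-c", "pass"])
    print(f"{seconds:.2f} s, peak RSS {peak}")
```

For the comparison to be fair, both runs would need the same
repository, the same network conditions, and equivalent hashing and
integrity-checking settings.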

> * other projects such as Cargo support using the git CLI and that
>   projects are using it for that reason [1];

Should we follow Cargo’s lead for packaging as well?  :-)

> Would it make sense to switch to use the git command directly instead of
> calling into this libgit2 C library that ends up being slower?  It would
> provide a hefty speed-up when using 'guix refresh' or building new
> packages fetched from git without substitutes, or using 'git-checkout',
> etc.
>
> What do you think?

I think that’s not an option.  The level of integration we have in (guix
git), (guix channels), etc. is not achievable by shelling out to ‘git’.

"Philip McGrath" <philip@philipmcgrath.com> skribis:

> Along those lines, there’s an implementation of clone/checkout in pure Racket (for the package manager) that could probably be ported to Guile relatively easily. I’d expect libgit2 to be faster for the things that it supports, but the Racket implementation does support shallow checkout, so it might pay off if that skips a lot of work.
>
> Code: https://github.com/racket/racket/blob/master/racket/collects/net/git-checkout.rkt
> Docs: https://docs.racket-lang.org/net/git-checkout.html

That sounds like a worthy avenue; support for shallow clones would
already be an improvement.

> (More broadly, I haven’t investigated performance issues, but my basic inclination would be toward improving libgit2 over running the git executable.)

Same here.  The way I see it, we could gradually move bits of Guile-Git
to being pure Scheme.  So perhaps the first step would be to provide a
pure Scheme ‘clone’ based on the Racket code above?

Thanks,
Ludo’.

