Hi,

While attempting to bisect against the Linux kernel tree, the
performance of libgit2 quickly became problematic, to the point where
simply cloning the repo became a multi-hour affair, using upwards of
3 GiB of RAM for the clone and indexing of the objects (!)

Given that:

* the git CLI doesn't suffer from such poor performance;
* this kind of performance problem has been known for years in libgit2
  [0] with no fix in sight;
* other projects such as Cargo support using the git CLI, and projects
  are using it for that reason [1];

Would it make sense to switch to using the git command directly instead
of calling into this libgit2 C library that ends up being slower?  It
would provide a hefty speed-up when using 'guix refresh' or building new
packages fetched from git without substitutes, or using 'git-checkout',
etc.

What do you think?

[0] https://github.com/libgit2/libgit2/issues/4674
[1] https://github.com/artichoke/playground/pull/700

--
Thanks,
Maxim

Hi,

On Mon, 21 Nov 2022 at 21:21, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

> Given that:
>
> * the git CLI doesn't suffer from such poor performance;
> * This kind of performance problem has been known for years in libgit2
>   [0] with no fix in sight;
> * other projects such as Cargo support using the git CLI and that
>   projects are using it for that reason [1];

And I would add the lack of «Support for shallow repositories» [1].

1: <https://github.com/libgit2/libgit2/issues/3058>

> Would it make sense to switch to use the git command directly instead of
> calling into this libgit2 C library that ends up being slower?  It would
> provide a hefty speed-up when using 'guix refresh' or building new
> packages fetched from git without substitutes, or using 'git-checkout',
> etc.

Well, the question is about the closure and the bootstrap.  For
instance,

--8<---------------cut here---------------start------------->8---
$ guix size guix | grep 'total:'
total: 629.5 MiB
$ guix size guix git-minimal | grep 'total:'
total: 671.0 MiB
--8<---------------cut here---------------end--------------->8---

which is not nothing (41.5 MiB more), but not that much worse either.
However, it would require careful scrutiny of what would be added as
dependencies.

The proposal is to fully drop ’guile-git’ and instead run
’(invoke "git" <thing>)’ as in the module ’(guix build git)’, right?

Cheers,
simon

PS: For the record, Software Heritage, which ingests *a lot* of Git
repositories, relies on Dulwich [2] (pure Python implementation), IIUC.

2: <https://www.dulwich.io/>

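To make the "shell out to git" idea and the shallow-clone point a bit
more concrete, here is a minimal sketch of what such an invocation
could look like.  It is written in Python purely for illustration (in
Guix this would be Guile code along the lines of the ’(invoke "git"
<thing>)’ mentioned above).  The helper names `shallow_clone_argv` and
`shallow_clone` are hypothetical and do not exist in Guix or Guile-Git;
`--depth`, `--single-branch` and `--branch` are standard `git clone`
options.

```python
import subprocess


def shallow_clone_argv(url, directory, branch=None, depth=1):
    """Build the argument vector for a shallow 'git clone'.

    Hypothetical helper for illustration; not part of Guix or Guile-Git.
    """
    argv = ["git", "clone", "--depth", str(depth), "--single-branch"]
    if branch is not None:
        argv += ["--branch", branch]
    # '--' ends option parsing, so a URL starting with '-' is never
    # mistaken for an option.
    argv += ["--", url, directory]
    return argv


def shallow_clone(url, directory, branch=None, depth=1):
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_argv(url, directory, branch, depth),
                   check=True)
```

Keeping the argument construction separate from the process call makes
it possible to check the command line without touching the network.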
Hi,

On Tue, Nov 22, 2022, at 10:39 AM, zimoun wrote:
> Hi,
>
> On Mon, 21 Nov 2022 at 21:21, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:
>
>> Given that:
>>
>> * the git CLI doesn't suffer from such poor performance;
>> * This kind of performance problem has been known for years in libgit2
>>   [0] with no fix in sight;
>> * other projects such as Cargo support using the git CLI and that
>>   projects are using it for that reason [1];
>
> And I would add the lack of «Support for shallow repositories» [1].
>
> 1: <https://github.com/libgit2/libgit2/issues/3058>
>
>
> PS: For the record, Software Heritage, which ingests *a lot* of Git
> repositories, relies on Dulwich [2] (pure Python implementation), IIUC.
>
> 2: <https://www.dulwich.io/>

Along those lines, there’s an implementation of clone/checkout in pure
Racket (for the package manager) that could probably be ported to Guile
relatively easily.  I’d expect libgit2 to be faster for the things that
it supports, but the Racket implementation does support shallow
checkout, so it might pay off if that skips a lot of work.

Code: https://github.com/racket/racket/blob/master/racket/collects/net/git-checkout.rkt
Docs: https://docs.racket-lang.org/net/git-checkout.html

(More broadly, I haven’t investigated performance issues, but my basic
inclination would be toward improving libgit2 over running the git
executable.)

-Philip

Hi,

I just want to add my 2 cents :)

Another issue with libgit2 is that it gives a very misleading error
report when trying to http-clone a repo that only supports the old
"dumb" git protocol (as opposed to the newer "smart" one).  More details
here[1].

Missing support for that dumb protocol is probably not that big an issue
in practice.  The misleading error Guix presents to the users trying to
access some poor git repo is :/

Wojtek

[1] https://issues.guix.gnu.org/58555

-- (sig_start)
website: https://koszko.org/koszko.html
PGP: https://koszko.org/key.gpg
fingerprint: E972 7060 E3C5 637C 8A4F  4B42 4BC5 221C 5A79 FD1A

Meet Kraków saints!           #18: blessed Józef Kowalski
Poznaj świętych krakowskich!  #18: błogosławiony Józef Kowalski
https://pl.wikipedia.org/wiki/Józef_Kowalski_(duchowny)
-- (sig_end)


On Tue, 22 Nov 2022 11:49:26 -0500
"Philip McGrath" <philip@philipmcgrath.com> wrote:

> [...]
>
> Along those lines, there’s an implementation of clone/checkout in pure
> Racket (for the package manager) that could probably be ported to Guile
> relatively easily.  I’d expect libgit2 to be faster for the things that
> it supports, but the Racket implementation does support shallow
> checkout, so it might pay off if that skips a lot of work.
>
> Code: https://github.com/racket/racket/blob/master/racket/collects/net/git-checkout.rkt
> Docs: https://docs.racket-lang.org/net/git-checkout.html
>
> (More broadly, I haven’t investigated performance issues, but my basic
> inclination would be toward improving libgit2 over running the git
> executable.)
>
> -Philip

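For context on the "dumb" versus "smart" distinction raised above: in
Git's HTTP protocol, a client asks for
`$URL/info/refs?service=git-upload-pack`, and a smart server answers
with the content type `application/x-git-upload-pack-advertisement`;
anything else means the server is only serving static files (the dumb
protocol).  The sketch below, in Python for illustration, shows how a
front end could tell the two apart in order to print a clearer message.
The function names are hypothetical, and this is not how Guix or libgit2
currently handle the case.

```python
SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(repo_url):
    """URL a client fetches first when cloning over HTTP(S)."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_http(content_type):
    """True if the Content-Type header denotes a smart HTTP server."""
    if content_type is None:
        return False
    # Ignore parameters such as '; charset=utf-8' and letter case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == SMART_CONTENT_TYPE


def describe_server(content_type):
    """Hypothetical user-facing description of the server kind."""
    if is_smart_http(content_type):
        return "smart HTTP"
    return "dumb HTTP (not supported by libgit2)"
```

A caller would fetch `info_refs_url(url)` with any HTTP client and pass
the response's `Content-Type` header to `describe_server`.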
Hi,

Wojtek Kosior via Development of GNU Guix and the GNU System distribution. writes:

> Hi,
>
> I just want to add my 2 cents :)

Just to add mine too - libgit2 behaves differently from command-line git
in ways which can make guix do unexpected things when caching clones in
certain cases.  This has resulted in some hard-to-diagnose issues with
using guix to build PRs, for example.

In particular we were forced to make this change to our local guix build
to ensure that guix behaved in line with git:
https://github.com/guix-mirror/guix/commit/473954dd92bbb84693b6fa3f007752eb53c804db

An explanation of why was raised with libgit2:
https://github.com/libgit2/libgit2/issues/6183

The original guix-devel discussion is here:
https://lists.gnu.org/archive/html/guix-devel/2022-03/msg00021.html

This particular issue is somewhat niche - but it demonstrates well the
danger of assuming that libgit2 and git behave in the same way!

This makes me a bit wary of using libgit2 now.

Cheers,
Phil.

Hi,

On Tue, 22 Nov 2022 at 21:15, Phil <phil@beadling.co.uk> wrote:

> Just to add mine too - libgit2 behaves differently to command-line git
> in ways which can make guix do unexpected things when caching clones in
> certain cases.  This has resulted in some hard to diagnose issues with
> using guix to build PRs for example.

[...]

> An explanation of why, was raised with libgit2:
> https://github.com/libgit2/libgit2/issues/6183

To save people from following various links: this issue #6183 is another
manifestation of issue #3361 (still open since Aug 6, 2015).  Quoting:

        Git has recently changed the refspec matching rules to allow
        refspecs such as

        refs/heads/o*:refs/remotes/heads/i*

        We still perform checks using the old rules by which the glob
        must always match a full path element.

        https://github.com/libgit2/libgit2/issues/3361

However, please note that only 3 issues with the tag ’git compatibility’
are currently open [1]; well, three, of which 2 are related.

1: <https://github.com/libgit2/libgit2/labels/git%20compatibility>

Cheers,
simon

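To illustrate the difference between the two refspec rules quoted
above, here is a small Python sketch.  It is a simplified model written
for this discussion, not the code of git or libgit2: the "old" rule
accepts a `*` only when it stands for a whole path element, while the
"new" rule lets a single `*` appear anywhere and substitutes whatever it
matched into the destination.  All function names are hypothetical.

```python
def glob_is_full_component(pattern):
    """Old rule: a '*' is only valid as an entire path element."""
    return all(part == "*" or "*" not in part
               for part in pattern.split("/"))


def match_glob(pattern, ref):
    """New rule: return the text matched by '*', or None if no match."""
    if "*" not in pattern:
        return "" if pattern == ref else None
    prefix, suffix = pattern.split("*", 1)
    long_enough = len(ref) >= len(prefix) + len(suffix)
    if long_enough and ref.startswith(prefix) and ref.endswith(suffix):
        return ref[len(prefix):len(ref) - len(suffix)]
    return None


def map_ref(refspec, ref):
    """Map a source ref to its destination under 'src:dst', or None."""
    src, dst = refspec.lstrip("+").split(":", 1)
    captured = match_glob(src, ref)
    if captured is None:
        return None
    return dst.replace("*", captured, 1)
```

With the refspec from the quote, `refs/heads/o*` is rejected by
`glob_is_full_component` but accepted by `match_glob`, which is the kind
of divergence between the two rule sets that the issue describes.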
Hi,

Phil <phil@beadling.co.uk> skribis:

> In particular we were forced to make this change to our local guix build
> to ensure that guix behaved inline with git:
> https://github.com/guix-mirror/guix/commit/473954dd92bbb84693b6fa3f007752eb53c804db
>
> An explanation of why, was raised with libgit2:
> https://github.com/libgit2/libgit2/issues/6183
>
> The original guix-devel discussion here:
> https://lists.gnu.org/archive/html/guix-devel/2022-03/msg00021.html

I didn’t follow that, but perhaps there’s something we should do in Guix
proper?  Maybe not exactly this change though, because it might be a
performance hit, but it’s worth discussing.

> This particular issue is somewhat niche - but it demonstrates well the
> danger of assuming the libgit2 and git behave in the same way!

Whichever implementation we use is going to behave differently from Git
in some cases (unless we use Git, that is).  But I think that’s okay; we
can smooth out issues.

Thanks,
Ludo’.

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> While attempting to bisect against the Linux kernel tree, the
> performance of libgit2 quickly became problematic, to the point where
> simply cloning the repo became a multiple hours affair, using upward to
> 3 GiB of RAM for the clone and indexing of the objects (!)

Did you confirm with a pure Guile-Git snippet that calls ‘clone’ that
this is the behavior observed?

> Given that:
>
> * the git CLI doesn't suffer from such poor performance;
> * This kind of performance problem has been known for years in libgit2
>   [0] with no fix in sight;

This report talks about 5x wall-clock time, which is obviously not
great, but it doesn’t talk about memory usage, does it?

It talks about SHAttered though; that’s a key consideration to make sure
we’re doing an apples-to-apples comparison.

> * other projects such as Cargo support using the git CLI and that
>   projects are using it for that reason [1];

Should we follow Cargo’s lead for packaging as well?  :-)

> Would it make sense to switch to use the git command directly instead of
> calling into this libgit2 C library that ends up being slower?  It would
> provide a hefty speed-up when using 'guix refresh' or building new
> packages fetched from git without substitutes, or using 'git-checkout',
> etc.
>
> What do you think?

I think that’s not an option.  The level of integration we have in (guix
git), (guix channels), etc. is not achievable by shelling out to ‘git’.

"Philip McGrath" <philip@philipmcgrath.com> skribis:

> Along those lines, there’s an implementation of clone/checkout in pure
> Racket (for the package manager) that could probably be ported to Guile
> relatively easily.  I’d expect libgit2 to be faster for the things that
> it supports, but the Racket implementation does support shallow
> checkout, so it might pay off if that skips a lot of work.
>
> Code: https://github.com/racket/racket/blob/master/racket/collects/net/git-checkout.rkt
> Docs: https://docs.racket-lang.org/net/git-checkout.html

That sounds like a worthy avenue; support for shallow clones would
already be an improvement.

> (More broadly, I haven’t investigated performance issues, but my basic
> inclination would be toward improving libgit2 over running the git
> executable.)

Same here.  The way I see it, we could gradually move bits of Guile-Git
to being pure Scheme.  So perhaps the first step would be to provide a
pure Scheme ‘clone’ based on the Racket code above?

Thanks,
Ludo’.
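
For the apples-to-apples comparison of wall-clock time and memory usage
asked about above, a small harness along these lines could be used.
This is a minimal sketch in Python, assuming a Unix-like system (the
`resource` module is not available elsewhere); the `measure` helper is
hypothetical, and the commands to compare (for instance a `git clone`
versus a Guile-Git snippet run via `guile`) are left to the caller.

```python
import resource
import subprocess
import time


def measure(argv):
    """Run ARGV to completion and return (seconds, peak_rss).

    peak_rss is ru_maxrss for this process's waited-for children: it is
    in KiB on Linux (bytes on macOS) and is the largest peak seen among
    all children so far, so run each command under comparison from a
    fresh interpreter to keep the numbers independent.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss
```

For example, `measure(["git", "clone", url, directory])` would give the
figures for the git CLI, to be set against the same call wrapping a
Guile-Git ‘clone’ of the same repository.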