unofficial mirror of guix-devel@gnu.org 
* Random idea about speeding up guix pull
@ 2017-09-03 14:27 Hartmut Goebel
  2017-09-03 14:38 ` ng0
  2017-09-04 15:01 ` Ludovic Courtès
  0 siblings, 2 replies; 9+ messages in thread
From: Hartmut Goebel @ 2017-09-03 14:27 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 777 bytes --]

Hi,

I've seen in Ludo's slides that speeding up guix pull is a topic. Here is
a random idea on that:

"git pull" can probably be sped up by using something like

    git init .
    git remote add …
    git fetch --depth=1 origin master
    git checkout FETCH_HEAD

This downloads only the topmost commit, i.e. the tree state at that commit.

From my mostly up-to-date clone, this method downloads only 1559 objects
and 'du -s .git' reports 13M, compared to "git pull" downloading 133284
objects and taking 49M.
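The recipe above can be tried without network access. The following is a minimal sketch, assuming a throwaway local repository created in a temporary directory stands in for the real Guix remote (the `file://` URL and the three demo commits are illustrative only); it checks that only the topmost commit ends up in the new repository while the working tree is still complete.

```shell
#!/bin/sh
# Sketch: the depth-1 fetch recipe, run against a local stand-in remote.
# Assumption: the temporary "upstream" repository replaces the real Guix
# repository so that the example needs no network access.
set -eu

work=$(mktemp -d)
upstream="$work/upstream"

# Create an upstream history of three commits on "master".
git init -q "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
    echo "revision $i" > "$upstream/file.txt"
    git -C "$upstream" add file.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "commit $i"
done

# The recipe: empty repository, one remote, shallow fetch, checkout.
mkdir "$work/clone"
cd "$work/clone"
git init -q .
git remote add origin "file://$upstream"   # stand-in for the real remote URL
git fetch -q --depth=1 origin master
git checkout -q FETCH_HEAD

# Only the topmost commit is present locally, yet the tree is complete.
echo "commits in local history: $(git rev-list --count HEAD)"
cat file.txt
```

Running it with `sh` (Git must be installed) should report a local history of a single commit and print the content of the latest revision.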

We could use this for downloading source code via git (git-download).

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |


[-- Attachment #2: 0xBF773B65.asc --]
[-- Type: application/pgp-keys, Size: 14855 bytes --]

* Re: Random idea about speeding up guix pull
  2017-09-03 14:27 Random idea about speeding up guix pull Hartmut Goebel
@ 2017-09-03 14:38 ` ng0
  2017-09-04 15:01 ` Ludovic Courtès
  1 sibling, 0 replies; 9+ messages in thread
From: ng0 @ 2017-09-03 14:38 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

Hartmut Goebel transcribed 15K bytes:
> Hi,
> 
> I've seen in Ludo's slides that speeding up guix pull is a topic. Here is
> a random idea on that:
> 
> "git pull" can probably be sped up by using something like
> 
>     git init .
>     git remote add …
>     git fetch --depth=1 origin master
>     git checkout FETCH_HEAD
> 
> This downloads only the topmost commit, i.e. the tree state at that commit.
> 
> From my mostly up-to-date clone, this method downloads only 1559 objects
> and 'du -s .git' reports 13M, compared to "git pull" downloading 133284
> objects and taking 49M.

Yes, that would make many git clones take less space.

> We could use this for downloading source code via git (git-download).

Andy Wingo proposed this in the past and had a patch which worked
back in 2015. If you are motivated enough to adapt it, it's still on
the list, but git-download and the other file it touches have changed
a lot since 2015.

-- 
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://n0is.noblogs.org/my-keys
https://www.infotropique.org https://krosos.org

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: Random idea about speeding up guix pull
  2017-09-03 14:27 Random idea about speeding up guix pull Hartmut Goebel
  2017-09-03 14:38 ` ng0
@ 2017-09-04 15:01 ` Ludovic Courtès
  2017-09-04 15:39   ` Hartmut Goebel
  1 sibling, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2017-09-04 15:01 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel

Heya,

Hartmut Goebel <h.goebel@crazy-compilers.com> wrote:

> I've seen in Ludo's slides that speeding up guix pull is a topic. Here is
> a random idea on that:
>
> "git pull" can probably be sped up by using something like
>
>     git init .
>     git remote add …
>     git fetch --depth=1 origin master
>     git checkout FETCH_HEAD
>
> This downloads only the topmost commit, i.e. the tree state at that commit.

That’s roughly what ‘guix pull’ does nowadays, now that it uses
Guile-Git.

The problem is elsewhere: it’s compiling Guix’s Scheme code that takes
ages, in particular since we switched to Guile 2.2 (Guile 2.2’s fancy
compiler gives us significant speedups at run time on core Guix, but
it’s also slower when compiling simple code like package definitions.)

Ludo’.

* Re: Random idea about speeding up guix pull
  2017-09-04 15:01 ` Ludovic Courtès
@ 2017-09-04 15:39   ` Hartmut Goebel
  2017-09-04 21:56     ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Hartmut Goebel @ 2017-09-04 15:39 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On 04.09.2017 at 17:01, Ludovic Courtès wrote:
> That’s roughly what ‘guix pull’ does nowadays, now that it uses
> Guile-Git.

Does it? I only found the call to `remote-fetch` in guix/git.scm, which
is not passed any options.

The trick is to use `--depth=1` and fetch only the one expected commit,
tag, or branch head.

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |

* Re: Random idea about speeding up guix pull
  2017-09-04 15:39   ` Hartmut Goebel
@ 2017-09-04 21:56     ` Ludovic Courtès
  2017-09-05 12:23       ` Hartmut Goebel
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2017-09-04 21:56 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel

Hartmut Goebel <h.goebel@crazy-compilers.com> wrote:

> On 04.09.2017 at 17:01, Ludovic Courtès wrote:
>> That’s roughly what ‘guix pull’ does nowadays, now that it uses
>> Guile-Git.
>
> Does it? I only found the call to `remote-fetch` in guix/git.scm, which
> is not passed any options.
>
> The trick is to use `--depth=1` and fetch only the one expected commit,
> tag, or branch head.

Oh right, it doesn’t do that.

What it does do is maintain a cached checkout in ~/.cache/guix/pull,
which makes subsequent pulls much faster.

Does that make sense?

Ludo’.

* Re: Random idea about speeding up guix pull
  2017-09-04 21:56     ` Ludovic Courtès
@ 2017-09-05 12:23       ` Hartmut Goebel
  2017-09-05 14:33         ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Hartmut Goebel @ 2017-09-05 12:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 3264 bytes --]

On 04.09.2017 at 23:56, Ludovic Courtès wrote:
> What it does do is maintain a cached checkout in ~/.cache/guix/pull,
> which makes subsequent pulls much faster.

Summary (TL;DR):

  * "guix pull" should use "git fetch master"
  * for "guix download" we can keep the current behaviour

I did a series of tests:

  * "fetch" without any argument will fetch *all* data from *all*
    branches.
  * "fetch master" only fetches data living on "master"; other
    branches are ignored.
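Whether a fetch brings in other branches can be checked with a sketch along these lines. It assumes a throwaway local upstream with a hypothetical second branch named `other`, and uses `git cat-file -e` to test whether that branch's tip commit has reached the local object store after each kind of fetch.

```shell
#!/bin/sh
# Sketch: "git fetch origin master" vs. a plain "git fetch origin".
# Assumption: a local throwaway upstream with a hypothetical extra branch
# called "other" stands in for the real repository.
set -eu

work=$(mktemp -d)
upstream="$work/upstream"
commit() {  # commit MESSAGE: record a change in the upstream repository
    echo "$1" > "$upstream/file.txt"
    git -C "$upstream" add file.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "$1"
}

git init -q "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/master
commit "on master"
git -C "$upstream" checkout -q -b other
commit "only on other"
other_tip=$(git -C "$upstream" rev-parse other)
git -C "$upstream" checkout -q master

git init -q "$work/clone"
cd "$work/clone"
git remote add origin "file://$upstream"

has_other() {  # is the tip of "other" present in the local object store?
    if git cat-file -e "$other_tip" 2>/dev/null; then echo yes; else echo no; fi
}

git fetch -q origin master
echo "after 'fetch origin master': other branch present? $(has_other)"
git fetch -q origin
echo "after 'fetch origin':        other branch present? $(has_other)"
```

The first check should answer "no" and the second "yes", since the commit on `other` is not reachable from `master`.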

I compared the data fetched for a repo at commit 6bd1c41e8
(yesterday, 05:29):

  * "fetch" fetches 1000K
  * "fetch master" fetches 755K
  * "fetch --depth=1 master" fetches 588K (but see below)

I did some more tests (see the results below and the attached script) and
gained the following insights:

  * if FETCH_HEAD is not checked out after a fetch, the next fetch will
    download all the data again (compare "fetch by ref" with "fetch by ref +
    checkout")
  * --depth=1 downloads the *whole* state (at the given ref), no
    matter how much of the data is already present (compare "fetch by ref +
    checkout" with "fetch --depth=1 by ref + checkout")
  * I was not able to create a test case where "fetch --depth=1 master"
    would fetch only part of the data, which contradicts the results
    when updating from 6bd1c41e8.

I suggest making "guix pull" fetch only from "master", since this alone
already reduces the size of the downloaded data.
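A minimal sketch of this suggestion combined with the cached checkout Ludovic described, written with the plain `git` command rather than the Guile-Git code `guix pull` actually uses. `update_checkout` is a hypothetical helper, a temporary directory stands in for ~/.cache/guix/pull, and a local throwaway repository stands in for the real remote. Following the insight above, it checks out FETCH_HEAD after every fetch.

```shell
#!/bin/sh
# Sketch: keep a cached checkout and update it by fetching only "master".
# Assumptions: update_checkout is a hypothetical helper; a temporary
# directory stands in for ~/.cache/guix/pull and a local throwaway
# repository stands in for the real Guix remote.
set -eu

update_checkout() {  # usage: update_checkout URL CACHE_DIR
    if [ ! -d "$2/.git" ]; then
        # First run: create the cache with the remote configured.
        git init -q "$2"
        git -C "$2" remote add origin "$1"
    fi
    # Fetch only what is reachable from master, not every branch.
    git -C "$2" fetch -q origin master
    # Check out the result; the measurements in this thread suggest
    # later fetches are then much smaller.
    git -C "$2" checkout -q FETCH_HEAD
}

work=$(mktemp -d)
upstream="$work/upstream"
cache="$work/cache"
commit() {  # commit MESSAGE: record a change in the upstream repository
    echo "$1" > "$upstream/file.txt"
    git -C "$upstream" add file.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "$1"
}

git init -q "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/master
commit "first"
update_checkout "file://$upstream" "$cache"   # initial pull fills the cache
commit "second"
update_checkout "file://$upstream" "$cache"   # later pull reuses the cache
cat "$cache/file.txt"
```

The second call only has to bring in the new commit; the cached checkout ends up at the tip of upstream `master`.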

For guix download we don't (need to) cache previous downloads, so
"--depth=1 <commit>" would suffice. Unfortunately, this only works for
branches and tags, not for commit IDs (see "man git-fetch-pack" for
exceptions). But most current package definitions are based on
commit IDs, so it is not worth trying "--depth=1 <commit>" first.
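The trade-off above can be expressed as a small decision rule. This is a sketch only, with a hypothetical `fetch_source` helper that is not Guix's actual git-download code: branch and tag names get a shallow fetch, while anything that looks like a full 40-character commit ID triggers a regular fetch followed by a checkout of that commit. A local throwaway repository with a hypothetical tag `v1` stands in for upstream.

```shell
#!/bin/sh
# Sketch: shallow fetch for branch/tag names, full fetch for commit IDs.
# Assumptions: fetch_source is a hypothetical helper, not Guix's
# git-download; a local throwaway repository stands in for upstream.
set -eu

is_commit_id() {  # true for a full 40-character hexadecimal SHA-1
    [ "${#1}" -eq 40 ] || return 1
    case $1 in *[!0-9a-f]*) return 1 ;; esac
}

fetch_source() {  # usage: fetch_source URL REF DIR
    git init -q "$3"
    git -C "$3" remote add origin "$1"
    if is_commit_id "$2"; then
        # Servers do not necessarily serve an arbitrary commit by ID,
        # so fetch the branches and check the commit out from there.
        git -C "$3" fetch -q origin
        git -C "$3" checkout -q "$2"
    else
        # Branch or tag name: the tip alone is enough to build from.
        git -C "$3" fetch -q --depth=1 origin "$2"
        git -C "$3" checkout -q FETCH_HEAD
    fi
}

work=$(mktemp -d)
upstream="$work/upstream"
git init -q "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
    echo "revision $i" > "$upstream/file.txt"
    git -C "$upstream" add file.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "commit $i"
    if [ "$i" = 1 ]; then first=$(git -C "$upstream" rev-parse HEAD); fi
done
git -C "$upstream" tag v1

fetch_source "file://$upstream" v1 "$work/by-tag"          # shallow
fetch_source "file://$upstream" "$first" "$work/by-commit"  # full history
cat "$work/by-tag/file.txt" "$work/by-commit/file.txt"
```

Checking for a full-length hexadecimal ID, rather than any hex-looking string, avoids mistaking a branch named like `cafe` for a commit.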

cloned repo ---------------
size 45M

fetch all ------------------
size 45M

fetch by ref ------------------
size v0.11.0    26M
size v0.12.0    32M
size v0.13.0    40M
size marker-1   45M
size marker-2   45M
size marker-3   45M
size marker-4   45M
size marker-5   45M
size master     45M

fetch by ref + checkout ------------------
size v0.11.0    26M
size v0.12.0    11M
size v0.13.0    12M
size marker-1   8.9M
size marker-2   1.1M
size marker-3   856K
size marker-4   856K
size marker-5   1.1M
size master     1.1M

fetch --depth=1 by ref ------------------
size v0.11.0    9,8M
size v0.12.0    11M
size v0.13.0    13M
size marker-1   13M
size marker-2   13M
size marker-3   13M
size marker-4   13M
size marker-5   13M
size master     13M

fetch --depth=1 by ref + checkout ------------------
size v0.11.0    9.8M
size v0.12.0    3.8M
size v0.13.0    5.6M
size marker-1   4.1M
size marker-2   4.1M
size marker-3   4.1M
size marker-4   4.1M
size marker-5   4.1M
size master     4.1M

fetch older all and master with --depth=1 by ref + checkout ------------------
size master     45M
size master     45M

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |


[-- Attachment #2: test-fetch.sh --]
[-- Type: application/x-shellscript, Size: 2590 bytes --]

* Re: Random idea about speeding up guix pull
  2017-09-05 12:23       ` Hartmut Goebel
@ 2017-09-05 14:33         ` Ludovic Courtès
  2017-09-05 14:51           ` Hartmut Goebel
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2017-09-05 14:33 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel

Hartmut Goebel <h.goebel@crazy-compilers.com> wrote:

> On 04.09.2017 at 23:56, Ludovic Courtès wrote:
>> What it does do is maintain a cached checkout in ~/.cache/guix/pull,
>> which makes subsequent pulls much faster.
>
> Summary (TL;DR):
>
>   * "guix pull" should use "git fetch master"
>   * for "guix download" we can keep the current behaviour
>
> I did a series of tests:
>
>   * "fetch" without any argument will fetch *all* data from *all*
>     branches.
>   * "fetch master" only fetches data living on "master"; other
>     branches are ignored.
>
> I compared the data fetched for a repo at commit 6bd1c41e8
> (yesterday, 05:29):
>
>   * "fetch" fetches 1000K
>   * "fetch master" fetches 755K
>   * "fetch --depth=1 master" fetches 588K (but see below)

Thanks for the detailed analysis!

The problem is that libgit2 doesn’t support shallow clones, and it’s
unclear whether it will support it in the future:

  https://github.com/libgit2/libgit2/issues/3058

:-/

Ludo’.

* Re: Random idea about speeding up guix pull
  2017-09-05 14:33         ` Ludovic Courtès
@ 2017-09-05 14:51           ` Hartmut Goebel
  2017-09-07  8:28             ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Hartmut Goebel @ 2017-09-05 14:51 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On 05.09.2017 at 16:33, Ludovic Courtès wrote:
> The problem is that libgit2 doesn’t support shallow clones, and it’s
> unclear whether it will support it in the future:

Maybe I'm wrong, but to my understanding fetching a single branch/tag is
not a "shallow clone", is it?
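Git itself decides mechanically whether a repository is shallow: when a fetch cuts the history off, Git records the boundary commits in the file `.git/shallow` (newer Git versions also report this via `git rev-parse --is-shallow-repository`). A minimal sketch, assuming a throwaway local repository as the remote, compares fetching only `master` with fetching `master` at `--depth=1`; only the latter creates that file.

```shell
#!/bin/sh
# Sketch: a single-branch fetch vs. a depth-1 fetch of the same branch.
# Assumption: a local throwaway repository stands in for the remote.
set -eu

work=$(mktemp -d)
upstream="$work/upstream"
git init -q "$upstream"
git -C "$upstream" symbolic-ref HEAD refs/heads/master
for i in 1 2; do
    echo "revision $i" > "$upstream/file.txt"
    git -C "$upstream" add file.txt
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.org \
        commit -q -m "commit $i"
done

report() {  # report DIR: does Git consider the repository in DIR shallow?
    if [ -f "$1/.git/shallow" ]; then
        echo "${1##*/}: shallow"
    else
        echo "${1##*/}: complete history"
    fi
}

for mode in single-branch depth-1; do
    git init -q "$work/$mode"
    if [ "$mode" = depth-1 ]; then
        git -C "$work/$mode" fetch -q --depth=1 "file://$upstream" master
    else
        git -C "$work/$mode" fetch -q "file://$upstream" master
    fi
    report "$work/$mode"
done
```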

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |

* Re: Random idea about speeding up guix pull
  2017-09-05 14:51           ` Hartmut Goebel
@ 2017-09-07  8:28             ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2017-09-07  8:28 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel

Hartmut Goebel <h.goebel@crazy-compilers.com> wrote:

> On 05.09.2017 at 16:33, Ludovic Courtès wrote:
>> The problem is that libgit2 doesn’t support shallow clones, and it’s
>> unclear whether it will support it in the future:
>
> Maybe I'm wrong, but to my understanding fetching a single branch/tag is
> not a "shallow clone", is it?

I think it is, in the sense that just a subset of the Git object graph
is fetched, but I’m not 100% sure about the terminology.

Ludo’.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
