* Parallel downloads
@ 2019-10-31 15:07 Pierre Neidhardt
  2019-10-31 16:18 ` Tobias Geerinckx-Rice
  2019-11-01 10:06 ` Joshua Branson
  0 siblings, 2 replies; 29+ messages in thread
From: Pierre Neidhardt @ 2019-10-31 15:07 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 269 bytes --]

Hi,

Is there any plan to support parallel downloads?  Currently, downloads are
a bottleneck for `guix install / upgrade`; parallel downloads could reduce
the operation duration by an order of magnitude.

Thoughts?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 15:07 Parallel downloads Pierre Neidhardt
@ 2019-10-31 16:18 ` Tobias Geerinckx-Rice
  2019-10-31 16:48   ` Pierre Neidhardt
  2019-11-01 10:06 ` Joshua Branson
  1 sibling, 1 reply; 29+ messages in thread
From: Tobias Geerinckx-Rice @ 2019-10-31 16:18 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]

Hullo Pierre!

Pierre Neidhardt writes:
> Is there any plan to support parallel downloads?

Guix already downloads sources and substitutes in parallel with 
other builds/downloads through the --max-jobs option.
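
For a substitute-heavy operation you can bump it per invocation, e.g. (a
sketch; if I'm not mistaken, -M/--max-jobs is accepted as a common build
option by guix package and its aliases):

--8<---------------cut here---------------start------------->8---
# Allow up to 4 jobs (builds or substitute downloads) for this upgrade.
guix upgrade --max-jobs=4
--8<---------------cut here---------------end--------------->8---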

You could add a separate knob for downloads that defaults to 
--max-jobs.  Or even (* max-jobs cores).  No plans for that AFAIK. 
I don't think it's a trivial tweak.

I'm interested in the numbers behind this claim:

> Currently, downloads are a bottleneck for `guix install / upgrade`;
> parallel downloads could reduce the operation duration by an order
> of magnitude.

…because on a substitute-only workload, my default --max-jobs=4 
connections give me 4 MiB/s versus 1.5 MiB/s on a single job.

That's not even a linear increase, let alone an order of magnitude 
(base 2 doesn't count :-).

Kind regards,

T G-R

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 16:18 ` Tobias Geerinckx-Rice
@ 2019-10-31 16:48   ` Pierre Neidhardt
  2019-10-31 18:01     ` zimoun
  2019-11-03 14:48     ` Ludovic Courtès
  0 siblings, 2 replies; 29+ messages in thread
From: Pierre Neidhardt @ 2019-10-31 16:48 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice, guix-devel

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

Hi Tobias,

I'm not sure I understand: if I run

--8<---------------cut here---------------start------------->8---
guix build -M 4 vlc
--8<---------------cut here---------------end--------------->8---

download progress bars are updated one at a time.  Or am I mistaken?

Also, is there a way to change the default value for --max-jobs from 1 to
something else?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 16:48   ` Pierre Neidhardt
@ 2019-10-31 18:01     ` zimoun
  2019-10-31 18:09       ` Pierre Neidhardt
  2019-11-03 14:48     ` Ludovic Courtès
  1 sibling, 1 reply; 29+ messages in thread
From: zimoun @ 2019-10-31 18:01 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix Devel

Hi Pierre,

Maybe you would like concurrent output in this spirit:
https://joeyh.name/code/concurrent-output/
Right?

However, I am not sure that parallel downloads would drastically change
the performance (the bottleneck): first, the speed-up is often not linear,
and second, it strongly depends on the available bandwidth.

All the best,
simon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 18:01     ` zimoun
@ 2019-10-31 18:09       ` Pierre Neidhardt
  0 siblings, 0 replies; 29+ messages in thread
From: Pierre Neidhardt @ 2019-10-31 18:09 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

[-- Attachment #1: Type: text/plain, Size: 151 bytes --]

For sure it depends on the bandwidth.  It would mostly be beneficial to
users with a high bandwidth.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 15:07 Parallel downloads Pierre Neidhardt
  2019-10-31 16:18 ` Tobias Geerinckx-Rice
@ 2019-11-01 10:06 ` Joshua Branson
  2019-11-01 19:11   ` Pierre Neidhardt
  2019-11-03 14:50   ` Ludovic Courtès
  1 sibling, 2 replies; 29+ messages in thread
From: Joshua Branson @ 2019-11-01 10:06 UTC (permalink / raw)
  To: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Hi,
>
> Is there any plan to support parallel downloads?  Currently, downloads are
> a bottleneck for `guix install / upgrade`; parallel downloads could reduce
> the operation duration by an order of magnitude.

On a related note, would getting guix to download substitutes via IPFS
mean that one could update one's system faster? 

Thanks,

Joshua

>
> Thoughts?

--
Joshua Branson
Sent from Emacs and Gnus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-01 10:06 ` Joshua Branson
@ 2019-11-01 19:11   ` Pierre Neidhardt
  2019-11-03 14:50   ` Ludovic Courtès
  1 sibling, 0 replies; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-01 19:11 UTC (permalink / raw)
  To: Joshua Branson, guix-devel

[-- Attachment #1: Type: text/plain, Size: 298 bytes --]

Joshua Branson <jbranso@dismail.de> writes:

> On a related note, would getting guix to download substitutes via IPFS
> mean that one could update one's system faster? 

As of November 2019, no, IPFS is still too slow.  In the future, maybe :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-10-31 16:48   ` Pierre Neidhardt
  2019-10-31 18:01     ` zimoun
@ 2019-11-03 14:48     ` Ludovic Courtès
  2019-11-03 15:29       ` Pierre Neidhardt
  1 sibling, 1 reply; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-03 14:48 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Hi Pierre,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> I'm not sure I understand: if I run
>
> guix build -M 4 vlc
>
> download progress bars are updated one at a time.  Or am I mistaken?

With -M4 you could potentially have 4 substitutions (downloads) going on
in parallel.

Tobias is right: the daemon works in terms of “jobs”, where a job can be
either a build or a substitution (a download).  ‘--max-jobs’ controls
that.

Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-01 10:06 ` Joshua Branson
  2019-11-01 19:11   ` Pierre Neidhardt
@ 2019-11-03 14:50   ` Ludovic Courtès
  1 sibling, 0 replies; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-03 14:50 UTC (permalink / raw)
  To: guix-devel

Hi,

Joshua Branson <jbranso@dismail.de> skribis:

> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>> Hi,
>>
>> Is there any plan to support parallel downloads?  Currently, downloads are
>> a bottleneck for `guix install / upgrade`; parallel downloads could reduce
>> the operation duration by an order of magnitude.
>
> On a related note, would getting guix to download substitutes via IPFS
> mean that one could update one's system faster? 

Potentially!  For that I encourage people to pick up the work on
<https://issues.guix.gnu.org/issue/33899> and to start experimenting
with substitutes on IPFS.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-03 14:48     ` Ludovic Courtès
@ 2019-11-03 15:29       ` Pierre Neidhardt
  2019-11-06 15:34         ` Ludovic Courtès
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-03 15:29 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 167 bytes --]

A few questions:

- How does --max-jobs work with regard to progress bars?

- Can we configure the default value?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-03 15:29       ` Pierre Neidhardt
@ 2019-11-06 15:34         ` Ludovic Courtès
  2019-11-06 16:08           ` Pierre Neidhardt
  2019-11-06 21:26           ` Bengt Richter
  0 siblings, 2 replies; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-06 15:34 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Hi,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> A few questions:
>
> - How does --max-jobs work with regard to progress bars?

When there are several jobs running at the same time, it doesn’t display
any progress bar.  (In theory, it could do a multi-line display but that
wouldn’t work with all terminals, it’s tricky, and overall not all that
useful.)

> - Can we configure the default value?

Yup, just pass ‘--max-jobs=N’ to the daemon.
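
Concretely, something along these lines where guix-daemon is started (a
sketch; --build-users-group is just the usual flag from the manual, and on
Guix System this would go through the service configuration instead):

--8<---------------cut here---------------start------------->8---
# Run the daemon with up to 4 parallel jobs (builds or substitutions).
guix-daemon --build-users-group=guixbuild --max-jobs=4
--8<---------------cut here---------------end--------------->8---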

HTH!

Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-06 15:34         ` Ludovic Courtès
@ 2019-11-06 16:08           ` Pierre Neidhardt
  2019-11-09 17:40             ` Ludovic Courtès
  2019-11-06 21:26           ` Bengt Richter
  1 sibling, 1 reply; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-06 16:08 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

>> - Can we configure the default value?
>
> Yup, just pass ‘--max-jobs=N’ to the daemon.

So I suppose you mean to use the `extra-options` field, e.g.

--8<---------------cut here---------------start------------->8---
(guix-service-type config =>
                   (guix-configuration
                    (inherit config)
                    (extra-options '("--max-jobs=4"))))
--8<---------------cut here---------------end--------------->8---

Ludo, what do you think of Tobias's suggestion to have an extra knob to
specifically configure the number of download jobs?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-06 15:34         ` Ludovic Courtès
  2019-11-06 16:08           ` Pierre Neidhardt
@ 2019-11-06 21:26           ` Bengt Richter
  1 sibling, 0 replies; 29+ messages in thread
From: Bengt Richter @ 2019-11-06 21:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On +2019-11-06 16:34:17 +0100, Ludovic Courtès wrote:
> Hi,
> 
> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> 
> > A few questions:
> >
> > - How does --max-jobs work with regard to progress bars?
> 
> When there are several jobs running at the same time, it doesn’t display
> any progress bar.  (In theory, it could do a multi-line display but that
> wouldn’t work with all terminals, it’s tricky, and overall not all that
> useful.)
>

I suppose you could do a single line with CPU numbers like 222232444,
but you'd need to compute how long the full line should be, divide it by
the number of CPUs allocated, and make the line-buffer writes thread-safe
... and use base>10 digits, potentially ;-)

> > - Can we configure the default value?
> 
> Yup, just pass ‘--max-jobs=N’ to the daemon.
> 
> HTH!
> 
> Ludo’.
> 
--
Regards,
Bengt Richter

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-06 16:08           ` Pierre Neidhardt
@ 2019-11-09 17:40             ` Ludovic Courtès
  2019-11-10 13:28               ` Pierre Neidhardt
                                 ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-09 17:40 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Hi Pierre!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> - Can we configure the default value?
>>
>> Yup, just pass ‘--max-jobs=N’ to the daemon.
>
> So I suppose you mean to use the `extra-options` field, e.g.
>
> (guix-service-type config =>
>                    (guix-configuration
>                     (inherit config)
>                     (extra-options '("--max-jobs=4"))))

Yes.

> Ludo, what do you think of Tobias's suggestion to have an extra knob to
> specifically configure the number of download jobs?

Like I wrote, it’s not that simple (we’d first need the daemon to
distinguish substitution jobs from other jobs, but note that there are
also “downloads” that are actually derivation builds), and it’s not
clear to me that it’s overall beneficial anyway: it’s not supposed to be
faster to download 10 things in parallel from ci.guix.gnu.org, than to
download them sequentially.

If it _is_ faster, then we need to investigate why that is the case.
For example, I’m aware that for some reason, nginx gives us low
bandwidth when downloading a nar that’s not already in its cache.  It’s
probably an nginx misconfiguration issue, but I couldn’t find out why.  (This
has been discussed before on one of the mailing lists, I think.)

Does that make sense?  :-)

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-09 17:40             ` Ludovic Courtès
@ 2019-11-10 13:28               ` Pierre Neidhardt
  2019-11-12 15:36                 ` Ludovic Courtès
  2019-11-12 17:44               ` Leo Famulari
  2019-11-13 16:16               ` Mark H Weaver
  2 siblings, 1 reply; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-10 13:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

> it’s not supposed to be
> faster to download 10 things in parallel from ci.guix.gnu.org, than to
> download them sequentially.

I was thinking a little bit ahead in the future, e.g. when we have more
build farms (if it ever happens), or with Torrent / IPFS support.  Then
I believe it would be highly beneficial: a sufficiently fast  connection
could effectively parallelize enough downloads to fill the bandwidth.

Also what about the CDNs?  Is it currently possible to download from
multiple CDNs at the same time?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-10 13:28               ` Pierre Neidhardt
@ 2019-11-12 15:36                 ` Ludovic Courtès
  2019-11-12 15:59                   ` John Soo
  0 siblings, 1 reply; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-12 15:36 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Hello,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> it’s not supposed to be
>> faster to download 10 things in parallel from ci.guix.gnu.org, than to
>> download them sequentially.
>
> I was thinking a little bit ahead in the future, e.g. when we have more
> build farms (if it ever happens), or with Torrent / IPFS support.  Then
> I believe it would be highly beneficial: a sufficiently fast  connection
> could effectively parallelize enough downloads to fill the bandwidth.

I think we’ll have to revisit this issue at that point; it’s hard to
discuss things in the abstract.  :-)

> Also what about the CDNs?  Is it currently possible to download from
> multiple CDNs at the same time?

We’d have to ask Chris Marusich about that, but I think CDNs should
typically use all the available bandwidth so I suspect there’s little to
be gained by having several connections in parallel.

Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-12 15:36                 ` Ludovic Courtès
@ 2019-11-12 15:59                   ` John Soo
  2019-11-12 16:48                     ` zimoun
  0 siblings, 1 reply; 29+ messages in thread
From: John Soo @ 2019-11-12 15:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hi everyone,

I’ve been watching this from afar, and while I have to agree with this:

> .. I suspect there’s little to
> be gained by having several connections in parallel.

I do have to say that more fine-grained concurrency would really help speed up builds without substitutes.

Especially on old hardware, some builds can actually exhaust all resources.  That means you can’t really get a speedup by bumping the max jobs.

What would help is doing downloads of sources and substitutes asynchronously while a non-substitute job is taking place. If some network activity could be backgrounded, I think you might find build times decrease a lot. 
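
In the meantime, a rough way to front-load at least the source downloads
from the client side might be something along these lines (just a sketch;
if I'm not mistaken, --sources=transitive fetches the sources of a package
and of all its transitive inputs):

--8<---------------cut here---------------start------------->8---
# Fetch all the source tarballs first, then run the actual build,
# which no longer has to stop for source downloads.
guix build --sources=transitive PKG
guix build PKG
--8<---------------cut here---------------end--------------->8---

It is sequential rather than overlapped, though, so it only separates the
network phase from the build phase.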

Thanks!

John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-12 15:59                   ` John Soo
@ 2019-11-12 16:48                     ` zimoun
  2019-11-13  7:43                       ` Efraim Flashner
  0 siblings, 1 reply; 29+ messages in thread
From: zimoun @ 2019-11-12 16:48 UTC (permalink / raw)
  To: John Soo; +Cc: Guix Devel

Hi,
It is not related to parallel downloads, but on old machines "guix
build --no-substitutes" can eat a lot of resources, for example when the
package has a lot of dependencies.  I would like to be able to list
which dependencies I want to build and which I want to substitute.
Maybe it is already possible.

Currently, I am doing:

  guix build `guix show PKG | recsel -R dependencies`
  guix build PKG --no-substitutes


What do you think?


All the best,
simon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-09 17:40             ` Ludovic Courtès
  2019-11-10 13:28               ` Pierre Neidhardt
@ 2019-11-12 17:44               ` Leo Famulari
  2019-11-17 17:15                 ` Ludovic Courtès
  2019-11-13 16:16               ` Mark H Weaver
  2 siblings, 1 reply; 29+ messages in thread
From: Leo Famulari @ 2019-11-12 17:44 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sat, Nov 09, 2019 at 06:40:56PM +0100, Ludovic Courtès wrote:
> Like I wrote, it’s not that simple (we’d first need the daemon to
> distinguish substitution jobs from other jobs, but note that there are
> also “downloads” that are actually derivation builds), and it’s not
> clear to me that it’s overall beneficial anyway: it’s not supposed to be
> faster to download 10 things in parallel from ci.guix.gnu.org, than to
> download them sequentially.

Parallel downloading is not faster in terms of overall transfer rate
from ci.guix.gnu.org.

However, installing things with Guix involves downloading a lot of very
small files like derivations, and Guix spends a lot of time initiating
these downloads.

For example, I can download things at 100 megabits per second, but when
Guix needs to sequentially download 50 10-kilobyte files, it may take an
entire minute.

So there is a huge speedup with parallel downloading.
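
(Back-of-envelope, if those figures are representative: 50 files of 10 KB
each is only about 500 KB, a fraction of a second of transfer at
100 Mbit/s, so nearly the whole minute goes to per-file overhead such as
connection setup and round trips rather than to actual data transfer.)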

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-12 16:48                     ` zimoun
@ 2019-11-13  7:43                       ` Efraim Flashner
  2019-11-13 11:26                         ` zimoun
  0 siblings, 1 reply; 29+ messages in thread
From: Efraim Flashner @ 2019-11-13  7:43 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

On Tue, Nov 12, 2019 at 05:48:16PM +0100, zimoun wrote:
> Hi,
> It is not related with parallel download but on old machines "guix
> build --no-substitutes" can eat a lot of resources; for example if the
> package has a lot of dependencies. I would like to be able to list
> which dependencies I want to build and which I want to substitute.
> Maybe it is already possible.
> 
> Currently, I am doing:
> 
>   guix build `guix show PKG | recsel -R dependencies`
>   guix build PKG --no-substitutes
> 
> 
> What do you think?
> 

Not a true solution for 'guix build foo bar --no-substitutes baz', but for
your example you can do it in one line with:
'guix environment PKG -- guix build PKG --no-substitutes'

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13  7:43                       ` Efraim Flashner
@ 2019-11-13 11:26                         ` zimoun
  0 siblings, 0 replies; 29+ messages in thread
From: zimoun @ 2019-11-13 11:26 UTC (permalink / raw)
  To: Efraim Flashner; +Cc: Guix Devel

Hi Efraim,

On Wed, 13 Nov 2019 at 08:44, Efraim Flashner <efraim@flashner.co.il> wrote:

> >   guix build `guix show PKG | recsel -R dependencies`
> >   guix build PKG --no-substitutes

> 'guix environment PKG -- guix build PKG --no-substitutes'

Thank you.
It is a better solution, indeed! :-)

All the best,
simon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-09 17:40             ` Ludovic Courtès
  2019-11-10 13:28               ` Pierre Neidhardt
  2019-11-12 17:44               ` Leo Famulari
@ 2019-11-13 16:16               ` Mark H Weaver
  2019-11-13 18:03                 ` Pierre Neidhardt
                                   ` (3 more replies)
  2 siblings, 4 replies; 29+ messages in thread
From: Mark H Weaver @ 2019-11-13 16:16 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>
>> Ludo, what do you think of Tobias's suggestion to have an extra knob to
>> specifically configure the number of download jobs?
>
> Like I wrote, it’s not that simple (we’d first need the daemon to
> distinguish substitution jobs from other jobs, but note that there are
> also “downloads” that are actually derivation builds), and it’s not
> clear to me that it’s overall beneficial anyway: it’s not supposed to be
> faster to download 10 things in parallel from ci.guix.gnu.org, than to
> download them sequentially.

I'll also note that parallel downloads would increase the memory usage
on the server.  If users, on average, configured 4 parallel downloads, I
guess that would have the effect of multiplying the server memory usage
by about 4.  It might also create an incentive for users to configure
more parallel downloads in order to grab a larger ratio of the server's
available resources.

For these reasons, I'm inclined to think that parallel downloads is the
wrong approach.  If a single download process is not making efficient
use of the available bandwidth, I'd be more inclined to look carefully
at why it's failing to do so.  For example, I'm not sure if this is the
case (and don't have time to look right now), but if the current code
waits until a NAR has finished downloading before asking for the next
one, that's an issue that could be fixed by use of HTTP pipelining,
without multiplying the memory usage.

What do you think?

      Mark

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13 16:16               ` Mark H Weaver
@ 2019-11-13 18:03                 ` Pierre Neidhardt
  2019-11-13 18:25                 ` Leo Famulari
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-13 18:03 UTC (permalink / raw)
  To: Mark H Weaver, Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 142 bytes --]

Maybe the solution to this would be to only allow parallel downloads
from separate servers.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13 16:16               ` Mark H Weaver
  2019-11-13 18:03                 ` Pierre Neidhardt
@ 2019-11-13 18:25                 ` Leo Famulari
  2019-11-13 19:34                 ` Pierre Neidhardt
  2019-11-17 17:52                 ` Ludovic Courtès
  3 siblings, 0 replies; 29+ messages in thread
From: Leo Famulari @ 2019-11-13 18:25 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guix-devel

On Wed, Nov 13, 2019 at 11:16:53AM -0500, Mark H Weaver wrote:
> For these reasons, I'm inclined to think that parallel downloads is the
> wrong approach.  If a single download process is not making efficient
> use of the available bandwidth, I'd be more inclined to look carefully
> at why it's failing to do so.  For example, I'm not sure if this is the
> case (and don't have time to look right now), but if the current code
> waits until a NAR has finished downloading before asking for the next
> one, that's an issue that could be fixed by use of HTTP pipelining,
> without multiplying the memory usage.
> 
> What do you think?

I agree that parallel downloads is a kludge to work around the issue of
slow set-up and tear-down of our download code. Pipelining would help a
lot, and we could also profile the relevant Guile code to see if there
are any easy speedups.

This issue was actually discussed a year ago:

https://lists.gnu.org/archive/html/guix-devel/2018-11/msg00148.html

I'll quote Ludo's suggestion from then:

> I’d be in favor of a solution where ‘guix substitute’ is kept alive
> across substitutions (like what happens with ‘guix substitute --query’),
> which would allow it to keep connections alive and thus save the TLS
> handshake and a few extra round trips per download.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13 16:16               ` Mark H Weaver
  2019-11-13 18:03                 ` Pierre Neidhardt
  2019-11-13 18:25                 ` Leo Famulari
@ 2019-11-13 19:34                 ` Pierre Neidhardt
  2019-12-13  9:35                   ` Pierre Neidhardt
  2019-11-17 17:52                 ` Ludovic Courtès
  3 siblings, 1 reply; 29+ messages in thread
From: Pierre Neidhardt @ 2019-11-13 19:34 UTC (permalink / raw)
  To: Mark H Weaver, Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

Mark H Weaver <mhw@netris.org> writes:

> For these reasons, I'm inclined to think that parallel downloads is the
> wrong approach.  If a single download process is not making efficient
> use of the available bandwidth, I'd be more inclined to look carefully
> at why it's failing to do so.  For example, I'm not sure if this is the
> case (and don't have time to look right now), but if the current code
> waits until a NAR has finished downloading before asking for the next
> one, that's an issue that could be fixed by use of HTTP pipelining,
> without multiplying the memory usage.

I think so too.

More generally, the Guix daemon's jobs can be categorized into two kinds:
downloads and builds.  These two categories demand almost complementary
hardware resources on the local machine.

With that in mind, we could have two pipelines: one for the builds and one
for the downloads.  This, I think, is general enough that it could be used
by default and improve performance for everyone, regardless of Internet
bandwidth or CPU power.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-12 17:44               ` Leo Famulari
@ 2019-11-17 17:15                 ` Ludovic Courtès
  0 siblings, 0 replies; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-17 17:15 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel

Hi,

Leo Famulari <leo@famulari.name> skribis:

> On Sat, Nov 09, 2019 at 06:40:56PM +0100, Ludovic Courtès wrote:
>> Like I wrote, it’s not that simple (we’d first need the daemon to
>> distinguish substitution jobs from other jobs, but note that there are
>> also “downloads” that are actually derivation builds), and it’s not
>> clear to me that it’s overall beneficial anyway: it’s not supposed to be
>> faster to download 10 things in parallel from ci.guix.gnu.org, than to
>> download them sequentially.
>
> Parallel downloading is not faster in terms of overall transfer rate
> from ci.guix.gnu.org.
>
> However, installing things with Guix involves downloading a lot of very
> small files like derivations, and Guix spends a lot of time initiating
> these downloads.

Good point.

Note that .drv files are never downloaded, but it’s true that there are
often small files like the “module-import-compiled” things.  This happens
when building a system, though, not so much when building a package.

> For example, I can download things at 100 megabits, but when Guix needs
> to sequentially download 50 10-kilobyte files, it may take an entire
> minute.
>
> So there is a huge speedup with parallel downloading.

One thing I’d like to get rid of is the initial HTTP GET for
/nix-cache-info which is completely useless now.  That could help a bit.
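
For context, roughly what each substitution costs on the wire today, as I
understand it (curl stand-ins; the URLs and paths are placeholders, and each
download additionally pays for spawning a fresh ‘guix substitute’ process
and a new TLS handshake):

--8<---------------cut here---------------start------------->8---
curl https://ci.guix.gnu.org/<hash>.narinfo          # metadata, pipelined up front
curl https://ci.guix.gnu.org/nix-cache-info          # redundant nowadays
curl https://ci.guix.gnu.org/nar/gzip/<hash>-<name>  # the nar itself
--8<---------------cut here---------------end--------------->8---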

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13 16:16               ` Mark H Weaver
                                   ` (2 preceding siblings ...)
  2019-11-13 19:34                 ` Pierre Neidhardt
@ 2019-11-17 17:52                 ` Ludovic Courtès
  3 siblings, 0 replies; 29+ messages in thread
From: Ludovic Courtès @ 2019-11-17 17:52 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guix-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

> For these reasons, I'm inclined to think that parallel downloads is the
> wrong approach.  If a single download process is not making efficient
> use of the available bandwidth, I'd be more inclined to look carefully
> at why it's failing to do so.  For example, I'm not sure if this is the
> case (and don't have time to look right now), but if the current code
> waits until a NAR has finished downloading before asking for the next
> one, that's an issue that could be fixed by use of HTTP pipelining,
> without multiplying the memory usage.

I agree.  There’s HTTP pipelining for narinfos but not for nars.  Worse,
before fetching a nar, we do a GET /nix-cache-info, and in fact we spawn
a new ‘guix substitute’ process for each download (for “historical
reasons”).

So there’s room for optimization there!

Ludo’.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-11-13 19:34                 ` Pierre Neidhardt
@ 2019-12-13  9:35                   ` Pierre Neidhardt
  2019-12-13 12:25                     ` Brett Gilio
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Neidhardt @ 2019-12-13  9:35 UTC (permalink / raw)
  To: Mark H Weaver, Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 479 bytes --]

Update: I've been using --max-jobs=2 by default for about 2 weeks now,
and it feels like a much smoother experience overall: faster downloads, faster
builds.

This is obviously a very dumb "optimization" but at least it serves to
underline that Guix could still do much better by parallelizing
downloads and builds in a smart way.

A couple of ideas were mentioned in this thread: Anyone interested in
working on them? :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Parallel downloads
  2019-12-13  9:35                   ` Pierre Neidhardt
@ 2019-12-13 12:25                     ` Brett Gilio
  0 siblings, 0 replies; 29+ messages in thread
From: Brett Gilio @ 2019-12-13 12:25 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Update: I've been using --max-jobs=2 by default for about 2 weeks now,
> and it feels like a much smoother experience overall: faster downloads, faster
> builds.
>
> This is obviously a very dumb "optimization" but at least it serves to
> underline that Guix could still do much better by parallelizing
> downloads and builds in a smart way.

Agree.

>
> A couple of ideas were mentioned in this thread: Anyone interested in
> working on them? :)

When I get some time, I would be more than happy to help with this.

-- 
Brett M. Gilio
Homepage -- https://scm.pw/
GNU Guix -- https://guix.gnu.org/

^ permalink raw reply	[flat|nested] 29+ messages in thread


Thread overview: 29+ messages
2019-10-31 15:07 Parallel downloads Pierre Neidhardt
2019-10-31 16:18 ` Tobias Geerinckx-Rice
2019-10-31 16:48   ` Pierre Neidhardt
2019-10-31 18:01     ` zimoun
2019-10-31 18:09       ` Pierre Neidhardt
2019-11-03 14:48     ` Ludovic Courtès
2019-11-03 15:29       ` Pierre Neidhardt
2019-11-06 15:34         ` Ludovic Courtès
2019-11-06 16:08           ` Pierre Neidhardt
2019-11-09 17:40             ` Ludovic Courtès
2019-11-10 13:28               ` Pierre Neidhardt
2019-11-12 15:36                 ` Ludovic Courtès
2019-11-12 15:59                   ` John Soo
2019-11-12 16:48                     ` zimoun
2019-11-13  7:43                       ` Efraim Flashner
2019-11-13 11:26                         ` zimoun
2019-11-12 17:44               ` Leo Famulari
2019-11-17 17:15                 ` Ludovic Courtès
2019-11-13 16:16               ` Mark H Weaver
2019-11-13 18:03                 ` Pierre Neidhardt
2019-11-13 18:25                 ` Leo Famulari
2019-11-13 19:34                 ` Pierre Neidhardt
2019-12-13  9:35                   ` Pierre Neidhardt
2019-12-13 12:25                     ` Brett Gilio
2019-11-17 17:52                 ` Ludovic Courtès
2019-11-06 21:26           ` Bengt Richter
2019-11-01 10:06 ` Joshua Branson
2019-11-01 19:11   ` Pierre Neidhardt
2019-11-03 14:50   ` Ludovic Courtès
