* Speeding up archive export
@ 2020-09-10 9:04 Ludovic Courtès
2020-09-15 1:40 ` Maxim Cournoyer
0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2020-09-10 9:04 UTC (permalink / raw)
To: Guix-devel
Hello Guix!
If you’ve ever used offloading (or ‘guix copy’), you’ve probably noticed
that the time to send store items is proportional to the number of store
items to send rather than their total size. Namely:
    guix archive --export coreutils
is fast, but:
    guix archive --export $(guix build -d coreutils)
is slow (there are lots of small files).
Running ‘perf timechart record guix archive --export …’ confirms the
problem: guix-daemon is mostly idle, waiting for all the tiny ‘guix
authenticate’ programs it spawns to sign each and every store item. Here’s
the Gantt diagram (grey = idle, blue = busy):
[Attachment: Gantt diagram (image/png)]
How can we improve on that?
Here are several solutions that come to mind:
1. Sign the whole bundle instead of each individual item.
That solves the problem, but that would prevent the receiver from
storing individual store item signatures in the future (a few years
ago Nix added signatures as part of the ‘ValidPathInfo’ table of
the store database, and I think that’s something we might want to
have too).
2. Sign fewer items: we can do that by signing only store items that
are not content-addressed, i.e., by skipping those that result from a
fixed-output derivation or are a “source” coming from ‘add-to-store’
or similar.
That means we wouldn’t have to sign .drv and *-guile-builder, which
would make a big difference and is generally advisable.
Unfortunately, there’s no easy way to determine whether a store
item is content-addressable. Again Nix added
“content-addressability claims” to ‘ValidPathInfo’, which might
help, though it’s not entirely clear.
3. Reimplement ‘guix authenticate’ and a subset of (guix pki) in C++ (!).
We could load the keys and the ACL only once, and we wouldn’t have
to fork and all; I’m sure it’d be very fast… and very distracting
too: I’d rather invest in the daemon rewrite in Scheme.
4. Spawn ‘guix authenticate’ once and talk to it over a pipe (similar
to ‘guix offload’). That might be the easiest short-term solution.
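To make the difference between spawning a helper per item and option 4 concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the `HELPER` script, its one-line-per-request protocol, and both function names are hypothetical, and none of it reflects the actual ‘guix authenticate’ interface. It merely contrasts paying process start-up once per hash with paying it once overall.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a signing helper: reads one hash per line on
# stdin and answers with one line per request.  This is NOT the real
# 'guix authenticate' protocol, only a toy line-based one.
HELPER = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("signed:" + line.strip() + "\n")
    sys.stdout.flush()
"""

def sign_spawning_each_time(hashes):
    """Start one fresh helper process per hash (the slow pattern)."""
    replies = []
    for h in hashes:
        done = subprocess.run([sys.executable, "-c", HELPER],
                              input=h + "\n", capture_output=True,
                              text=True, check=True)
        replies.append(done.stdout.strip())
    return replies

def sign_over_one_pipe(hashes):
    """Start the helper once and reuse it for every hash."""
    replies = []
    with subprocess.Popen([sys.executable, "-c", HELPER],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          text=True) as proc:
        for h in hashes:
            proc.stdin.write(h + "\n")
            proc.stdin.flush()          # make sure the helper sees the request
            replies.append(proc.stdout.readline().strip())
    return replies

if __name__ == "__main__":
    hashes = ["hash-%d" % i for i in range(20)]
    for fn in (sign_spawning_each_time, sign_over_one_pipe):
        start = time.perf_counter()
        fn(hashes)
        print(fn.__name__, "%.2fs" % (time.perf_counter() - start))
```

Both functions return the same replies; only the number of process start-ups differs, which is the cost that grows with the number of store items.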
Anyway I thought I’d share and invite y’all to brainstorm. :-)
All in all, there’s more and more pressure to get our act together
regarding the daemon rewrite in Scheme. The difficulty here is to have
a series of reasonable milestones rather than all or nothing.
Ludo’.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Speeding up archive export
2020-09-10 9:04 Speeding up archive export Ludovic Courtès
@ 2020-09-15 1:40 ` Maxim Cournoyer
2020-09-15 7:53 ` Ludovic Courtès
0 siblings, 1 reply; 4+ messages in thread
From: Maxim Cournoyer @ 2020-09-15 1:40 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guix-devel
Hi Ludo!
Ludovic Courtès <ludovic.courtes@inria.fr> writes:
> Hello Guix!
>
> If you’ve ever used offloading (or ‘guix copy’), you’ve probably noticed
> that the time to send store items is proportional to the number of store
> items to send rather than their total size. Namely:
>
> guix archive --export coreutils
>
> is fast, but:
>
> guix archive --export $(guix build -d coreutils)
>
> is slow (there are lots of small files).
>
> Running ‘perf timechart record guix archive --export …’ confirms the
> problem: guix-daemon is mostly idle, waiting for all the tiny ‘guix
> authenticate’ programs it spawns to sign each and every store item. Here’s
> the Gantt diagram (grey = idle, blue = busy):
Very cool! The timechart suggests the guix-authenticate programs are
run sequentially? Perhaps running them in parallel would be a cheap
first step to improve performance?
> How can we improve on that?
>
> Here are several solutions that come to mind:
>
> 1. Sign the whole bundle instead of each individual item.
>
> That solves the problem, but that would prevent the receiver from
> storing individual store item signatures in the future (a few years
> ago Nix added signatures as part of the ‘ValidPathInfo’ table of
> the store database, and I think that’s something we might want to
> have too).
Why? Couldn't the receiver do the bookkeeping no matter if it received
a signed bundle or a single file? It could assign the bundle signature
to individual store files in the database, for example. This seems the
obvious, easy solution. We need good arguments to not implement it.
> 2. Sign fewer items: we can do that by signing only store items that
> are not content-addressed—i.e., resulting from a fixed-output
> derivation or being a “source” coming from ‘add-to-store’ or
> similar.
>
> That means we wouldn’t have to sign .drv and *-guile-builder, which
> would make a big difference and is generally advisable.
> Unfortunately, there’s no easy way to determine whether a store
> item is content-addressable. Again Nix added
> “content-addressability claims” to ‘ValidPathInfo’, which might
> help, though it’s not entirely clear.
>
> 3. Reimplement ‘guix authenticate’ and a subset of (guix pki) in C++ (!).
> We could load the keys and the ACL only once, and we wouldn’t have
> to fork and all, I’m sure it’d be very fast… and very distracting
> too: I’d rather invest in the daemon rewrite in Scheme.
>
> 4. Spawn ‘guix authenticate’ once and talk to it over a pipe (similar
> to ‘guix offload’). That might be the easiest short-term solution.
Failing 1., 4. is my second favorite, because it seems the most Guixy of
the remaining options, and should provide acceptable performance.
Maxim
* Re: Speeding up archive export
2020-09-15 1:40 ` Maxim Cournoyer
@ 2020-09-15 7:53 ` Ludovic Courtès
2020-09-15 17:14 ` Maxim Cournoyer
0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2020-09-15 7:53 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: Guix-devel
Hi Maxim,
I’m a bad person, I realize I didn’t even follow up to my message to
point to <https://issues.guix.gnu.org/43340>, which I pushed just
yesterday. Sorry for the confusion!
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
>
>> Hello Guix!
>>
>> If you’ve ever used offloading (or ‘guix copy’), you’ve probably noticed
>> that the time to send store items is proportional to the number of store
>> items to send rather than their total size. Namely:
>>
>> guix archive --export coreutils
>>
>> is fast, but:
>>
>> guix archive --export $(guix build -d coreutils)
>>
>> is slow (there are lots of small files).
>>
>> Running ‘perf timechart record guix archive --export …’ confirms the
>> problem: guix-daemon is mostly idle, waiting for all the tiny ‘guix
>> authenticate’ programs it spawns to sign each and every store item. Here’s
>> the Gantt diagram (grey = idle, blue = busy):
>
> Very cool! The timechart suggests the guix-authenticate programs are
> run sequentially? Perhaps running them in parallel would be a cheap,
> first step to improve performance?
The sequence goes like this:
1. Export store item as nar and compute its hash.
2. Pass hash to ‘guix authenticate sign’.
3. Goto 1 for next store item.
So it’s not really parallelizable, and even pipelining is not really
feasible.
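The three steps above can be sketched in Python as follows. This is a rough illustration, not the daemon's code: a byte string plays the nar, an HMAC plays the signature, and `SECRET_KEY`, `nar_hash`, `sign` and `export` are all hypothetical names. The point it shows is the data dependency: the signature of an item needs that item's hash, which is only known once its serialization has finished.

```python
import hashlib
import hmac

# Hypothetical stand-ins: real store items are serialized as nars and signed
# with the daemon's key pair; here a byte string plays the nar and an HMAC
# plays the signature.
SECRET_KEY = b"not-a-real-signing-key"

def nar_hash(nar_bytes):
    """Step 1: hash the serialized store item."""
    return hashlib.sha256(nar_bytes).hexdigest()

def sign(hex_hash):
    """Step 2: sign the hash (stand-in for 'guix authenticate sign')."""
    return hmac.new(SECRET_KEY, hex_hash.encode(), hashlib.sha256).hexdigest()

def export(items):
    """Return (name, hash, signature) for each item, strictly one at a time."""
    exported = []
    for name, nar_bytes in items:   # step 3: move on to the next store item
        h = nar_hash(nar_bytes)     # must complete before signing can start
        exported.append((name, h, sign(h)))
    return exported
```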
>> 1. Sign the whole bundle instead of each individual item.
>>
>> That solves the problem, but that would prevent the receiver from
>> storing individual store item signatures in the future (a few years
>> ago Nix added signatures as part of the ‘ValidPathInfo’ table of
>> the store database, and I think that’s something we might want to
>> have too).
>
> Why? Couldn't the receiver do the book keeping no matter if it received
> a signed bundle or a single file? It could assign the bundle signature
> to individual store files in the database, for example. This seems the
> obvious, easy solution. We need good arguments to not implement it.
The idea of storing signatures is that you’d keep one signature per
store item. That way, you have precise provenance tracking for each
store item. Also, if you re-export them (via ‘guix publish’ or ‘guix
archive’), you can choose to serve those third-party signatures.
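As a toy illustration of that difference, here is a Python sketch under stated assumptions: the store paths, the key, and `toy_sign` are made up, and an HMAC stands in for a real public-key signature. A per-item signature can be checked and re-served with just that item, whereas a bundle signature only says something about the bundle as a whole.

```python
import hashlib
import hmac

def toy_sign(key, data):
    """Stand-in for a real public-key signature (HMAC used only for brevity)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

# Hypothetical store items and signer key.
items = {"/gnu/store/aaa-foo": b"foo contents",
         "/gnu/store/bbb-bar": b"bar contents"}
key = b"third-party-key"

# Per-item signatures: each record can be stored and re-served by itself.
per_item = {path: toy_sign(key, path.encode() + data)
            for path, data in items.items()}

# Bundle signature: one value covering every item at once.
bundle = b"".join(path.encode() + data for path, data in sorted(items.items()))
bundle_sig = toy_sign(key, bundle)

def verify_item(path, data, signature):
    """Check a signature against a single item, without the rest of the bundle."""
    return hmac.compare_digest(toy_sign(key, path.encode() + data), signature)
```

With only one item at hand, `verify_item` accepts the matching entry of `per_item` but cannot make use of `bundle_sig`, since checking that would require every other item of the original bundle.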
>> 2. Sign fewer items: we can do that by signing only store items that
>> are not content-addressed—i.e., resulting from a fixed-output
>> derivation or being a “source” coming from ‘add-to-store’ or
>> similar.
>>
>> That means we wouldn’t have to sign .drv and *-guile-builder, which
>> would make a big difference and is generally advisable.
>> Unfortunately, there’s no easy way to determine whether a store
>> item is content-addressable. Again Nix added
>> “content-addressability claims” to ‘ValidPathInfo’, which might
>> help, though it’s not entirely clear.
>>
>> 3. Reimplement ‘guix authenticate’ and a subset of (guix pki) in C++ (!).
>> We could load the keys and the ACL only once, and we wouldn’t have
>> to fork and all, I’m sure it’d be very fast… and very distracting
>> too: I’d rather invest in the daemon rewrite in Scheme.
>>
>> 4. Spawn ‘guix authenticate’ once and talk to it over a pipe (similar
>> to ‘guix offload’). That might be the easiest short-term solution.
>
> Failing 1., 4. is my second favorite, because it seems the most Guixy of
> the remaining options, and should provide acceptable performance.
Yes, that’s what <https://issues.guix.gnu.org/43340> does, with quite
some success. I’ve been testing it locally and will now give it a spin
on berlin.
Thanks for your feedback!
Ludo’.
end of thread, other threads:[~2020-09-15 17:24 UTC | newest]