"guix pack -f docker" does too much work

unofficial mirror of guix-devel@gnu.org 

* "guix pack -f docker" does too much work
@ 2024-05-29 12:58 Ricardo Wurmus
  2024-05-30 13:10 ` Michal Atlas
  2024-06-01 13:58 ` Ludovic Courtès
  0 siblings, 2 replies; 14+ messages in thread
From: Ricardo Wurmus @ 2024-05-29 12:58 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

a few months ago "guix pack -f docker" was modified to produce layers.
This is great!  Unfortunately, "guix pack" itself still produces one big
tarball containing all these layers.  There is no sharing of previously
built layers, because they are all hidden inside the pack.

I think it would be great if "guix pack -f docker" could avoid building
all these identical layers again and again.  Perhaps it would be
possible to have a single derivation for each layer?  This way we
wouldn't have to recreate the same layer archives every time.
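
To illustrate the idea, here is a minimal Python sketch.  It is not Guix
code and not how "guix pack" works today; `layer_key`, `LayerCache` and
the `build` callback are hypothetical names.  A layer archive is keyed
by the store paths it contains, so a layer that was already built is
reused instead of being recreated.

```python
import hashlib


def layer_key(store_paths):
    """Return a stable cache key for a layer from the store paths it holds."""
    digest = hashlib.sha256()
    # Sort so that the key does not depend on the order of the paths.
    for path in sorted(store_paths):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


class LayerCache:
    """Remember built layer archives so identical layers are built once."""

    def __init__(self):
        self.archives = {}
        self.builds = 0  # number of times a layer was actually built

    def get(self, store_paths, build):
        """Return the archive for STORE_PATHS, calling BUILD only on a miss."""
        key = layer_key(store_paths)
        if key not in self.archives:
            self.archives[key] = build(sorted(store_paths))
            self.builds += 1
        return self.archives[key]
```

In Guix terms the cache would be the store itself: one derivation per
layer means an unchanged layer is simply found there.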

What do you think?

-- 
Ricardo

* Re: "guix pack -f docker" does too much work
  2024-05-29 12:58 "guix pack -f docker" does too much work Ricardo Wurmus
@ 2024-05-30 13:10 ` Michal Atlas
  2024-06-17 11:21   ` Ludovic Courtès
  2024-06-01 13:58 ` Ludovic Courtès
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Atlas @ 2024-05-30 13:10 UTC (permalink / raw)
  To: guix-devel

Hello Ricardo,

I greatly agree, it would be an awesome QOL improvement.

Just want to mention that it might be nice to take inspiration from the 
Nix dockerTools, since they already have quite a lot of effort put into 
this.

Including for example an option called `streamLayeredImage` [1] which 
doesn't generate a tarball at all, but rather a script that outputs the 
layers without assembling them, in a format which Docker or Podman can 
import without the huge intermediary file.

i.e. $(guix pack ...) | docker load
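
A minimal Python sketch of the streaming part, assuming in-memory
member data and a hypothetical `stream_archive` helper: the archive is
written straight to a non-seekable output such as a pipe, so no
intermediate file is created.  It does not produce the actual image
format that "docker load" expects (that needs image metadata as well).

```python
import io
import tarfile


def stream_archive(members, out):
    """Write (name, data) pairs as a tar stream to the file object OUT.

    Mode "w|" makes tarfile write sequentially without seeking, which is
    what allows OUT to be a pipe rather than a regular file.
    """
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

Passing `sys.stdout.buffer` as `out` would let the result be piped into
another process.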

[1]: https://ryantm.github.io/nixpkgs/builders/images/dockertools/#ssec-pkgs-dockerTools-streamLayeredImage

So that'd allow Guix to skip generating the final tarball altogether, 
which makes packing very swift.
Also seems that Nix's way only quickly imports the changed layers? And 
Guix's always imports the whole thing, at least I think?

Reading through how they do it, it seems that they pass the raw store 
paths to this python script [2] and it does the rest? Save for figuring 
out some merging of paths since there's a limit to the number of layers, 
I don't think this would be too difficult to port (after we find what 
license the script is under at least, or replicate the behaviour in Guile).
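
The merging step could look roughly like the Python sketch below.  This
is a deliberately simple, hypothetical strategy (one path per layer
until the limit is nearly reached, everything else in the last layer);
it is not claimed to be what the Nix script does, and
`group_into_layers` is a made-up name.

```python
def group_into_layers(paths, max_layers):
    """Split PATHS into at most MAX_LAYERS groups, preserving order.

    The first max_layers - 1 paths each get a layer of their own; all
    remaining paths are merged into one final layer.
    """
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    paths = list(paths)
    if len(paths) <= max_layers:
        # Few enough paths: every path gets its own layer.
        return [[path] for path in paths]
    head = [[path] for path in paths[:max_layers - 1]]
    return head + [paths[max_layers - 1:]]
```

Ordering the paths so that the ones least likely to change come first
would keep the merged final layer as the only one that changes often.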

[2]: https://github.com/NixOS/nixpkgs/blob/90509d6d66eb1524e2798a2a8627f44ae413f174/pkgs/build-support/docker/stream_layered_image.py

What do you think?

---

Atlas

* Re: "guix pack -f docker" does too much work
  2024-05-30 13:10 ` Michal Atlas
@ 2024-06-17 11:21   ` Ludovic Courtès
  2024-06-17 11:57     ` Michal Atlas
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2024-06-17 11:21 UTC (permalink / raw)
  To: Michal Atlas; +Cc: guix-devel

Hi,

Michal Atlas <michal_atlas+gnu@posteo.net> skribis:

> I greatly agree, it would be an awesome QOL improvement.

If there’s consensus, let’s see how we can get that done.  The advantage
of having (guix docker) & co. all in Scheme is that moving it from a
derivation to code running straight from ‘guix pack’ is definitely
feasible (a bit of work though because ‘guix pack’ has quite a few
backends).

> Just want to mention that it might be nice to take inspiration from
> the Nix dockerTools, since they already have quite a lot of effort put
> into this.
>
> Including for example an option called `streamLayeredImage` [1] which
> doesn't generate a tarball at all, but rather a script that outputs
> the layers without assembling them, in a format which Docker or Podman
> can import without the huge intermediary file.
>
> i.e. $(guix pack ...) | docker load
>
> [1]:
> https://ryantm.github.io/nixpkgs/builders/images/dockertools/#ssec-pkgs-dockerTools-streamLayeredImage

Nice!  Sounds very much in line with what Ricardo was proposing.

> Also seems that Nix's way only quickly imports the changed layers? And
> Guix's always imports the whole thing, at least I think?

What do you mean by “imports the whole thing”?

Thanks,
Ludo’.


* Re: "guix pack -f docker" does too much work
  2024-06-17 11:21   ` Ludovic Courtès
@ 2024-06-17 11:57     ` Michal Atlas
  2024-06-17 21:24       ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Atlas @ 2024-06-17 11:57 UTC (permalink / raw)
  To: Ludovic Courtès, Michal Atlas; +Cc: guix-devel

Hi,
>> Also seems that Nix's way only quickly imports the changed layers? And
>> Guix's always imports the whole thing, at least I think?
> What do you mean by “imports the whole thing”?

I'm not sure what exactly happens, so correct me if I'm wrong.  From 
timing the different approaches, I think that because Guix creates a 
single-layered image, the entire image gets re-imported into Docker 
whenever anything changes.  With the layered approach, if only one or 
two paths change, only those get imported; even though compression 
still takes up some baseline time, having Docker import just the 
changed paths is a very noticeable speedup.
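
A small Python sketch of why layering helps, under the assumption that
layers are identified by a digest of their content and that a layer
whose digest is already known does not need to be imported again.  The
helper names are hypothetical, and this models the idea only, not
Docker's actual import logic.

```python
import hashlib


def layer_digests(layers):
    """Return the SHA-256 hex digest of each layer (given as bytes)."""
    return [hashlib.sha256(data).hexdigest() for data in layers]


def layers_to_import(already_loaded, new_image_layers):
    """Return digests of the layers in the new image that are not yet known."""
    known = set(already_loaded)
    return [d for d in layer_digests(new_image_layers) if d not in known]
```

With a single-layer image any change yields a new digest for the whole
image; with several layers only the layers that changed are new.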

On that note, I know that guix pack goes through %compressors in order. 
However, zstd is an insane improvement over gzip when working with 
containers.  Would it perhaps be possible to default to it, or would 
that break far too many workflows, or is there another reason not to? 
Perhaps changing how guix pack works would be a good time to make both 
breaking changes at once?

Thanks, Michal.

* Re: "guix pack -f docker" does too much work
  2024-06-17 11:57     ` Michal Atlas
@ 2024-06-17 21:24       ` Ludovic Courtès
  0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2024-06-17 21:24 UTC (permalink / raw)
  To: Michal Atlas; +Cc: Michal Atlas, guix-devel, Oleg Pykhalov

Hi,

Michal Atlas <michal_atlas@posteo.net> skribis:

>>> Also seems that Nix's way only quickly imports the changed layers? And
>>> Guix's always imports the whole thing, at least I think?
>> What do you mean by “imports the whole thing”?
>
> I'm not sure what exactly happens, so correct me if I'm wrong.  From
> timing the different approaches, I think that because Guix creates a
> single-layered image, the entire image gets re-imported into Docker
> whenever anything changes.

Oh, there’s the quite recent ‘--max-layers’ option:

  https://guix.gnu.org/manual/devel/en/html_node/Invoking-guix-pack.html

However the default is to create a single layer.  Maybe worth changing
to 32 or so?  Oleg, WDYT?

(We should also document the default value of ‘--max-layers’ in the
manual: I had to check the code…)

> On that note, I know that guix pack goes through %compressors in
> order, however zstd is an insane improvement over gzip when working
> with containers, would it perhaps be possible to default to it, or
> would that break far too many workflows, or is there another reason?
> Perhaps changing how guix pack works would be a good time to
> make both breaking changes at once?

If Docker itself always understands zstd, then we could change the
default, indeed.

For other backends, such as plain tarballs, we could make that change
but it’s going to be potentially more of a breaking change.

Thoughts?

Ludo’.


* Re: "guix pack -f docker" does too much work
  2024-05-29 12:58 "guix pack -f docker" does too much work Ricardo Wurmus
  2024-05-30 13:10 ` Michal Atlas
@ 2024-06-01 13:58 ` Ludovic Courtès
  2024-06-01 19:07   ` Ricardo Wurmus
                     ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Ludovic Courtès @ 2024-06-01 13:58 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> a few months ago "guix pack -f docker" was modified to produce layers.
> This is great!  Unfortunately, "guix pack" itself still produces one big
> tarball containing all these layers.  There is no sharing of previously
> built layers, because they are all hidden inside the pack.

Right.

> I think it would be great if "guix pack -f docker" could avoid building
> all these identical layers again and again.  Perhaps it would be
> possible to have a single derivation for each layer?  This way we
> wouldn't have to recreate the same layer archives every time.

That sounds nice in terms of saving CPU time.  It’s less nice in terms
of disk usage: a single ‘guix pack -f docker’ run would populate the
store with roughly twice the size of the closure.

I think each solution (single derivation vs. one derivation per layer)
makes a different tradeoff.  I don’t have a strong feeling about which
one is better.

WDYT?

Ludo’.


* Re: "guix pack -f docker" does too much work
  2024-06-01 13:58 ` Ludovic Courtès
@ 2024-06-01 19:07   ` Ricardo Wurmus
  2024-06-03  7:09   ` Andy Wingo
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Ricardo Wurmus @ 2024-06-01 19:07 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.

Arguably we don't actually care all that much for the Docker image that
ends up in the store.  It's really a temporary thing that we want to
load into Docker or upload somewhere else.  I've often wanted to stream
the eventual output of "guix pack" to a pipe, precisely because I don't
want to store the same thing twice: once in the store and once in the
Docker storage backend.

It's actually worse than that: I often end up having dozens of packs in
the store whose layers are almost all identical.

> I think each solution (single derivation vs. one derivation per layer)
> makes a different tradeoff.  I don’t have a strong feeling about which
> one is better.

Can we have both?  I realize that adding the option to stream build
output to a pipe is not a trivial change, but it would solve the
unnecessary storage requirement for packs.  "docker load" reads from
standard input, but other packs would also benefit from a streaming
output; an example is Docker-free deployment to a remote server: just
pipe "guix pack" to a remote tar process and you're all set.

-- 
Ricardo

* Re: "guix pack -f docker" does too much work
  2024-06-01 13:58 ` Ludovic Courtès
  2024-06-01 19:07   ` Ricardo Wurmus
@ 2024-06-03  7:09   ` Andy Wingo
  2024-06-04 18:14   ` Simon Tournier
  2024-09-14 14:55   ` Maxim Cournoyer
  3 siblings, 0 replies; 14+ messages in thread
From: Andy Wingo @ 2024-06-03  7:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ricardo Wurmus, guix-devel

On Sat 01 Jun 2024 15:58, Ludovic Courtès <ludo@gnu.org> writes:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.

If the concern is CPU time, I would make sure you have switched to zstd
or some other faster codec, via `guix pack -f docker -C zstd`.

You probably already knew but if you haven't tried, it's quite
surprising :)

Andy


* Re: "guix pack -f docker" does too much work
  2024-06-01 13:58 ` Ludovic Courtès
  2024-06-01 19:07   ` Ricardo Wurmus
  2024-06-03  7:09   ` Andy Wingo
@ 2024-06-04 18:14   ` Simon Tournier
  2024-09-14 14:55   ` Maxim Cournoyer
  3 siblings, 0 replies; 14+ messages in thread
From: Simon Tournier @ 2024-06-04 18:14 UTC (permalink / raw)
  To: Ludovic Courtès, Ricardo Wurmus; +Cc: guix-devel

Hi,

On Sat, 01 Jun 2024 at 15:58, Ludovic Courtès <ludo@gnu.org> wrote:

>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.
>
> I think each solution (single derivation vs. one derivation per layer)
> makes a different tradeoff.  I don’t have a strong feeling about which
> one is better.

I share Ricardo's wish.  From my perspective, I do not care much about
polluting my local Guix store when building Docker images, because all
of that will be removed at the next GC, once all the work is loaded
elsewhere.

However, it is frustrating to rebuild complete large images again and
again when the difference is sometimes just a couple of packages.

I would be in favor of sharing more derivations between images. :-)

Cheers,
simon

* Re: "guix pack -f docker" does too much work
  2024-06-01 13:58 ` Ludovic Courtès
                     ` (2 preceding siblings ...)
  2024-06-04 18:14   ` Simon Tournier
@ 2024-09-14 14:55   ` Maxim Cournoyer
  2024-09-14 18:36     ` Ricardo Wurmus
  3 siblings, 1 reply; 14+ messages in thread
From: Maxim Cournoyer @ 2024-09-14 14:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ricardo Wurmus, guix-devel

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> a few months ago "guix pack -f docker" was modified to produce layers.
>> This is great!  Unfortunately, "guix pack" itself still produces one big
>> tarball containing all these layers.  There is no sharing of previously
>> built layers, because they are all hidden inside the pack.
>
> Right.
>
>> I think it would be great if "guix pack -f docker" could avoid building
>> all these identical layers again and again.  Perhaps it would be
>> possible to have a single derivation for each layer?  This way we
>> wouldn't have to recreate the same layer archives every time.
>
> That sounds nice in terms of saving CPU time.  It’s less nice in terms
> of disk usage: a single ‘guix pack -f docker’ run would populate the
> store with roughly twice the size of the closure.
>
> I think each solution (single derivation vs. one derivation per layer)
> makes a different tradeoff.  I don’t have a strong feeling about which
> one is better.

In past discussions (such as the implementation of the 'RPM' pack
format) we had concluded that a single derivation was preferable.  Large
chunks to be sent to offload machines over the network are not very
practical, and as Ludovic said, they also require more store space.

-- 
Thanks,
Maxim


* Re: "guix pack -f docker" does too much work
  2024-09-14 14:55   ` Maxim Cournoyer
@ 2024-09-14 18:36     ` Ricardo Wurmus
  2024-09-15  0:42       ` Suhail Singh
  2024-09-26 16:18       ` Simon Tournier
  0 siblings, 2 replies; 14+ messages in thread
From: Ricardo Wurmus @ 2024-09-14 18:36 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Ludovic Courtès, guix-devel

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

>>> I think it would be great if "guix pack -f docker" could avoid building
>>> all these identical layers again and again.  Perhaps it would be
>>> possible to have a single derivation for each layer?  This way we
>>> wouldn't have to recreate the same layer archives every time.
>>
>> That sounds nice in terms of saving CPU time.  It’s less nice in terms
>> of disk usage: a single ‘guix pack -f docker’ run would populate the
>> store with roughly twice the size of the closure.
>>
>> I think each solution (single derivation vs. one derivation per layer)
>> makes a different tradeoff.  I don’t have a strong feeling about which
>> one is better.
>
> In past discussions (such as the implementation of the 'RPM' pack
> format) we had concluded that a single derivation was preferable.  Large
> chunks to be sent to offload machines over the network are not very
> practical, and as Ludovic said, they also require more store space.

Depending on the situation, I can see one approach being preferable to
the other, and in other situations this could very well be reversed.

Can we expose this choice to the command line interface of "guix pack"?

-- 
Ricardo


* Re: "guix pack -f docker" does too much work
  2024-09-14 18:36     ` Ricardo Wurmus
@ 2024-09-15  0:42       ` Suhail Singh
  2024-09-26 16:18       ` Simon Tournier
  1 sibling, 0 replies; 14+ messages in thread
From: Suhail Singh @ 2024-09-15  0:42 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Maxim Cournoyer, Ludovic Courtès, guix-devel

Ricardo Wurmus <rekado@elephly.net> writes:

>>>> I think it would be great if "guix pack -f docker" could avoid building
>>>> all these identical layers again and again.  Perhaps it would be
>>>> possible to have a single derivation for each layer?  This way we
>>>> wouldn't have to recreate the same layer archives every time.
>>>
>>> That sounds nice in terms of saving CPU time.  It’s less nice in terms
>>> of disk usage: a single ‘guix pack -f docker’ run would populate the
>>> store with roughly twice the size of the closure.
>>>
>>> I think each solution (single derivation vs. one derivation per layer)
>>> makes a different tradeoff.  I don’t have a strong feeling about which
>>> one is better.
>>
>> In past discussions (such as the implementation of the 'RPM' pack
>> format) we had concluded that a single derivation was preferable.  Large
>> chunks to be sent to offload machines over the network are not very
>> practical, and as Ludovic said, they also require more store space.
>
> Depending on the situation, I can see one approach being preferable to
> the other, and in other situations this could very well be reversed.

I agree.

> Can we expose this choice to the command line interface of "guix pack"?

That would be quite helpful, indeed.  Happy to help with this if someone
can point me in the right direction, provided the effort of "pointing me
in the right direction" isn't too great to be impractical.

-- 
Suhail


* Re: "guix pack -f docker" does too much work
  2024-09-14 18:36     ` Ricardo Wurmus
  2024-09-15  0:42       ` Suhail Singh
@ 2024-09-26 16:18       ` Simon Tournier
  2024-10-05 14:01         ` Maxim Cournoyer
  1 sibling, 1 reply; 14+ messages in thread
From: Simon Tournier @ 2024-09-26 16:18 UTC (permalink / raw)
  To: Ricardo Wurmus, Maxim Cournoyer; +Cc: Ludovic Courtès, guix-devel

Hi,

On Sat, 14 Sep 2024 at 20:36, Ricardo Wurmus <rekado@elephly.net> wrote:

>> In past discussions (such as the implementation of the 'RPM' pack
>> format) we had concluded that a single derivation was preferable.  Large
>> chunks to be sent to offload machines over the network are not very
>> practical, and as Ludovic said, they also require more store space.

Well, the argument “require more store space” appears to me as a
“half-joke” when you know all the space that is required by Guix for
day-to-day usage. :-) I don’t buy it. ;-)

About offload, indeed.  And this can be a “bad surprise”.


> Depending on the situation, I can see one approach being preferable to
> the other, and in other situations this could very well be reversed.

I agree.

> Can we expose this choice to the command line interface of "guix pack"?

Maybe the switch could be tied to the existing ‘--no-offload’ option
instead of adding yet another one.

Cheers,
simon



* Re: "guix pack -f docker" does too much work
  2024-09-26 16:18       ` Simon Tournier
@ 2024-10-05 14:01         ` Maxim Cournoyer
  0 siblings, 0 replies; 14+ messages in thread
From: Maxim Cournoyer @ 2024-10-05 14:01 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ricardo Wurmus, Ludovic Courtès, guix-devel

Hi,

Simon Tournier <zimon.toutoune@gmail.com> writes:

[...]

>> Depending on the situation, I can see one approach being preferable to
>> the other, and in other situations this could very well be reversed.
>
> I agree.

Why not, if someone's itch is strong enough to implement it!

>> Can we expose this choice to the command line interface of "guix pack"?
>
> Maybe the switch could be tied to the existing ‘--no-offload’ option
> instead of adding yet another one.

I think conflating this behavior with that switch would bring more
confusion than good; I'd favor separate and explicitly named options.

-- 
Thanks,
Maxim

