Re: Experiment in generating multi-layer Docker images with guix pack

unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: Christopher Baines <mail@cbaines.net>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Experiment in generating multi-layer Docker images with guix pack
Date: Thu, 26 Mar 2020 20:15:09 +0000	[thread overview]
Message-ID: <87wo76kh4i.fsf@cbaines.net> (raw)
In-Reply-To: <87zhc39vcs.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3859 bytes --]


Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I think it could be useful to support multiple different strategies for
>> generating layers for Docker images, with different trade-offs. This approach
>> using two layers should make the resulting images more efficient to use in the
>> case where like the guile example above, where the packages you run guix pack
>> with have exactly matching inputs.
>
> Did you read <https://grahamc.com/blog/nix-and-layered-docker-images>?
> They came up with a pretty smart algorithm that would be worth copying.

I'm aware of it, but I haven't read it in detail yet.

>> As well as these behaviour changes, these patches also modify the
>> implementation. Rather than having some build side code that's used in the
>> pack and vm module gexpressions, these patches introduce two new record types:
>> <docker-image-layer> and <docker-image>. This at least structures the
>> derivations so that each layer is represented by a derivation, and then
>> there's a derivation for the image itself, which is a little more efficient in
>> terms of computation.
>
> Nice.
>
> I think a layering algorithm like Graham Christensen’s above requires
> knowledge of the reference graph, meaning that layering can only be
> computed on the build side, using #:references-graphs.  In that case, it
> could be that you can’t have a host-side <docker-image-layer> record.

As I understand it, you only have to do the computation on the build
side if you're restricted to doing a single set of builds. If you first
build the store items you want to put in the image, then look at there
references and compute the derivation for building the image, then you
could do this kind of computation on the client side.

But yeah, this is important to work out, as how image generation should
work, and what behaviours we want should define the structure of the
code.

I went with records to represent layers partially because I'm familiar
with it, but also because it allows for easier manipulation of layers on
the client side. Representing different layers as different derivations
also allows them to potentially be built in parallel, although I'm not
sure how beneficial this might be.

Related to this, at the moment Docker V1 images can be generated, it
would be good in the future to also support Docker V2 images and OCI
images. All three container formats use a layered approach to managing
the files, but they are all different (as far as I'm aware).

In my mind there are three architectural approaches:

 - Image generation entirely on the build side

   - The layers and the image are constructed through one derivation
   - The code for building images is in a module available at build time
   - Different approaches for layering are implemented in the module
     available at build time, and parameters are passed in as
     data/gexpressions

 - Image generation entirely on the client side

   - Each layer is a derivation, and the image is an additional
     derivation that takes the layers as an input
   - The code for building images is inside gexp compilers for the
     record types representing the images and layers
   - Different approaches for layering manipulate the layer records on
     the client side

 - Image generation can be done both build and client side

   - Depending on the parameters, the layers and image can be a single
     derivation, or one for each layer, and another for the image
   - The code for building images is in a module available at build
     time, and this is also used by gexp compilers
   - Different approaches for layering have the option of either being
     on the build side, or the client side

What are peoples thoughts?

Thanks,

Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 962 bytes --]

next prev parent reply	other threads:[~2020-03-26 20:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21 23:24 Experiment in generating multi-layer Docker images with guix pack Christopher Baines
2020-03-21 23:24 ` [PATCH 1/3] Rename (guix docker) to (guix build docker) Christopher Baines
2020-03-21 23:24 ` [PATCH 2/3] Make guix pack work with the new docker image gexpressions Christopher Baines
2020-03-21 23:24 ` [PATCH 3/3] Generate two layers for docker images in guix pack Christopher Baines
2020-03-26 12:03 ` Experiment in generating multi-layer Docker images with " Ludovic Courtès
2020-03-26 20:15   ` Christopher Baines [this message]
2020-03-29 14:50     ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wo76kh4i.fsf@cbaines.net \
    --to=mail@cbaines.net \
    --cc=guix-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).