unofficial mirror of help-guix@gnu.org 
From: Stephen Scheck <singularsyntax@gmail.com>
To: Chris Marusich <cmmarusich@gmail.com>
Cc: help-guix <help-guix@gnu.org>
Subject: Re: Guix Docker image inflation
Date: Sat, 30 May 2020 13:02:02 -0400
Message-ID: <CAKjnHz1=4v5kMQ8G6+_rQpMrK183HD5JUW8YKuvFPVX-e_UfWw@mail.gmail.com>
In-Reply-To: <87h7vyxqrz.fsf@gmail.com>

On Fri, May 29, 2020 at 7:31 PM Chris Marusich <cmmarusich@gmail.com> wrote:

>
> Could it be that you are accumulating layers without bound?
>
>
> https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/
>
> Since Docker images are built up of immutable layers, if you build your
> image from an existing base image, I'm not sure that it's possible to
> produce a new image that is smaller than the base image.  Basically,
> even if you run "guix gc" to remove dead store items, they will still
> exist on a prior layer, so the size of the new image won't decrease.
> And since you're installing new things, the size will actually increase.
> If you repeat this process by using the new image as an input for yet
> another build, I think you will accumulate layers and storage space
> without bound.
>

Layers certainly add some image size overhead, but I don't think that is
the culprit here. And producing a smaller image isn't really the goal;
the goal is just to keep image growth reasonable between each incremental
`guix pull`. Dead store items would only exist on previous layers if they
made it there in the first place. As has been demonstrated in previous
posts in the thread, I believe the problem is some Guix bug which prevents
deletion of garbage-collected store items.

What is reasonable growth? That is hard to answer, but I would expect it
to be roughly proportional to the growth of a Guix installation over time
in a non-Docker environment, taking some constant amount of layer overhead
as a given.

I don't really know what `guix pull` does, but I think it's something
along these lines: 1) the global package index is brought up to date;
2) any packages installed in the profile doing the pull are upgraded to
newer versions if they've been updated. So day to day, particularly when
there have been no updates to packages installed in the profile, size
growth should be very small. Periodic "rebasing" of incremental Docker
images, using one of the layer squashing tools out there, might still be
helpful from time to time, but I don't think it should be necessary on a
daily basis.
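
As a rough illustration of what such a periodic "rebase" could look like
without any third-party tool, the sketch below flattens a container's file
system into a single-layer image by piping `docker export` into
`docker import`. The function name `flatten_container` and the example tag
are made up for illustration; note that this approach does not carry over
image metadata such as CMD or ENV.

```shell
# Hypothetical helper: flatten a container into a single-layer image.
# Usage: flatten_container <container-id> <new-image-tag>
flatten_container() {
    # `docker export` streams the container's file system as a tar archive;
    # `docker import -` reads that archive from stdin and creates a new
    # image with one layer. Metadata such as CMD and ENV is not preserved.
    docker export "$1" | docker import - "$2"
}

# Example (commented out; the tag is an assumption):
# flatten_container <container-id> guix:flattened
```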

Also, layers are helpful for someone who pulls down the daily Guix Docker
images frequently, because then only the new, ideally small layers need to
be downloaded, whereas if you rebase for every image build, you'd have to
download the entire image every day.

The boundless layer accumulation you point out shouldn't be a problem with
the way that I'm building the images. When you do a `RUN <command>` inside
a Dockerfile, it is essentially doing `docker exec <container> <command>`
followed by `docker commit <container>`. It is the commit step which
produces a new layer. You can think of a RUN command inside a Dockerfile
as a kind of single-step transaction, which incorporates the net file
system changes into the image.

My build script issues several `docker exec <container> <command>`
invocations, followed by a `docker commit <container>`. Intermediate
changes to the container file system prior to the commit do not generate
layers; only the net changes present at commit time end up in a layer.
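
To make the pattern concrete, here is a minimal sketch of such an
exec-then-commit sequence. The function name `exec_then_commit` and the
example tag are made up for illustration and are not taken from the actual
build script; the sketch also assumes the container has `sh` available.

```shell
# Hypothetical helper: run each command in the container, then commit once.
# Usage: exec_then_commit <container-id> <new-image-tag> <command>...
exec_then_commit() {
    container=$1
    tag=$2
    shift 2
    for cmd in "$@"; do
        # Each command only modifies the container's writable layer.
        docker exec "$container" sh -c "$cmd" || return 1
    done
    # One commit records the net file system changes as a single new layer.
    docker commit "$container" "$tag"
}

# Example (commented out; the tag is an assumption):
# exec_then_commit <container-id> guix:latest "guix pull" "guix gc"
```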

You can convince yourself of this by doing something like the following:

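    # Start a container that stays running (assumes the image has `sleep`).
    docker run -d <some-linux-image> sleep infinity
    # Write 1 GiB of random data, then commit: first new image.
    docker exec <container-id> dd if=/dev/urandom of=/RANDOM-DATA bs=1048576 count=1024
    docker commit <container-id>
    # Remove the data, then commit again: second new image.
    docker exec <container-id> rm /RANDOM-DATA
    docker commit <container-id>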

You'll end up with two new images: the first should be about 1 GB larger
than the base image, the second about the same size as the base image.
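
One way to check those sizes, sketched under the assumption that
`docker image inspect` reports the image size in bytes in its `Size`
field. The helper names `image_size` and `size_delta_mib` are made up for
illustration.

```shell
# Hypothetical helper: print an image's size in bytes.
image_size() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Hypothetical helper: print how much larger <image> is than <base>, in MiB.
# Usage: size_delta_mib <base-image> <image>
size_delta_mib() {
    base=$(image_size "$1") || return 1
    image=$(image_size "$2") || return 1
    echo $(( (image - base) / 1048576 ))
}

# Example (commented out; substitute the IDs printed by `docker commit`):
# size_delta_mib <some-linux-image> <first-committed-image>
# size_delta_mib <some-linux-image> <second-committed-image>
```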

> FYI, Guix itself can build Docker images from scratch - no base image
> required!  It can even build a Docker image of a full-blown Guix System
> from scratch.  Sorry if you already knew that - I just wanted to point
> it out in case you didn't!
>

Yes, thanks, I know - if you read through the thread you'll see that I make
reference to `guix system docker-image [...]`.

-SS