* Guix Docker image inflation
@ 2020-05-27 19:41 Stephen Scheck
  2020-05-28 18:10 ` Leo Famulari
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Stephen Scheck @ 2020-05-27 19:41 UTC (permalink / raw)
  To: help-guix

Hello,

As an exercise, I set up daily Guix System Docker image builds using GitLab
and Docker Hub, here:

https://hub.docker.com/repository/registry-1.docker.io/singularsyntax/guix/tags?page=1

The build process works as follows: if an existing `latest` image does not
exist for a given branch (master, 1.1.0, etc.), then bootstrap an image by
running `guix system docker-image` inside an Alpine Linux Docker container
with a fresh Guix installation. Using this image as a seed, `guix pull` is
run for the desired branch, and the resulting image is committed to the
Docker repository. If a "latest" image does exist, it is used instead as
the base from which to run `guix pull`. Daily images are thus built
incrementally from the previous day's build. For anybody curious about the
process, the build script can be browsed here:

https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

It works pretty well, except that I'm observing substantial image size
inflation day-over-day, starting at ~197 MB from the seed image, now up to
1.71 GB eleven days later despite running `guix gc --delete-generations`,
`guix gc --collect-garbage`, and `guix gc --optimize` after pulling prior
to committing each new image.

I'm wondering if there is some other Guix GC operation or option I'm
missing, or any other suggestions which could stop this unsustainable image
bloat from occurring. I really do doubt that the Guix System itself is
growing this quickly.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation
From: Leo Famulari @ 2020-05-28 18:10 UTC
  To: Stephen Scheck; +Cc: help-guix

On Wed, May 27, 2020 at 03:41:49PM -0400, Stephen Scheck wrote:
> As an exercise, I set up daily Guix System Docker image builds using GitLab
> and Docker Hub, here:
> https://hub.docker.com/repository/registry-1.docker.io/singularsyntax/guix/tags?page=1

Cool!

> The build process works as follows: if an existing `latest` image does not
> exist for a given branch (master, 1.1.0, etc.), then bootstrap an image by
> running `guix system docker-image` inside an Alpine Linux Docker container
> with a fresh Guix installation. Using this image as a seed, `guix pull` is
> run for the desired branch, and the resulting image is committed to the
> Docker repository. If a "latest" image does exist, it is used instead as
> the base from which to run `guix pull`. Daily images are thus built
> incrementally from the previous day's build. For anybody curious about the
> process, the build script can be browsed here:
> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

I'm not familiar with Docker so I'm not sure exactly what you are doing.
Specifically, I can't tell if you are creating new Docker images from
scratch each day, or if you are continuing to use the same one from day
to day.

> It works pretty well, except that I'm observing substantial image size
> inflation day-over-day, starting at ~197 MB from the seed image, now up to
> 1.71 GB eleven days later despite running `guix gc --delete-generations`,
> `guix gc --collect-garbage`, and `guix gc --optimize` after pulling prior
> to committing each new image.

I'm also not sure which image is growing each day...

In general, the parameters --delete-generations and --collect-garbage
are supposed to be passed values like a reference to a profile or an
amount of data to delete, respectively. Are you doing that?

Are you removing / invalidating old generations before attempting to
garbage collect them? The store items they refer to cannot be deleted
until the generations themselves are no longer registered.

You can list existing generations with e.g. `guix package
--list-generations`. You can invalidate them with `guix package
--delete-generations=42` or with time-based patterns like `guix package
--delete-generations=1m`, which removes everything older than one month.
The same-named argument to `guix gc` should be shorthand for that.

Similarly for the profile used by `guix pull`, which is accessed like
this: `guix package --profile=$HOME/.config/guix/current
--list-generations`.

Usually, these old profiles are responsible for most of the disk usage
in /gnu/store.
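The order of operations described above (drop old generations of both profiles first, collect garbage afterwards) can be sketched as a small shell helper. This is only an illustration: the function name `prune_generations` and the `PULL_PROFILE` variable are made up for this sketch and are not part of the build script discussed in this thread. It uses only the `guix package` and `guix gc` invocations already mentioned in the message.

```shell
#!/bin/sh
# Sketch only. PULL_PROFILE is a hypothetical variable naming the profile
# that `guix pull` manages; the default is the path quoted in the message.
PULL_PROFILE="${PULL_PROFILE:-$HOME/.config/guix/current}"

# prune_generations PATTERN
#   PATTERN is a generation pattern such as 1m (older than one month).
prune_generations() {
    pattern="$1"
    # 1. Unregister old generations of the default user profile.
    guix package --delete-generations="$pattern"
    # 2. Do the same for the profile used by `guix pull`.
    guix package --profile="$PULL_PROFILE" --delete-generations="$pattern"
    # 3. Only now can the store items they referred to be collected.
    guix gc
}
```

A call such as `prune_generations 1m` would then run the three steps in that order. Whether this is enough to shrink a committed Docker image is a separate question.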
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 16:19 UTC
  To: Leo Famulari; +Cc: help-guix

On Thu, May 28, 2020 at 3:33 PM Leo Famulari <leo@famulari.name> wrote:

> I'm not familiar with Docker so I'm not sure exactly what you are doing.
> Specifically, I can't tell if you are creating new Docker images from
> scratch each day, or if you are continuing to use the same one from day
> to day.

The previous day's Docker image is used as the base for the new one being
built - the image is pulled from Docker Hub, `guix pull` is run inside it,
and a new image is "committed" (Docker terminology for creating a new image
from a file system snapshot).

BTW, I posted an incorrect internal link - the actual Docker images are
available here if you'd like to try them out:

https://hub.docker.com/r/singularsyntax/guix/tags

> I'm also not sure which image is growing each day...

The daily Docker images described above.

> In general, the parameters --delete-generations and --collect-garbage
> are supposed to be passed values like a reference to a profile or an
> amount of data to delete, respectively. Are you doing that?

`guix gc --delete-generations` without a parameter causes all preceding
pull and package generations to be deleted.

> Are you removing / invalidating old generations before attempting to
> garbage collect them? The store items they refer to cannot be deleted
> until the generations themselves are no longer registered.

Yes, `guix gc --delete-generations`, `guix gc --collect-garbage`, and
`guix gc --optimize` are run in the order given. Note that passing a
specific amount parameter to `--collect-garbage` makes no difference.

> Usually, these old profiles are responsible for most of the disk usage
> in /gnu/store.

Indeed. It's clear what's taking up the space, but I don't understand why
it does not get garbage collected:

root@localhost /# guix pull --list-generations
Generation 12  May 28 2020 20:45:30  (current)
  guix a5374cd
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: a5374cde918cfeae5c16b43b9f2dd2b24bc3564d

root@localhost /# guix package --list-generations
guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' does not exist

root@localhost /# du -h --max-depth=1 /gnu/store | egrep "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
44M   /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system
44M   /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system
107M  /gnu/store/plaay02w581vx9ilyiv93sl1lw54n7h5-guix-packages-base
44M   /gnu/store/qhbk7g8z97m37iak1s1yn2my82gv0lj5-guix-system
103M  /gnu/store/2qcfl7h10dynjlifyvqwh9iiic52q5x6-guix-packages-base
107M  /gnu/store/m0fv2xmfif5pxnfb1bscfvgyfx0x6xdc-guix-packages-base
90M   /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules
41M   /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system
191M  /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
201M  /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
44M   /gnu/store/dzc16sv8jv831m0jkk5llc2ws1a3mk0z-guix-system
44M   /gnu/store/9a2hr5lh15vxqa7bjih8w47wr6hr11nv-guix-system
103M  /gnu/store/1lwdys51wi08r5an2rr6sqk9kbgr7qip-guix-packages-base
44M   /gnu/store/c3spiv1c0fg83j7d99mjwk0s6fw77wl5-guix-system
44M   /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system
6.7M  /gnu/store/a5xsqxr04pwnyni5x2gqjnishzq80cbw-guix-packages-base
14M   /gnu/store/mych9fchln22pbhpc5syxyymx4hz496y-guix-8bd0b533b-modules
35M   /gnu/store/brbwlbnx56ms50kklyqk9fsf0xkwjjf9-guix-498e2e669-modules
3.2M  /gnu/store/dirpwhdr7h4nyphy4ncxqi4f2njv3rsh-guix-packages-base
35M   /gnu/store/d3h4b7nvnms8d03ddi9b481dlxpykl7l-guix-5e3d16994-modules
5.8M  /gnu/store/n339sr8c63f0nzja6yl8zfwy1jklj19j-guix-packages-base
25M   /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
41M   /gnu/store/pwr8ab20xa1whxag689lsz82l2na08x0-guix-system
6.5M  /gnu/store/6sggbpgg0zkbgxwf3wa2j15dis8z7cr1-guix-packages-base
57M   /gnu/store/8z9qc2bvq8azc08p4miq77yf2agk07aq-guix-843e77205-modules
71M   /gnu/store/ibgjq1ampj8bldrabbsnwik2sr0gg3as-guix-a43fe7acd-modules
37M   /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
18M   /gnu/store/i72b4biraw6bhy1v7ly46kwyaacvfa28-guix-system
178M  /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules
15M   /gnu/store/77sxajrwigsdnyr4l4jq4pk6v5kwbm59-guix-system
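To put a single number on a listing like the one above, the per-directory sizes can be totalled. The following is a rough sketch, not something from the original build script: it assumes GNU `du -m` (whole MiB in the first column) rather than the `du -h` used in the message, and the helper name `sum_mib` is made up.

```shell
#!/bin/sh
# Sketch only: sum the first column of `du -m` style lines read on stdin.
# Each input line is expected to look like "<MiB><whitespace><path>".
sum_mib() {
    awk '{ total += $1 } END { printf "%d\n", total }'
}

# Possible usage (not run here), with the same name pattern as in the message:
#   du -m --max-depth=1 /gnu/store \
#     | grep -E 'guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$' \
#     | sum_mib
```

Because `du -m` rounds each entry up to a whole MiB, the total is an approximation, and it does not account for space shared through hard links.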
* Re: Guix Docker image inflation
From: Leo Famulari @ 2020-05-29 17:08 UTC
  To: Stephen Scheck; +Cc: help-guix

On Fri, May 29, 2020 at 12:19:46PM -0400, Stephen Scheck wrote:
> The previous day's Docker image is used as the base for the new one being
> built - the image is pulled from Docker Hub, `guix pull` is run inside it,
> and a new image is "committed" (Docker terminology for creating a new
> image from a file system snapshot).

I'm still not quite sure what you are doing (or what Docker does) so
please bear with me.

> root@localhost /# du -h --max-depth=1 /gnu/store | egrep
> "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
[...]
> 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules

If I understand correctly, you should not need both of these directories
in a Guix VM image. The latter hashes are truncated guix.git commit
hashes and a VM image would only be based on a single one.

I recommend looking into why all these directories are being copied into
your images.

I figure you'd want to create each image with *only* the things
corresponding to the Git commit it's based on, but it sounds like they
are being created by copying the entire host image, which doesn't seem
right.

If the Docker images are being created by simply snapshotting the file
system of a non-ephemeral Guix system, that's probably not the right way
to do it. Is that what's going on?
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 17:56 UTC
  To: Leo Famulari; +Cc: help-guix

On Fri, May 29, 2020 at 1:08 PM Leo Famulari <leo@famulari.name> wrote:

> I'm still not quite sure what you are doing (or what Docker does) so
> please bear with me.
>
> > root@localhost /# du -h --max-depth=1 /gnu/store | egrep
> > "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
> [...]
> > 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> > 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
>
> If I understand correctly, you should not need both of these directories
> in a Guix VM image. The latter hashes are truncated guix.git commit
> hashes and a VM image would only be based on a single one.

Exactly, I agree (to the extent that I understand Guix).

> I recommend looking into why all these directories are being copied into
> your images.

Whatever is in /gnu/store (as managed by Guix) goes into the image, nothing
more and nothing less.

> I figure you'd want to create each image with *only* the things
> corresponding to the Git commit it's based on, but it sounds like they
> are being created by copying the entire host image, which doesn't seem
> right.
>
> If the Docker images are being created by simply snapshotting the file
> system of a non-ephemeral Guix system, that's probably not the right way
> to do it. Is that what's going on?

Yes, as I said, the image is created from a file system snapshot, after
Guix is brought up to date via `guix pull` and those various Guix garbage
collection operations are run. However, it's not quite "non-ephemeral" as
each Guix operation is run as an atomic command inside the Docker
container, with nothing else running (except for guix-daemon, which has to
always be running for Guix to operate to the best of my understanding, and
a couple other Guix System daemons which anyway would be equivalent to the
situation of any Guix installation running outside of a Docker container).

How else would you suggest that it be done? It would be nice if `guix
system docker-image` took `--branch` and `--commit` options to build a
container from a well-defined Guix check-in state, but that doesn't seem to
be the case. And in any case - too slow. The point here is to leverage
daily incremental pulls to keep data transfer and build times down.
* Re: Guix Docker image inflation
From: Leo Famulari @ 2020-05-29 18:02 UTC
  To: Stephen Scheck; +Cc: help-guix

On Fri, May 29, 2020 at 01:56:28PM -0400, Stephen Scheck wrote:
> > > "guix-system$|guix-packages-base$|guix-[0-9a-f]*-modules$"
> > [...]
> > > 191M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
> > > 201M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
> >
> > If I understand correctly, you should not need both of these directories
> > in a Guix VM image. The latter hashes are truncated guix.git commit
> > hashes and a VM image would only be based on a single one.
>
> Exactly, I agree (to the extent that I understand Guix).
>
> > I recommend looking into why all these directories are being copied into
> > your images.
>
> Whatever is in /gnu/store (as managed by Guix) goes into the image, nothing
> more and nothing less.

Okay. For debugging, can you try garbage collecting those modules
directories? And if the garbage collector refuses, you can investigate
why with the 3 R's of Guix garbage collection, --referrers,
--references, and --requisites.

> How else would you suggest that it be done? It would be nice if `guix
> system docker-image` took `--branch` and `--commit` options to build a
> container from a well-defined Guix check-in state, but that doesn't seem
> to be the case. And in any case - too slow. The point here is to leverage
> daily incremental pulls to keep data transfer and build times down.

--branch and --commit would be passed to `guix pull`, and then you'd run
`guix system docker-image` based on that.
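The "3 R's" mentioned above can be run in one pass over a set of suspect store items. This is a minimal sketch under stated assumptions: the function name `explain_store_paths` is hypothetical, and the only Guix commands used are the three `guix gc` query options named in the message.

```shell
#!/bin/sh
# Sketch only: for each store path given as an argument, print what refers
# to it (--referrers), what it refers to directly (--references), and its
# full closure (--requisites).
explain_store_paths() {
    for path in "$@"; do
        echo "== $path"
        echo "-- referrers:"
        guix gc --referrers "$path"
        echo "-- references:"
        guix gc --references "$path"
        echo "-- requisites:"
        guix gc --requisites "$path"
    done
}
```

For example, `explain_store_paths /gnu/store/*-modules` would query every leftover modules directory. An empty referrers section means nothing registered in the store database points at that item.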
* Re: Guix Docker image inflation
From: Marius Bakke @ 2020-05-29 18:21 UTC
  To: Leo Famulari, Stephen Scheck; +Cc: help-guix

Leo Famulari <leo@famulari.name> writes:

>> How else would you suggest that it be done? It would be nice if `guix
>> system docker-image` took `--branch` and `--commit` options to build a
>> container from a well-defined Guix check-in state, but that doesn't seem
>> to be the case. And in any case - too slow. The point here is to leverage
>> daily incremental pulls to keep data transfer and build times down.
>
> --branch and --commit would be passed to `guix pull`, and then you'd run
> `guix system docker-image` based on that.

There is also 'guix time-machine --commit=abc123 -- system docker-image'.
* Re: Guix Docker image inflation
From: Leo Famulari @ 2020-05-29 18:37 UTC
  To: Marius Bakke; +Cc: help-guix, Stephen Scheck

On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> Leo Famulari <leo@famulari.name> writes:
> > --branch and --commit would be passed to `guix pull`, and then you'd run
> > `guix system docker-image` based on that.
>
> There is also 'guix time-machine --commit=abc123 -- system docker-image'.

Right, that's probably more efficient than creating lots of `guix pull`
generations.
* Re: Guix Docker image inflation
From: zimoun @ 2020-05-29 18:44 UTC
  To: Leo Famulari; +Cc: help-guix, Stephen Scheck

On Fri, 29 May 2020 at 20:37, Leo Famulari <leo@famulari.name> wrote:
>
> On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> > Leo Famulari <leo@famulari.name> writes:
> > > --branch and --commit would be passed to `guix pull`, and then you'd run
> > > `guix system docker-image` based on that.
> >
> > There is also 'guix time-machine --commit=abc123 -- system docker-image'.
>
> Right, that's probably more efficient than creating lots of `guix pull`
> generations.

Yes, but it is hard to know the forward commit a priori.
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 21:24 UTC
  To: zimoun; +Cc: help-guix

On Fri, May 29, 2020 at 2:44 PM zimoun <zimon.toutoune@gmail.com> wrote:

> On Fri, 29 May 2020 at 20:37, Leo Famulari <leo@famulari.name> wrote:
> >
> > On Fri, May 29, 2020 at 08:21:08PM +0200, Marius Bakke wrote:
> > > Leo Famulari <leo@famulari.name> writes:
> > > > --branch and --commit would be passed to `guix pull`, and then you'd
> > > > run `guix system docker-image` based on that.
> > >
> > > There is also 'guix time-machine --commit=abc123 -- system
> > > docker-image'.
> >
> > Right, that's probably more efficient than creating lots of `guix pull`
> > generations.
>
> Yes, but it is hard to know the forward commit a priori.

Yes, and also, does a Docker image created by `guix pull` followed by
`guix system docker-image [...]` in fact really inherit the Guix snapshot
from the system that creates it? Here's what I get on a freshly minted
image made that way:

root@guix /# guix pull --list-generations
guix pull: error: profile '/var/guix/profiles/per-user/root/current-guix' does not exist

root@guix /# guix describe
guix describe: error: failed to determine origin
hint: Perhaps this `guix' command was not obtained with `guix pull'? Its version string is 1.1.0-4.bdc801e.

root@guix /# guix package --list-generations
guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' does not exist

But here's `guix describe` output from the parent system:

root@localhost /# guix describe
Generation 13  May 29 2020 19:28:11  (current)
  guix 41a2d6a
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 41a2d6a8b9294a6eb8e97aaefd569e755f5f461e

Until a fresh `guix pull` is run on the new image, it isn't functional and
there's no apparent way to confirm its actual commit hash, so I don't
really see what advantage it offers over the incremental method I'm using
(and it's unfeasibly slow, about 10-15 minutes for an incremental pull
compared to over an hour to finish `guix system docker-image`).
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 18:29 UTC
  To: Leo Famulari; +Cc: help-guix

On Fri, May 29, 2020 at 2:02 PM Leo Famulari <leo@famulari.name> wrote:

> Okay. For debugging, can you try garbage collecting those modules
> directories? And if the garbage collector refuses, you can investigate
> why with the 3 R's of Guix garbage collection, --referrers,
> --references, and --requisites.

# Hmm...
root@localhost /gnu/store# guix gc --references /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
guix gc: error: path `/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules' is not valid

# Hmm...
root@localhost /gnu/store# guix gc --requisites /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
guix gc: error: path `/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules' is not valid

# Hmm... this one is different - no output
root@localhost /gnu/store# guix gc --referrers /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules

# Now try to delete it...
root@localhost /gnu/store# guix gc --delete /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
finding garbage collector roots...
[0 MiB] deleting '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.36 MiB

# Still there...
root@localhost /gnu/store# du -hs /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
210M  /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
* Re: Guix Docker image inflation
From: zimoun @ 2020-05-29 17:12 UTC
  To: Stephen Scheck; +Cc: help-guix

Dear Stephen,

I am not sure I follow the whole Docker dance. Well, if I understand
correctly, you did:

  guix package --delete-generations
  guix gc

which removes all generations except the current one of the default
profile, i.e., ~/.guix-profile. However, there is another profile,
'~/.config/guix/current', which is the profile used by `guix pull`.
Therefore, you have to clean the generations there too:

  guix pull --delete-generations
  guix gc

Does it reduce the size?

All the best,
simon
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 17:36 UTC
  To: zimoun; +Cc: help-guix

On Fri, May 29, 2020 at 1:12 PM zimoun <zimon.toutoune@gmail.com> wrote:

> Dear Stephen,
>
> I am not sure to follow all the Docker dance. Well, if I understand
> correctly, you did:
>
> guix package --delete-generations
> guix gc
>
> which remove all except the current profile, i.e., ~/.guix-profile.
> However, there is another profile '~/.config/guix/current' which is
> the profile used when guix pull. Therefore, you have to clean the
> generations here too:
>
> guix pull --delete-generations
> guix gc
>
> Does it reduce the size?

root@localhost /# du -hs /gnu/store
4.3G  /gnu/store

### only one generation ###
root@localhost /# guix pull --list-generations
Generation 12  May 28 2020 20:45:30  (current)
  guix a5374cd
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: a5374cde918cfeae5c16b43b9f2dd2b24bc3564d

root@localhost /# guix pull --delete-generations

### still only one generation ###
root@localhost /# guix pull --list-generations
Generation 12  May 28 2020 20:45:30  (current)
  guix a5374cd
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: a5374cde918cfeae5c16b43b9f2dd2b24bc3564d

root@localhost /# guix gc
finding garbage collector roots...
deleting garbage...
[0 MiB] deleting '/gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system'
[0 MiB] deleting '/gnu/store/dzifisbdk1gwy2fw2hwzgvdnjak22awl-guix-extra'
[0 MiB] deleting '/gnu/store/rqz825cwaf4866d2aljwkk9qq0g7rmzm-module-import'
### Many more store files (all 0 MiB) elided...
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.36 MiB
guix gc: freed 0 MiBs

### no space recovered ###
root@localhost /# du -hs /gnu/store
4.3G  /gnu/store
* Re: Guix Docker image inflation
From: zimoun @ 2020-05-29 18:08 UTC
  To: Stephen Scheck; +Cc: help-guix

Dear,

On Wed, 27 May 2020 at 21:42, Stephen Scheck <singularsyntax@gmail.com> wrote:

> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

How is the initial Docker image
singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 built?

If I understand correctly, you use the Docker image
singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 to build another
Docker image, namely guix-docker-image.tar, using Guix, right?
Well, that is neither the point nor the issue. :-)

Well, instead of this:

--8<---------------cut here---------------start------------->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix gc --delete-generations
/root/.config/guix/current/bin/guix gc --collect-garbage
/root/.config/guix/current/bin/guix gc --optimize
docker commit
/root/.config/guix/current/bin/guix package --install --fallback jq
--8<---------------cut here---------------end--------------->8---

could you try this:

--8<---------------cut here---------------start------------->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix pull -d
/root/.config/guix/current/bin/guix package -d
/root/.config/guix/current/bin/guix gc
docker commit
/root/.config/guix/current/bin/guix package --install --fallback jq
--8<---------------cut here---------------end--------------->8---

?

Last, you could check what "guix package --list-profiles" says
and then "guix gc --list-dead".

Hope that helps,
simon
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 18:47 UTC
  To: zimoun; +Cc: help-guix

On Fri, May 29, 2020 at 2:08 PM zimoun <zimon.toutoune@gmail.com> wrote:

> How the initial Docker image
> singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 is built?
> To understand, you use the Docker image
> singularsyntax/guix-bootstrap:1.1.0-alpine-3.11 to build another
> Docker image namely guix-docker-image.tar using Guix, right?
> Well, that is not the point neither the issue. :-)

You can look at the Dockerfile here:

https://gitlab.com/singularsyntax-docker-hub/guix-bootstrap

It's pretty close to exactly the manual instructions for installing Guix on
a "foreign" distro on top of Alpine Linux.

Not the point, no, but how else do I obtain a seed Guix Docker image, which
I can use to birth clean, pristine "baby" images of Guix's own making? It
would be really nice if the Guix project itself provided such an image!

> could you try that
>
> --8<---------------cut here---------------start------------->8---
> GUIX_PATH=/root/.config/guix/current/bin
> $GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
> /root/.config/guix/current/bin/guix pull -d
> /root/.config/guix/current/bin/guix package -d
> /root/.config/guix/current/bin/guix gc
> docker commit
> /root/.config/guix/current/bin/guix package --install --fallback jq
> --8<---------------cut here---------------end--------------->8---
>
> Last, you could try to see what "guix package --list-profiles" says
> and then "guix gc --list-dead".

root@localhost /gnu/store# guix pull -d
root@localhost /gnu/store# guix package --list-profiles
/root/.config/guix/current
root@localhost /gnu/store# guix package -d
guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' does not exist
root@localhost /gnu/store# guix package --list-profiles
/root/.config/guix/current
root@localhost /gnu/store# du -hs .
4.3G  .
root@localhost /gnu/store# guix gc
finding garbage collector roots...
deleting garbage...
[0 MiB] deleting '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
[0 MiB] deleting '/gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system'
[0 MiB] deleting '/gnu/store/dzifisbdk1gwy2fw2hwzgvdnjak22awl-guix-extra'
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.82 MiB
guix gc: freed 0.636 MiBs
root@localhost /gnu/store# du -hs .
4.3G  .
root@localhost /gnu/store# guix gc --list-dead
finding garbage collector roots...
determining live/dead paths...
/gnu/store/0bm8h4ns6bymc7q24vhfr0dnb7qab729-guix-cli
/gnu/store/0hjjj9dppc5xvq3bfjwbsygrfyqn0rlv-guix-cli
/gnu/store/0m0xx2958fgyz8kk093afik5cn4rhrc1-guix-cli-modules
/gnu/store/0pi2jhn3a778gc3fm1l31sh07fik4zwa-guix-system-tests-modules
/gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
# Lots more listed...
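The symptom in the transcript above (items reported as deleted or dead, yet `du` unchanged) can be checked mechanically by testing which paths from `guix gc --list-dead` still exist on disk after a collection. A minimal sketch follows; the helper name `still_on_disk` is made up, and it says nothing about why the files survive.

```shell
#!/bin/sh
# Sketch only: read lines on stdin, keep those that look like absolute
# paths, and print the ones that still exist on disk. Progress messages
# such as "finding garbage collector roots..." are skipped.
still_on_disk() {
    while IFS= read -r path; do
        case "$path" in
            /*) [ -e "$path" ] && printf '%s\n' "$path" ;;
        esac
    done
    return 0
}

# Possible usage (not run here), after a `guix gc` pass:
#   guix gc --list-dead | still_on_disk
```

Paths printed right after a full `guix gc` are candidates for closer inspection with `guix gc --referrers` or a plain `ls -ld`.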
* Re: Guix Docker image inflation
From: zimoun @ 2020-05-29 20:02 UTC
  To: Stephen Scheck; +Cc: help-guix

On Fri, 29 May 2020 at 20:47, Stephen Scheck <singularsyntax@gmail.com> wrote:

> Not the point, no, but how else do I obtain a seed Guix Docker image,
> which I can use to birth clean, pristine "baby" images of Guix's own
> making? It would be really nice if the Guix project itself provided such
> an image!

Help welcome! :-)

Well, this

> root@localhost /gnu/store# guix gc
> finding garbage collector roots...
> deleting garbage...
> [0 MiB] deleting '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
> [0 MiB] deleting '/gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system'
> [0 MiB] deleting '/gnu/store/dzifisbdk1gwy2fw2hwzgvdnjak22awl-guix-extra'
> deleting `/gnu/store/trash'
> deleting unused links...
> note: currently hard linking saves 1181.82 MiB
> guix gc: freed 0.636 MiBs

and this

> root@localhost /gnu/store# guix gc --list-dead
> finding garbage collector roots...
> determining live/dead paths...
> /gnu/store/0bm8h4ns6bymc7q24vhfr0dnb7qab729-guix-cli
> /gnu/store/0hjjj9dppc5xvq3bfjwbsygrfyqn0rlv-guix-cli
> /gnu/store/0m0xx2958fgyz8kk093afik5cn4rhrc1-guix-cli-modules
> /gnu/store/0pi2jhn3a778gc3fm1l31sh07fik4zwa-guix-system-tests-modules
> /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
> # Lots more listed...

is weird. Something is going wrong here.

Well, could you try

  guix system delete-generations
  guix gc

?
* Re: Guix Docker image inflation
From: Stephen Scheck @ 2020-05-29 21:04 UTC
  To: zimoun; +Cc: help-guix

On Fri, May 29, 2020 at 4:02 PM zimoun <zimon.toutoune@gmail.com> wrote:

> Well, could you try
>
> guix system delete-generations
> guix gc

root@guix /# guix system list-generations
guix system: error: open-file: No such file or directory: "/var/guix/profiles/system-1-link/parameters"

root@guix /# guix system delete-generations
Backtrace:
           1 (primitive-load "/root/.config/guix/current/bin/guix")
In guix/ui.scm:
  1936:12  0 (run-guix-command _ . _)

guix/ui.scm:1936:12: In procedure run-guix-command:
In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f
* Re: Guix Docker image inflation 2020-05-29 21:04 ` Stephen Scheck @ 2020-05-29 21:54 ` zimoun 2020-05-29 22:11 ` Stephen Scheck 0 siblings, 1 reply; 37+ messages in thread
From: zimoun @ 2020-05-29 21:54 UTC (permalink / raw)
To: Stephen Scheck; +Cc: help-guix

On Fri, 29 May 2020 at 23:04, Stephen Scheck <singularsyntax@gmail.com> wrote:

> root@guix /# guix system list-generations
> guix system: error: open-file: No such file or directory: "/var/guix/profiles/system-1-link/parameters"

Do you have '/var/' in your Docker image? Because it looks the same as:

> root@localhost /gnu/store# guix package --list-profiles
> /root/.config/guix/current
> root@localhost /gnu/store# guix package -d
> guix package: error: profile '/var/guix/profiles/per-user/root/guix-profile' does not exist

In addition, you have this:

> root@localhost /gnu/store# guix package --list-profiles
> /root/.config/guix/current

and it is really weird because you are doing:

    guix package --install --fallback jq
    /root/.config/guix/current/bin/guix describe --format=json | /root/.guix-profile/bin/jq

therefore, somehow, the profile '/root/.guix-profile' should appear with '--list-profiles' too.

I do not know if it is a bug -- as Leo suggests -- or if something is not configured as expected. Well, I asked you about the initial Docker images because the problem likely originates there.

The fact that "guix gc --list-dead" outputs a lot of items and the fact that you cannot garbage collect them with "guix gc" lead me to think that something is wrong with '/var/guix/'. I do not know...

Well, does "guix gc --list-dead | grep guix-cli-modules.drv | wc -l" return the same number as the number of times you have run "guix pull"?

All the best,
simon

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-29 21:54 ` zimoun @ 2020-05-29 22:11 ` Stephen Scheck 0 siblings, 0 replies; 37+ messages in thread
From: Stephen Scheck @ 2020-05-29 22:11 UTC (permalink / raw)
To: zimoun; +Cc: help-guix

On Fri, May 29, 2020 at 5:54 PM zimoun <zimon.toutoune@gmail.com> wrote:

> Do you have '/var/' in your Docker image? Because it looks like the same
> than:
>

Yes:

root@guix ~# ls -la /var/guix
total 44
drwxr-xr-x 1 root root 4096 May 16 19:36 ./
drwxr-xr-x 1 root root 4096 May 29 22:02 ../
drwxr-xr-x 1 root root 4096 May 29 22:02 daemon-socket/
drwxr-xr-x 1 root root 4096 May 27 00:34 db/
-rw------- 1 root root    0 May 16 19:35 gc.lock
drwxr-xr-x 1 root root 4096 May 16 19:57 gcroots/
drwxr-xr-x 1 root root 4096 Jan  1  1970 profiles/
drwxr-xr-x 1 root root 4096 May 16 19:35 substitute/
drwxr-xr-x 1 root root 4096 May 27 00:34 temproots/
drwxr-xr-x 1 root root 4096 May 16 19:36 userpool/

If you'd like, you can fetch the exact same image and look around yourself:

    docker pull singularsyntax/guix:master-a5374cd # same as singularsyntax/guix:latest
    CONTAINER=`docker run --detach --tty --privileged singularsyntax/guix:master-a5374cd`
    docker exec --interactive --tty $CONTAINER /run/current-system/profile/bin/bash --login

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-27 19:41 Guix Docker image inflation Stephen Scheck 2020-05-28 18:10 ` Leo Famulari 2020-05-29 18:08 ` zimoun @ 2020-05-29 23:30 ` Chris Marusich 2020-05-29 23:55 ` zimoun 2020-05-30 17:02 ` Stephen Scheck 2 siblings, 2 replies; 37+ messages in thread From: Chris Marusich @ 2020-05-29 23:30 UTC (permalink / raw) To: Stephen Scheck; +Cc: help-guix [-- Attachment #1: Type: text/plain, Size: 3114 bytes --] Stephen Scheck <singularsyntax@gmail.com> writes: > Hello, > > As an exercise, I set up daily Guix System Docker image builds using GitLab > and Docker Hub, here: > https://hub.docker.com/repository/registry-1.docker.io/singularsyntax/guix/tags?page=1 > > The build process works as follows: if an existing `latest` image does not > exist for a given branch (master, 1.1.0, etc.), then bootstrap an image by > running `guix system docker-image` inside an Alpine Linux Docker container > with a fresh Guix installation. Using this image as a seed, `guix pull` is > run for the desired branch, and the resulting image is committed to the > Docker repository. If a "latest" image does exist, it is used instead as > the base from which to run `guix pull`. Daily images are thus built > incrementally from the previous day's build. For anybody curious about the > process, the build script can be browsed here: > https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml > > It works pretty well, except that I'm observing substantial image size > inflation day-over-day, starting at ~197 MB from the seed image, now up to > 1.71 GB eleven days later despite running `guix gc --delete-generations`, > `guix gc --collect-garbage`, and `guix gc --optimize` after pulling prior > to committing each new image. > > I'm wondering if there is some other Guix GC operation or option I'm > missing, or any other suggestions which could stop this unsustainable image > bloat from occurring. 
I really do doubt that the Guix System itself is > growing this quickly. Could it be that you are accumulating layers without bound? https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/ Since Docker images are built up of immutable layers, if you build your image from an existing base image, I'm not sure that it's possible to produce a new image that is smaller than the base image. Basically, even if you run "guix gc" to remove dead store items, they will still exist on a prior layer, so the size of the new image won't decrease. And since you're installing new things, the size will actually increase. If you repeat this process by using the new image as an input for yet another build, I think you will accumulate layers and storage space without bound. If this is what's happening, you might consider always building starting from the same base image every time. You could then update the base image (e.g., by changing the FROM line of a Dockerfile, if that's what you're using) periodically as new versions of it are released. This would probably allow you to avoid accumulating layers without bound. FYI, Guix itself can build Docker images from scratch - no base image required! It can even build a Docker image of a full-blown Guix System from scratch. Sorry if you already knew that - I just wanted to point it out in case you didn't! See: https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html https://guix.gnu.org/manual/en/html_node/Invoking-guix-system.html Hope that helps, -- Chris [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
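A minimal sketch of the layer behaviour described above, assuming a container named `guix-build` and a target tag `guix:flat` (both hypothetical names, not taken from the build script). Exporting a container's filesystem and importing it again yields a single-layer image that holds only the files still present in the container, and `docker history` then shows the layers of the result. The `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: container, image and tarball names are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Flatten CONTAINER into a new single-layer image NEW_IMAGE.
flatten() {
    container=$1
    new_image=$2
    tarball=${3:-/tmp/guix-rootfs.tar}

    # Dump the container's current filesystem, without any layer history.
    run docker export --output "$tarball" "$container"
    # Create a fresh image whose only layer is that filesystem.
    run docker import "$tarball" "$new_image"
    # List the layers of the result to confirm there is just one.
    run docker history "$new_image"
}

flatten guix-build guix:flat
```

Note that `docker import` does not carry over image metadata such as the entrypoint or environment; it can be set again with the `--change` option. Flattening only drops files that were actually deleted inside the container, so it does not help with store items that `guix gc` fails to remove.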
* Re: Guix Docker image inflation 2020-05-29 23:30 ` Chris Marusich @ 2020-05-29 23:55 ` zimoun 2020-05-30 17:13 ` Stephen Scheck 2020-05-30 17:02 ` Stephen Scheck 1 sibling, 1 reply; 37+ messages in thread
From: zimoun @ 2020-05-29 23:55 UTC (permalink / raw)
To: Chris Marusich; +Cc: help-guix, Stephen Scheck

Hi Chris,

On Sat, 30 May 2020 at 01:31, Chris Marusich <cmmarusich@gmail.com> wrote:

> Could it be that you are accumulating layers without bound?
>
> https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/
>
> Since Docker images are built up of immutable layers, if you build your
> image from an existing base image, I'm not sure that it's possible to
> produce a new image that is smaller than the base image. Basically,
> even if you run "guix gc" to remove dead store items, they will still
> exist on a prior layer, so the size of the new image won't decrease.
> And since you're installing new things, the size will actually increase.
> If you repeat this process by using the new image as an input for yet
> another build, I think you will accumulate layers and storage space
> without bound.

Thank you for the explanation. The issue is these layers. When I wrote [1], it was not clear to me because I am not familiar enough with Docker, but with your explanations, it is clear now. :-)

[1] http://issues.guix.gnu.org/41607#1

> FYI, Guix itself can build Docker images from scratch - no base image
> required! It can even build a Docker image of a full-blown Guix System
> from scratch. Sorry if you already knew that - I just wanted to point
> it out in case you didn't!

I think the idea is to use GitlabCI to build the Docker images containing Guix materials. And AFAIK, GitlabCI does not provide Guix-related tools, does it? I mean there is no gitlab-runner able to run guix-daemon. If I remember well, we discussed this topic at FOSDEM; it would be awesome. :-)

Cheers,
simon

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-29 23:55 ` zimoun @ 2020-05-30 17:13 ` Stephen Scheck 2020-05-31 9:37 ` zimoun 0 siblings, 1 reply; 37+ messages in thread From: Stephen Scheck @ 2020-05-30 17:13 UTC (permalink / raw) To: zimoun; +Cc: help-guix On Fri, May 29, 2020 at 7:55 PM zimoun <zimon.toutoune@gmail.com> wrote: > Thank you for the explanation. The issue is these layers. When I > wrote [1], it was not clear for me because I am not enough familiar > with Docker, but with your explanations, it is clear now. :-) > > [1] http://issues.guix.gnu.org/41607#1 > No, it is not layers - they are a symptom, not the cause. See my reply to Chris. The problem is clearly that Guix isn't deleting garbage files ... which may have something to do with how Guix interacts with files in the file system and differences in Docker environments (no idea, I don't know how Guix works, but perhaps it needs some special privilege enabled when it runs inside Docker containers?), but layers themselves do not prevent file deletion inside a container. > > FYI, Guix itself can build Docker images from scratch - no base image > > required! It can even build a Docker image of a full-blown Guix System > > from scratch. Sorry if you already knew that - I just wanted to point > > it out in case you didn't! > > I think the idea is to use GitlabCI to build the Docker images > containing Guix materials. And AFAIK, GitlabCI does not provide Guix > related tools, isn't it? I mean there is no gitlab-runner able to run > guix-daemon. If I remember well, we discussed about this topic at > FOSDEM, it should be awesome. :-) > It is possible to host your own external Runners, and have them utilized by CI/CD jobs running inside the GitLab cloud service. You could install Guix on them and configure your CI/CD pipeline to require execution of certain jobs on these custom runners. But I'm not sure I see why that would help? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-30 17:13 ` Stephen Scheck @ 2020-05-31 9:37 ` zimoun 2020-05-31 18:30 ` Stephen Scheck 0 siblings, 1 reply; 37+ messages in thread
From: zimoun @ 2020-05-31 9:37 UTC (permalink / raw)
To: Stephen Scheck; +Cc: help-guix

Dear Stephen,

On Sat, 30 May 2020 at 19:13, Stephen Scheck <singularsyntax@gmail.com> wrote:

> No, it is not layers - they are a symptom, not the cause. See my reply to Chris.
> The problem is clearly that Guix isn't deleting garbage files ... which may have something
> to do with how Guix interacts with files in the file system and differences in Docker
> environments (no idea, I don't know how Guix works, but perhaps it needs some special
> privilege enabled when it runs inside Docker containers?), but layers themselves do not
> prevent file deletion inside a container.

No, it is how Docker is designed. Maybe the terminology "layer" is not the Docker one, but when the images are chained, one cannot remove the data of a previous layer from the total image.

> It is possible to host your own external Runners, and have them utilized by
> CI/CD jobs running inside the GitLab cloud service. You could install Guix
> on them and configure your CI/CD pipeline to require execution of certain
> jobs on these custom runners. But I'm not sure I see why that would help?

Because if you run Guix outside a Docker container, you will not have the issue. The main issue is how the Docker "filesystem" is designed.

All the best,
simon

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 9:37 ` zimoun @ 2020-05-31 18:30 ` Stephen Scheck 2020-05-31 18:51 ` zimoun 2020-05-31 21:04 ` Chris Marusich 0 siblings, 2 replies; 37+ messages in thread From: Stephen Scheck @ 2020-05-31 18:30 UTC (permalink / raw) To: zimoun; +Cc: help-guix On Sun, May 31, 2020 at 5:37 AM zimoun <zimon.toutoune@gmail.com> wrote: > No, it is how Docker is designed. Maybe the terminology "layer" is > not the Docker one but when the images are chained, one cannot remove > the data of the previous layer of the total image. > I'm not disagreeing with that, but IF any of the store files resulting from `guix pull` are ephemeral (i.e. intermediate build results not anchored to a profile) AND guix GC worked inside the container, my approach might still work - yes there would be image and layers growth but it might be small enough not to care between periodic image rebases. But I'm starting to doubt that, or at least it is difficult to quantify with the GC issues. > Because if you run Guix outside an Docker container, you will not have > the issue. The main issue is how the Docker "filesystem" is designed. > Actually, there might be another way around this, still avoiding the need for a custom Runner, for example mounting /var/guix and /gnu/store into the container instead of belonging to it. If done that way, layer accumulation wouldn't be an issue, and maybe GC between layers neither. ^ permalink raw reply [flat|nested] 37+ messages in thread
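The volume idea above could look roughly like the following sketch. The volume names `guix-store` and `guix-var` are hypothetical, and the image tag is the one mentioned earlier in the thread. The sketch relies on Docker's behaviour of copying the image's existing directory contents into a new, empty named volume on first mount, so that the store and its database start out consistent with each other. Whether guix-daemon behaves well in this setup is untested here. The `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: volume names are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start IMAGE with the store and the daemon state kept in named volumes,
# so that neither ends up in the layers written by "docker commit".
start_with_volumes() {
    image=$1
    store_vol=${2:-guix-store}
    state_vol=${3:-guix-var}

    run docker volume create "$store_vol"
    run docker volume create "$state_vol"
    run docker run --detach --tty --privileged \
        --volume "$store_vol:/gnu/store" \
        --volume "$state_vol:/var/guix" \
        "$image"
}

start_with_volumes singularsyntax/guix:latest
```

Both volumes have to be kept and reused together: mounting a store volume next to a database from a different run would produce exactly the kind of inconsistency between /gnu/store and /var/guix that the daemon cannot cope with.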
* Re: Guix Docker image inflation 2020-05-31 18:30 ` Stephen Scheck @ 2020-05-31 18:51 ` zimoun 2020-05-31 19:43 ` Stephen Scheck 2020-05-31 21:04 ` Chris Marusich 1 sibling, 1 reply; 37+ messages in thread From: zimoun @ 2020-05-31 18:51 UTC (permalink / raw) To: Stephen Scheck; +Cc: help-guix Dear Stephen, again :-) On Sun, 31 May 2020 at 20:30, Stephen Scheck <singularsyntax@gmail.com> wrote: >> No, it is how Docker is designed. Maybe the terminology "layer" is >> not the Docker one but when the images are chained, one cannot remove >> the data of the previous layer of the total image. > > I'm not disagreeing with that, but IF any of the store files resulting from `guix pull` > are ephemeral (i.e. intermediate build results not anchored to a profile) AND guix > GC worked inside the container, my approach might still work - yes there would be > image and layers growth but it might be small enough not to care between periodic image > rebases. But I'm starting to doubt that, or at least it is difficult to quantify with the > GC issues. 
Currently, if I read correctly, your images are chained with something like,

--8<---------------cut here---------------start------------->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix gc --delete-generations
/root/.config/guix/current/bin/guix gc --collect-garbage
/root/.config/guix/current/bin/guix gc --optimize
docker commit
--8<---------------cut here---------------end--------------->8---

and instead you should do something like

--8<---------------cut here---------------start------------->8---
GUIX_PATH=/root/.config/guix/current/bin
$GUIX_PATH/guix pull --branch=$CI_COMMIT_REF_NAME --fallback
/root/.config/guix/current/bin/guix pull -d
/root/.config/guix/current/bin/guix package -d
/root/.config/guix/current/bin/guix gc
docker commit
docker export | docker import
--8<---------------cut here---------------end--------------->8---

Maybe the explosion of size would be slower. If you do, please report here the number after say 12 generations; I am really interested. ;-)

>> Because if you run Guix outside an Docker container, you will not have
>> the issue. The main issue is how the Docker "filesystem" is designed.
>
> Actually, there might be another way around this, still avoiding the need for a custom Runner,
> for example mounting /var/guix and /gnu/store into the container instead of belonging to it. If
> done that way, layer accumulation wouldn't be an issue, and maybe GC between layers neither.

Yes, it is one solution.

The whole question seems to be:
 - what is the purpose of such a Docker image? Which usage?
 - what infrastructure do you have at hand to build the Docker images?

Thank you for raising this whole question of Docker image production. :-)

All the best,
simon

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 18:51 ` zimoun @ 2020-05-31 19:43 ` Stephen Scheck 2020-05-31 23:27 ` zimoun 0 siblings, 1 reply; 37+ messages in thread From: Stephen Scheck @ 2020-05-31 19:43 UTC (permalink / raw) To: zimoun; +Cc: help-guix On Sun, May 31, 2020 at 2:51 PM zimoun <zimon.toutoune@gmail.com> wrote: > Maybe the explosion of size would be slower. If you do, please report > here the number after say 12 generations; I am really interesting. ;-) > Now I'm confused - in your reply to Vincent, it seemed that there were still problems with the GC removing dead store items even after you did an export/re-import with the entire image on a single Docker layer? Or did I misread it? > All the question seems to be: > - what is the purpose of such Docker image? Which usage? > - what infrastructure do you have at hand to build the Docker images? > Well, Guix is interesting, and there aren't ready-made containers for it out there like there are for Ubuntu, Fedora, etc. if you have a need to do some task in that kind of environment, or just to play around, or see how the system is evolving. Also, I have been playing around with Guile lately and I thought Guix might be a better fit for that kind of work than other environments where Guile is largely neglected (Guix is *written* in Guile, after all). And I happened to be learning GitLab CI/CD around the same time, and it seemed like a good opportunity to experiment with both at once, so I thought, why not? :-) Which infrastructure - well, GitLab CI/CD, with fixed compute limits :-) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 19:43 ` Stephen Scheck @ 2020-05-31 23:27 ` zimoun 0 siblings, 0 replies; 37+ messages in thread
From: zimoun @ 2020-05-31 23:27 UTC (permalink / raw)
To: Stephen Scheck; +Cc: help-guix

Dear Stephen,

On Sun, 31 May 2020 at 21:43, Stephen Scheck <singularsyntax@gmail.com> wrote:
> On Sun, May 31, 2020 at 2:51 PM zimoun <zimon.toutoune@gmail.com> wrote:
>>
>> Maybe the explosion of size would be slower. If you do, please report
>> here the number after say 12 generations; I am really interesting. ;-)
>
> Now I'm confused - in your reply to Vincent, it seemed that there were still problems
> with the GC removing dead store items even after you did an export/re-import with the
> entire image on a single Docker layer? Or did I misread it?

The export/import trick cut the size of "your guix-bootstrap" image by half. So even though I am not convinced that it will fix the issue, I think that my proposal is the correct thing to do to delete dead items in the store.

Basically, after the pull, you need to delete all the other generations of /root/.config/guix/current (except the current one) with "guix pull -d", then delete all the generations of /root/.guix-profile with "guix package -d", and last, garbage collect. For sure, it will not delete the items coming from the previous layer, but it will delete all the dead items of the current session. And "docker export | docker import" could remove other items -- even if in the case of "install/remove hello" it does not work cleanly, some items are deleted.

Well, it is just to be complete with your approach.

>> All the question seems to be:
>> - what is the purpose of such Docker image? Which usage?
>> - what infrastructure do you have at hand to build the Docker images?
>
> Well, Guix is interesting, and there aren't ready-made containers for it out there like there are for
> Ubuntu, Fedora, etc.
> if you have a need to do some task in that kind of environment, or just to play
> around, or see how the system is evolving. Also, I have been playing around with Guile lately and
> I thought Guix might be a better fit for that kind of work than other environments where Guile is
> largely neglected (Guix is *written* in Guile, after all). And I happened to be learning GitLab CI/CD
> around the same time, and it seemed like a good opportunity to experiment with both at once,
> so I thought, why not? :-) Which infrastructure - well, GitLab CI/CD, with fixed compute limits :-)

Yeah, a "ready-made container" could be really cool! AFAIK, no one took the time to implement and document it. There are various attempts but not always reported on help-guix or guix-devel.

Well, the answers to these 2 questions imply different strategies.

For example, I am running Guix on top of Debian so I basically use only the package manager. And I use this infrastructure to produce Docker images containing apps by running "guix pack -f docker -m manifest.scm". Because I am interested in Reproducible Science, I also use "guix time-machine -C channel.scm -- pack -f docker -m manifest.scm". However, the Docker images contain only applications (R or Python with a bunch of packages) and the "user" cannot use these images to extend them by running "guix install foo"; because I want to track reproducibility, the only way is to go through the 'manifest.scm' and 'channel.scm' files.

Another example is the Dockerfile way. Based on any image (Alpine or Debian), I build an image containing the Guix package manager -- roughly speaking as you are doing with your image 'guix-bootstrap'. Then I use this image in 2 different ways: with a Dockerfile or directly. In both cases, it always starts with "guix pull". And I never chain the images -- I mean only 3 "layers" at maximum: 0-debian, 1-guix-fresh and 2-guix-latest. Well, I have never run "guix gc" inside a Docker image.
Last, I have never played with "guix system docker-image". But in the context of GitlabCI, what I would try should be something like:

    CONTAINER=`docker run --detach --privileged $OLD`
    docker exec $CONTAINER guix pull
    docker exec $CONTAINER guix system docker-image --root=/image.tar stuff.scm
    docker cp $CONTAINER:/image.tar $NEW

with maybe instead of "guix pull" this bazooka:

    docker exec $CONTAINER guix install git
    docker exec $CONTAINER git clone
    docker exec $CONTAINER guix environment guix
    docker exec $CONTAINER sh -c './bootstrap && ./configure --localstatedir=/var/'
    docker exec $CONTAINER make -j
    docker exec $CONTAINER ./pre-inst-env guix system docker-image

Well, use "guix system docker-image" inside a Docker image already containing Guix; this avoids the layer issue, doesn't it?

But as I said elsewhere, I am not really familiar with Docker so my words are probably irrelevant.

All the best,
simon

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 18:30 ` Stephen Scheck 2020-05-31 18:51 ` zimoun @ 2020-05-31 21:04 ` Chris Marusich 2020-06-01 0:37 ` zimoun 1 sibling, 1 reply; 37+ messages in thread From: Chris Marusich @ 2020-05-31 21:04 UTC (permalink / raw) To: Stephen Scheck; +Cc: help-guix [-- Attachment #1: Type: text/plain, Size: 11168 bytes --] Stephen Scheck <singularsyntax@gmail.com> writes: > IF any of the store files resulting from `guix pull` are ephemeral > (i.e. intermediate build results not anchored to a profile) AND guix > GC worked inside the container, my approach might still work - yes > there would be image and layers growth but it might be small enough > not to care between periodic image rebases. But I'm starting to doubt > that, or at least it is difficult to quantify with the GC issues. I think you're right about it being difficult to quantify the GC issues. Basically, when you run "guix pull", the current Guix will "build" (i.e., maybe download via substitutes, maybe build from source) the new Guix, which puts it into the store, and updates the profile symlinks to make it current. In the process of doing this, some intermediate builds might be performed if substitutes are not available. Although the new Guix will remain live in the store after the profile symlinks are updated to make it current, (1) intermediate results might be left dead after "guix pull" is finished, and (2) if the old Guix is sufficiently different from the new Guix, it will also become dead after the symlinks that were keeping it live are removed. So, the amount of garbage that will be left over depends on a few factors, like whether substitutes were available, and how different the new Guix is from the old one. It can also depend on how the guix-daemon has been started (see "--gc-keep-outputs" and --gc-keep-derivations" in the "Invoking guix-daemon" section of the manual). In the case of your Docker images, most (all?) 
of the garbage is coming from case (2) above: as Guix changes, the old Guix will be made dead and GC'd (hypothetically, let's suppose GC is working), but it will still exist on prior layers, since it came from a prior layer. As for case (1), the intermediate results, I think they are not contributing to your image size for two reasons: substitutes are probably available, and even if they weren't available, the intermediates would probably appear during "guix pull", which means they'd be on the top layer and would be GC'd, so they wouldn't be included in any layer of the next image. The fact that the biggest dead paths in your latest image consist entirely of store paths that look suspiciously like they came from prior Guix installations is further evidence in support of this theory. --8<---------------cut here---------------start------------->8--- root@guix /# du -Phc $(guix gc --list-dead) 2>/dev/null | sort -hk 1,1 | tail finding garbage collector roots... determining live/dead paths... 
187M /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules/lib/guile/3.0/site-ccache 187M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib 187M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile 187M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile/3.0 187M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules/lib/guile/3.0/site-ccache 194M /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules 198M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules 210M /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules 210M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules 3.0G total root@guix /# --8<---------------cut here---------------end--------------->8--- These "guix-HASH-modules" directories, for example, are used as part of each Guix installation: --8<---------------cut here---------------start------------->8--- root@guix /# realpath ~/.config/guix/current/share/guile /gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules/share/guile root@guix /# --8<---------------cut here---------------end--------------->8--- Each of them has a total closure size of almost 500 MB, although since they might share some references, each one individually is adding "only" about 200 MB. 
--8<---------------cut here---------------start------------->8--- root@guix /# guix size /gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules store item total self /gnu/store/mj6pf6nf0kf03nhh7bmpc6m43v6knq6m-guix-a5374cde9-modules 485.9 206.9 42.6% /gnu/store/hkmsljl2sf4nk96b35f0bmfkr2lqanfq-guix-packages-base 105.7 105.7 21.8% /gnu/store/s7izb7j0s5rzcq297nd7ba9sfiqh5zmz-guix-system 43.2 43.2 8.9% /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31 38.4 36.7 7.6% /gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib 71.0 32.6 6.7% /gnu/store/wcv5mscivggkygnz68nn2671fr3kapjc-guix-packages-base-source 19.4 19.4 4.0% /gnu/store/6zygksmvzcq92xf65cna91dbf7a4zblh-guix-extra 19.4 19.4 4.0% /gnu/store/a7wiy24mmcilbqp39pl0jdlw10vbvavb-guix-cli 8.0 7.3 1.5% /gnu/store/f6k9b4grrfpip4h5lrmpnsnn2gqziihr-guix-system-tests 4.6 4.6 1.0% /gnu/store/gbrd1laxsncb9zd218pyglisxyxymmbd-guix-system-source 1.9 1.9 0.4% /gnu/store/mmhimfwmmidf09jw1plw3aw1g1zn2nkh-bash-static-5.0.16 1.6 1.6 0.3% /gnu/store/5lr8miawrk380zw8yjy0crcl6vcs10s3-guix-extra-source 1.5 1.5 0.3% /gnu/store/pwcp239kjf7lnj5i4lkdzcfcxwcfyk72-bash-minimal-5.0.16 39.4 1.0 0.2% /gnu/store/r7k859hmcnkazf492fasqvk25jflnfk6-xz-5.2.4 73.0 0.9 0.2% /gnu/store/bhs4rj58v8j1narb2454raan2ps38xd8-grep-3.4 72.9 0.8 0.2% /gnu/store/z0572147hprpbjrcjqkgrv3f80ip2klx-guix-cli-source 0.7 0.7 0.1% /gnu/store/a9f7wmc75hbpg520phw9z4l9asm3qvsw-bzip2-1.0.8 72.5 0.4 0.1% /gnu/store/7y0nin2d0j46j26a1n46bl5zl3px0zvz-guix-system-tests-source 0.3 0.3 0.1% /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11 71.2 0.2 0.0% /gnu/store/jqr5bz89gfwhxcndnhq333dyclvkq7ws-lzlib-1.11 71.2 0.2 0.0% /gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10 73.1 0.2 0.0% /gnu/store/kfj1lc84v50imn3raijgih4salilmf1a-guix-packages-base-modules 125.2 0.0 0.0% /gnu/store/lvszhqs57scb2ax18l2nrn9dwiyf6iza-guix-system-tests-modules 4.9 0.0 0.0% /gnu/store/lr65f259z1730p7bvplsj9k6yvbkyh39-guix-system-modules 45.1 0.0 0.0% 
/gnu/store/nk1x6cdif8pd9vi04nzxfqinh0ag06am-guix-extra-modules 20.9 0.0 0.0% /gnu/store/s6vlfscnfvnrlv3yfag6qsy5j6c9pxqb-guix-cli-modules 8.0 0.0 0.0% total: 485.9 MiB root@guix /# --8<---------------cut here---------------end--------------->8--- And there are still other components adding space each time you run "guix pull", like the "guix-system" component, for example: --8<---------------cut here---------------start------------->8--- root@guix /# du -Phc $(guix gc --list-dead | grep guix-system) 2>/dev/null | sort -hk 1,1 | tail finding garbage collector roots... determining live/dead paths... 44M /gnu/store/qhbk7g8z97m37iak1s1yn2my82gv0lj5-guix-system/gnu 44M /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system 44M /gnu/store/slwkzcmg6r1lr9a16x3krd2ax384p8wr-guix-system/gnu 44M /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system 44M /gnu/store/vwzk618h1wxy6z9i06xnhnxj4gvhkiss-guix-system/gnu 44M /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system 44M /gnu/store/w47fgv8p2hvaqdwywymwvm0qlh4gw0ih-guix-system/gnu 44M /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system 44M /gnu/store/zf67wb6c0s97vwmywjq09hy9jq0w5mmi-guix-system/gnu 523M total root@guix /# --8<---------------cut here---------------end--------------->8--- Anyway, the point is: you begin with a previous image. The previous image already has these store paths from the previous installation of Guix. Therefore, they exist on the previous layer. Because they exist on the previous layer, they cannot be removed from the Docker image, and they are carried forward in that previous layer, to all new images. Regardless of any changes to guix-daemon we might make, the way in which you build your images will cause them to grow by hundreds of megabytes every day. > Actually, there might be another way around this, still avoiding the > need for a custom Runner, for example mounting /var/guix and > /gnu/store into the container instead of belonging to it. 
If done that > way, layer accumulation wouldn't be an issue, and maybe GC between > layers neither. This sounds like a great idea, actually! "The right way" to do Docker containers is to have a single process per container, and to not store state in the Docker container. We're violating that principle on both counts when we run an entire GNU/Linux distribution inside a Docker container, especially since the guix-daemon is all about managing the "state" of /var/guix and /gnu/store. If you can somehow move that "state" into a Docker volume instead of the container itself, that would definitely be an improvement. It may be tricky, though, since if guix-daemon sees stuff in /gnu/store that is inconsistent with its database in /var/guix, bad things can happen. So you'll have to ensure they remain consistent with one another. >> Besides store items, I noticed two other things about your images: >> >> - The contents of /var is growing slowly without bound, but it isn't >> nearly as bad as the contents of /gnu/store. This is probably due to >> log files; consider pruning them. >> > > These are presumably OK to delete, without any special handling for Guix? I think the answer is "probably", but I would stop guix-daemon first. Other processes may be using /var, too, so I would stop them, also. >> - Your script runs "docker commit" while guix-daemon (and other >> programs) are still running. To ensure the guix-daemon's database (or >> other things) does not become corrupt, consider terminating all >> processes before committing the new image. >> > > `docker commit` pauses the container (unless you tell it not to) ... > although I guess that could still cause problems if Guix store writes > aren't implemented in an atomic way. I'm not sure what "pause" means in the Docker documentation, but since I can run "docker commit" while running a shell in the container, and the shell doesn't get terminated, it clearly doesn't terminate the processes. 
It might be safe to just pause the container when committing, but it's definitely safe if you gracefully shut down all processes first. This definitely ensures that things like databases are left in known good states when committing the image. What I'm saying is that, yeah sure, you can probably get away with not gracefully shutting down the processes. Similarly, you can often get away with pulling the power cord out of your computer because a lot of software and storage is pretty robust by default nowadays. However, it increases the risk of encountering a problem like data corruption, so it's better to shut things down gracefully if you can. -- Chris [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
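The two suggestions in the message above (keeping /gnu/store and /var/guix outside the container's own filesystem, and shutting processes down before committing) could be sketched roughly as follows. This is only an illustration: the volume names `guix-store` and `guix-var`, the helper names, and the 30-second grace period are assumptions, not something taken from the build script under discussion. By default the sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: volume names, helper names and the grace period are
# hypothetical. With DRY_RUN=1 (the default here) each command is
# printed rather than executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start a container whose Guix state lives in named volumes instead of
# in the container's writable layer.
start_with_volumes() {
    image=$1
    run docker run --detach --tty --privileged \
        --volume guix-store:/gnu/store \
        --volume guix-var:/var/guix \
        "$image"
}

# Stop the container first (docker stop sends SIGTERM, then SIGKILL once
# the grace period given with -t has elapsed), then commit it.
stop_then_commit() {
    container=$1
    tag=$2
    run docker stop -t 30 "$container" &&
        run docker commit "$container" "$tag"
}
```

One consequence to keep in mind: `docker commit` does not include data held in mounted volumes, so with this layout the committed image would no longer carry the store itself, and the store and database would have to be kept consistent through the volumes.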
* Re: Guix Docker image inflation 2020-05-31 21:04 ` Chris Marusich @ 2020-06-01 0:37 ` zimoun 0 siblings, 0 replies; 37+ messages in thread From: zimoun @ 2020-06-01 0:37 UTC (permalink / raw) To: Chris Marusich; +Cc: help-guix, Stephen Scheck Hi Chris, On Sun, 31 May 2020 at 23:04, Chris Marusich <cmmarusich@gmail.com> wrote: > Anyway, the point is: you begin with a previous image. The previous > image already has these store paths from the previous installation of > Guix. Therefore, they exist on the previous layer. Because they exist > on the previous layer, they cannot be removed from the Docker image, and > they are carried forward in that previous layer, to all new images. > Regardless of any changes to guix-daemon we might make, the way in which > you build your images will cause them to grow by hundreds of megabytes > every day. I agree that the core of the issue is the Docker layered filesystem. And it cannot be fixed on the Guix side. Therefore, even if it is *bad* and dangerous, what about --8<---------------cut here---------------start------------->8--- root@guix /# guix gc root@guix /# guix gc --list-dead | xargs rm -rf root@guix /# exit $ docker stop $ docker commit $ docker export $ID | docker import - guix-new > --8<---------------cut here---------------end--------------->8--- ? Well, it cannot be recommended because it is dangerous. But it somehow bypasses the guix-daemon and "hard-removes" the items and then, as Vincent suggested, 'docker export | docker import' flattens the layers so the dead items are then really gone in the new Docker image. And as I have shown [1], "guix gc+export/import" alone does not lead to an image where the dead items are gone, if I am not mistaken. [1] http://issues.guix.gnu.org/41607#6 WDYT? Cheers, simon ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-29 23:30 ` Chris Marusich 2020-05-29 23:55 ` zimoun @ 2020-05-30 17:02 ` Stephen Scheck 2020-05-31 4:31 ` Chris Marusich 2020-05-31 8:24 ` zimoun 1 sibling, 2 replies; 37+ messages in thread From: Stephen Scheck @ 2020-05-30 17:02 UTC (permalink / raw) To: Chris Marusich; +Cc: help-guix On Fri, May 29, 2020 at 7:31 PM Chris Marusich <cmmarusich@gmail.com> wrote: > > Could it be that you are accumulating layers without bound? > > > https://developers.redhat.com/blog/2016/03/09/more-about-docker-images-size/ > > Since Docker images are built up of immutable layers, if you build your > image from an existing base image, I'm not sure that it's possible to > produce a new image that is smaller than the base image. Basically, > even if you run "guix gc" to remove dead store items, they will still > exist on a prior layer, so the size of the new image won't decrease. > And since you're installing new things, the size will actually increase. > If you repeat this process by using the new image as an input for yet > another build, I think you will accumulate layers and storage space > without bound. > Layers certainly add some image size overhead, but I don't think that is the culprit here. And producing a smaller image isn't really the goal, it's just to keep image growth reasonable between each incremental guix pull. Dead store items would only exist on previous layers if they make it there in the first place. As has been demonstrated in previous posts in the thread, I believe the problem is some guix bug which prevents deletion of garbage-collected store items. What is reasonable growth? That is hard to answer, but I would expect it to be roughly proportional to the growth of a guix installation over time in a non-Docker environment, taking some constant amount of layer overhead as a given.
I don't really know what `guix pull` does, but I think it's something along these lines: 1) the global package index is brought up-to-date; 2) Any packages which are installed in the profile doing the pull are upgraded to newer versions if they've been updated. So day-to-day, particularly in the case where there have been no updates to packages installed in the profile, size growth should be very small. Periodic "rebasing" of incremental Docker images might still be helpful from time to time using one of the layer squashing tools out there, but I don't think it should be necessary on a daily basis. Also, layers are helpful in the case of someone pulling down daily Guix Docker images on a frequent basis, because then only the new, ideally small layers need to be downloaded, whereas if you rebase for every image build, you'd have to download the entire image every day. The boundless layer accumulation you point out shouldn't be a problem with the way that I'm building the images. When you do a `RUN <command>` inside a Dockerfile, it is essentially doing `docker exec <container> <command>` followed by `docker commit <container>`. It is the commit step which produces a new layer. You can think of a RUN command inside a Dockerfile as kind of a single-step transaction, which incorporates the net file system changes into the image. My build script issues several `docker exec <container> <command>` sequences, followed by a `docker commit <container>`. Intermediate changes to the container file system prior to the commit do not generate layers, only the net changes after the commit. 
You can convince yourself of this by doing something like the following: docker run <some-linux-image> docker exec <container-id> dd if=/dev/urandom of=/RANDOM-DATA bs=1048576 count=1024 docker commit <container-id> docker exec <container-id> rm /RANDOM-DATA docker commit <container-id> You'll end up with two new images - the first one should be about 1 GB larger than the base image, the second one the same size. > FYI, Guix itself can build Docker images from scratch - no base image > required! It can even build a Docker image of a full-blown Guix System > from scratch. Sorry if you already knew that - I just wanted to point > it out in case you didn't! > Yes, thanks, I know - if you read through the thread you'll see that I make reference to `guix system docker-image [...]`. -SS ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-30 17:02 ` Stephen Scheck @ 2020-05-31 4:31 ` Chris Marusich 2020-05-31 9:08 ` zimoun 2020-05-31 17:50 ` Stephen Scheck 2020-05-31 8:24 ` zimoun 1 sibling, 2 replies; 37+ messages in thread From: Chris Marusich @ 2020-05-31 4:31 UTC (permalink / raw) To: Stephen Scheck; +Cc: help-guix [-- Attachment #1: Type: text/plain, Size: 3488 bytes --] Hi Stephen, Stephen Scheck <singularsyntax@gmail.com> writes: > Layers certainly add some image size overhead, but I don't think that > is the culprit here. > Also, layers are helpful in the case of someone pulling down daily > Guix Docker images on a frequent basis, because then only the new, > ideally small layers need to be downloaded, whereas if you rebase for > every image build, you'd have to download the entire image every day. That is true, but suppose I have the following 3 images: - Image A: A base image created in January 2020. - Image B: Based on A, and I ran "guix pull" in February 2020. - Image C: Based on A, and I ran "guix pull" in June 2020. I would guess that the size difference between A and B is approximately the same as the difference between A and C. It'll be different, of course, but generally the size difference between A and C should not grow linearly with time, since "guix pull" is only going to install at most the total closure of things necessary to build and run Guix, which doesn't increase much in size as time goes on. However, when you daisy-chain the images every day, the image size will grow linearly with time because the contents of all the previous layers is carried forward. > My build script issues several `docker exec <container> <command>` > sequences, followed by a `docker commit <container>`. Intermediate > changes to the container file system prior to the commit do not > generate layers, only the net changes after the commit. There are two problems here. One is that the image size grows without bound. 
The other is that guix-daemon is failing to GC store items in the Docker container. Although they are both concerning, the latter is not the cause of the former. If you install new store items (e.g., via "guix pull"), make them dead, and then GC them, all in the same container before running "docker commit", then I agree: those GC'd store items would not persist in a layer anywhere. However, I don't think that's what's happening here. Sure, there might be a few store items like this, but in practice, there will be many store items from the previous image which began live but became dead when you ran "guix pull" and deleted your old profile generations. It is those store items that are adding the most space to your image. Besides store items, I noticed two other things about your images: - The contents of /var is growing slowly without bound, but it isn't nearly as bad as the contents of /gnu/store. This is probably due to log files; consider pruning them. - Your script runs "docker commit" while guix-daemon (and other programs) are still running. To ensure the guix-daemon's database (or other things) does not become corrupt, consider terminating all processes before committing the new image. > FYI, Guix itself can build Docker images from scratch - no base image >> required! It can even build a Docker image of a full-blown Guix System >> from scratch. Sorry if you already knew that - I just wanted to point >> it out in case you didn't! >> > > Yes, thanks, I know - if you read through the thread you'll see that I make > reference to `guix system docker-image [...]`. I apologize for not reading your thread more closely to begin with. I took a closer look, and I think I can explain what is going on now. Please check the bug report and reply there if anything is unclear. -- Chris [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
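The linear-growth argument in the message above can be put into rough numbers using the figures from the original post (a seed image of about 197 MB, 1.71 GB eleven days later). The sketch below is a back-of-the-envelope model, not a measurement: it treats 1.71 GB as 1710 MB, assumes the daily increment is constant, and assumes an image rebuilt from the seed costs about one increment.

```shell
#!/bin/sh
# Figures quoted in the original post. Treating 1.71 GB as 1710 MB and
# assuming a constant daily increment are simplifications.
seed_mb=197
after_mb=1710
days=11

# Average growth per daily build (integer division).
delta_mb=$(( (after_mb - seed_mb) / days ))
echo "average daily growth: ${delta_mb} MB"

# Projected size when each image is built on the previous day's image,
# compared with always building on the seed image.
for n in 30 90; do
    chained=$(( seed_mb + n * delta_mb ))
    rebased=$(( seed_mb + delta_mb ))
    echo "day ${n}: chained ~${chained} MB, rebuilt from seed ~${rebased} MB"
done
```

Under these assumptions the chained image keeps growing by the same amount every day, while the image rebuilt from the seed stays at roughly seed plus one pull.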
* Re: Guix Docker image inflation 2020-05-31 4:31 ` Chris Marusich @ 2020-05-31 9:08 ` zimoun 2020-05-31 17:50 ` Stephen Scheck 1 sibling, 0 replies; 37+ messages in thread From: zimoun @ 2020-05-31 9:08 UTC (permalink / raw) To: Chris Marusich; +Cc: help-guix, Stephen Scheck Hi Chris, On Sun, 31 May 2020 at 06:32, Chris Marusich <cmmarusich@gmail.com> wrote: > I would guess that the size difference between A and B is approximately > the same as the difference between A and C. It'll be different, of > course, but generally the size difference between A and C should not > grow linearly with time, since "guix pull" is only going to install at > most the total closure of things necessary to build and run Guix, which > doesn't increase much in size as time goes on. However, when you > daisy-chain the images every day, the image size will grow linearly with > time because the contents of all the previous layers is carried forward. Exactly, and it is not specific to Guix; it is how Docker works, if I understand correctly. > - Your script runs "docker commit" while guix-daemon (and other > programs) are still running. To ensure the guix-daemon's database (or > other things) does not become corrupt, consider terminating all > processes before committing the new image. Do you think the GC issue comes from this? Because "docker stop" and then "docker commit" does not change the issue. The GC is still confused, trying to delete items that are not in the store. Roughly speaking, "guix gc" says it removes items of size 0, but then "guix gc --references" says the path does not exist. --8<---------------cut here---------------start------------->8--- / # /root/.config/guix/current/bin/guix gc /root/.config/guix/current/bin/guix gc [...] / # /root/.config/guix/current/bin/guix gc --list-dead | grep hello /root/.config/guix/current/bin/guix gc --list-dead | grep hello finding garbage collector roots... determining live/dead paths...
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 / # /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 guix gc: error: path `/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10' is not valid --8<---------------cut here---------------end--------------->8--- > I apologize for not reading your thread more closely to begin with. I > took a closer looks, and I think I can explain what is going on now. > Please check the bug report and reply there if anything is unclear. Ah sorry, maybe you already addressed these questions in the bug report. Cheers, simon ^ permalink raw reply [flat|nested] 37+ messages in thread
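The mismatch described in the message above (a path reported as dead by "guix gc --list-dead" that "guix gc --references" then calls invalid) can be narrowed down by checking whether each reported path still exists on disk. The following is a small diagnostic sketch; the helper name `classify_paths` is made up for illustration, and it only looks at the filesystem, not at the daemon's database.

```shell
#!/bin/sh
# Read candidate store paths on stdin, one per line, and report whether
# each one is still present on disk. The helper name is hypothetical.
classify_paths() {
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf 'present %s\n' "$path"
        else
            printf 'missing %s\n' "$path"
        fi
    done
}

# Intended use inside the container (not executed by this sketch):
#   guix gc --list-dead | grep '^/gnu/store/' | classify_paths
```

A path printed as "missing" would be one the daemon still lists as dead although nothing is left on disk, which matches the behaviour shown in the session.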
* Re: Guix Docker image inflation 2020-05-31 4:31 ` Chris Marusich 2020-05-31 9:08 ` zimoun @ 2020-05-31 17:50 ` Stephen Scheck 2020-05-31 18:33 ` zimoun 1 sibling, 1 reply; 37+ messages in thread From: Stephen Scheck @ 2020-05-31 17:50 UTC (permalink / raw) To: Chris Marusich; +Cc: help-guix On Sun, May 31, 2020 at 12:31 AM Chris Marusich <cmmarusich@gmail.com> wrote: > > Also, layers are helpful in the case of someone pulling down daily > > Guix Docker images on a frequent basis, because then only the new, > > ideally small layers need to be downloaded, whereas if you rebase for > > every image build, you'd have to download the entire image every day. > > That is true, but suppose I have the following 3 images: > > - Image A: A base image created in January 2020. > - Image B: Based on A, and I ran "guix pull" in February 2020. > - Image C: Based on A, and I ran "guix pull" in June 2020. > > I would guess that the size difference between A and B is approximately > the same as the difference between A and C. It'll be different, of > course, but generally the size difference between A and C should not > grow linearly with time, since "guix pull" is only going to install at > most the total closure of things necessary to build and run Guix, which > doesn't increase much in size as time goes on. However, when you > daisy-chain the images every day, the image size will grow linearly with > time because the contents of all the previous layers is carried forward. > > > My build script issues several `docker exec <container> <command>` > > sequences, followed by a `docker commit <container>`. Intermediate > > changes to the container file system prior to the commit do not > > generate layers, only the net changes after the commit. > > There are two problems here. One is that the image size grows without > bound. The other is that guix-daemon is failing to GC store items in > the Docker container. 
Although they are both concerning, the latter is > not the cause of the former. > > If you install new store items (e.g., via "guix pull"), make them dead, > and then GC them, all in the same container before running "docker > commit", then I agree: those GC'd store items would not persist in a > layer anywhere. However, I don't think that's what's happening here. > Sure, there might be a few store items like this, but in practice, there > will be many store items from the previous image which began live but > became dead when you ran "guix pull" and deleted your old profile > generations. It is those store items that are adding the most space to > your image. > Yes, I get this. I never expected the container to stay constant in size, but I was hoping daily pulls would result in relatively low image growth. It's not clear to me if any of the items which should get GC'd but don't are just ephemeral build results, in which case growth might be tolerable with an occasional rebase (perhaps monthly or bi-monthly). But I'm now starting to doubt my whole approach because it seems like there are some fundamental GC problems with running a live Guix system inside a container. > Besides store items, I noticed two other things about your images: > > - The contents of /var is growing slowly without bound, but it isn't > nearly as bad as the contents of /gnu/store. This is probably due to > log files; consider pruning them. > These are presumably OK to delete, without any special handling for Guix? > - Your script runs "docker commit" while guix-daemon (and other > programs) are still running. To ensure the guix-daemon's database (or > other things) does not become corrupt, consider terminating all > processes before committing the new image. > `docker commit` pauses the container (unless you tell it not to) ... although I guess that could still cause problems if Guix store writes aren't implemented in an atomic way. 
Thanks, -SS ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 17:50 ` Stephen Scheck @ 2020-05-31 18:33 ` zimoun 0 siblings, 0 replies; 37+ messages in thread From: zimoun @ 2020-05-31 18:33 UTC (permalink / raw) To: Stephen Scheck; +Cc: help-guix Dear Stephen, On Sun, 31 May 2020 at 19:51, Stephen Scheck <singularsyntax@gmail.com> wrote: > But I'm now starting to doubt my whole approach because it seems like > there are some fundamental GC problems with running a live Guix system > inside a container. I do not think it is "some fundamental GC problems with running a live Guix system inside a container" but it is a fundamental Docker filesystem design which is incompatible with your approach. As I have tried to show, the issue is: $ CONTAINER=`docker run --detach --tty --privileged image0` $ docker exec --interactive --tty $CONTAINER /bin/sh / # dd if=/dev/urandom of=/data1 bs=1234567 count=1024 $ HASH=`docker commit $CONTAINER` && docker tag $HASH image1 $ CONTAINER=`docker run --detach --tty --privileged image1` $ docker exec --interactive --tty $CONTAINER /bin/sh / # rm /data1 / # dd if=/dev/urandom of=/data2 bs=1234567 count=1024 $ HASH=`docker commit $CONTAINER` && docker tag $HASH image2 $ CONTAINER=`docker run --detach --tty --privileged image2` $ docker exec --interactive --tty $CONTAINER /bin/sh / # rm /data2 / # dd if=/dev/urandom of=/data3 bs=1234567 count=1024 $ HASH=`docker commit $CONTAINER` && docker tag $HASH image3 etc. And all the resulting images are bigger and bigger. Do I misread something? Maybe "docker export | docker import" should help to keep the size "reasonable" even if I am not convinced... Well, thank you for raising the issue, because I have learnt interesting stuff about Docker. :-) And I do not have yet something concrete to say about your initial issue, sorry. All the best, simon ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-30 17:02 ` Stephen Scheck 2020-05-31 4:31 ` Chris Marusich @ 2020-05-31 8:24 ` zimoun 2020-05-31 10:50 ` Vincent Legoll 1 sibling, 1 reply; 37+ messages in thread From: zimoun @ 2020-05-31 8:24 UTC (permalink / raw) To: Stephen Scheck, 41607; +Cc: help-guix Dear Stephen, Follow ups of https://lists.gnu.org/archive/html/help-guix/2020-05/msg00249.html and bug#41607 CC http://issues.guix.gnu.org/41607 On Sat, 30 May 2020 at 19:02, Stephen Scheck <singularsyntax@gmail.com> wrote: > You can convince yourself of this by doing something like the following: > > docker run <some-linux-image> > docker exec <container-id> dd if=/dev/urandom of=/RANDOM-DATA > bs=1048576 count=1024 > docker commit <container-id> > docker exec <container-id> rm /RANDOM-DATA > docker commit <container-id> It does not convince me. Maybe I am doing it wrong, but it is not what I observe in an example with more than 2 'commits'. Here is my session, based on your image, renamed "fresh" because this will happen with any image.
--8<---------------cut here---------------start------------->8--- $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE fresh latest c36cef8306d5 3 weeks ago 1.06GB singularsyntax/guix-bootstrap 1.1.0-alpine-3.11 c36cef8306d5 3 weeks ago 1.06GB $ CONTAINER=`docker run --detach --tty --privileged fresh` $ docker exec --interactive --tty $CONTAINER /bin/sh / # dd if=/dev/urandom of=/DATA bs=1234567 count=1024 dd if=/dev/urandom of=/DATA bs=1234567 count=1024 1024+0 records in 1024+0 records out / # exit exit $ HASH=`docker commit $CONTAINER` && docker tag $HASH add-data $ docker stop $CONTAINER && docker rm $CONTAINER cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7 cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7 $ CONTAINER=`docker run --detach --tty --privileged add-data` $ docker exec --interactive --tty $CONTAINER /bin/sh / # rm /DATA rm /DATA / # dd if=/dev/urandom of=/OTHER bs=1234567 count=1024 dd if=/dev/urandom of=/OTHER bs=1234567 count=1024 1024+0 records in 1024+0 records out / # exit exit $ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-data-add-other $ docker stop $CONTAINER && docker rm $CONTAINER 93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256 93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256 $ CONTAINER=`docker run --detach --tty --privileged rm-data-add-other` $ docker exec --interactive --tty $CONTAINER /bin/sh / # rm /OTHER rm /OTHER / # exit exit $ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-other $ docker stop $CONTAINER && docker rm $CONTAINER 469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4 469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4 $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE rm-other latest b80d300aa755 23 seconds ago 3.59GB rm-data-add-other latest de551eac1d55 About a minute ago 3.59GB add-data latest 6a563daccccd 3 minutes ago 2.32GB fresh latest c36cef8306d5 3 weeks ago 1.06GB singularsyntax/guix-bootstrap 
1.1.0-alpine-3.11 c36cef8306d5 3 weeks ago 1.06GB $ CONTAINER=`docker run --detach --tty --privileged rm-other` $ docker exec --interactive --tty $CONTAINER /bin/sh / # ls / ls / bin dev etc gnu home lib media mnt opt proc root run sbin srv sys tmp usr var / # exit exit --8<---------------cut here---------------end--------------->8--- > You'll end up with two new images - the first one should be about 1 GB > larger than the base image, > the second one the same size. As you see, the image 'rm-other' does not contain either /DATA or /OTHER, and its size is not the same as that of the initial one, 'fresh'. I do not know if the correct Docker terminology is "layer", but the issue is definitely on the Docker side and not on the Guix side. Cheers, simon ^ permalink raw reply [flat|nested] 37+ messages in thread
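The sizes reported in the session above are consistent with each deleted file still being stored in an earlier layer. A quick arithmetic check, using only the numbers from that session and assuming that "docker images" reports decimal units (1 GB = 1000 MB):

```shell
#!/bin/sh
# Values taken from the session above; decimal megabytes are an assumption.
bs=1234567          # dd block size in bytes
count=1024          # dd block count
fresh_mb=1060       # "fresh" image, reported as 1.06GB

file_mb=$(( bs * count / 1000000 ))        # size of each random file
add_data_mb=$(( fresh_mb + file_mb ))      # after writing /DATA
add_other_mb=$(( add_data_mb + file_mb ))  # after rm /DATA and writing /OTHER

echo "each file:         ${file_mb} MB"
echo "add-data:          ${add_data_mb} MB (reported 2.32GB)"
echo "rm-data-add-other: ${add_other_mb} MB (reported 3.59GB)"
```

The computed values land within rounding of the reported ones, which suggests that removing /DATA and /OTHER in later commits did not reclaim their space. Running "docker history" on the final image should list the individual layers and their sizes.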
* Re: Guix Docker image inflation 2020-05-31 8:24 ` zimoun @ 2020-05-31 10:50 ` Vincent Legoll 2020-05-31 17:58 ` zimoun 0 siblings, 1 reply; 37+ messages in thread From: Vincent Legoll @ 2020-05-31 10:50 UTC (permalink / raw) To: zimoun, Stephen Scheck, 41607; +Cc: help-guix Hello, maybe you can try: docker export <CONTAINER ID> | docker import - img_name This should flatten the layers back to a single one. -- Vincent Legoll ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Guix Docker image inflation 2020-05-31 10:50 ` Vincent Legoll @ 2020-05-31 17:58 ` zimoun 0 siblings, 0 replies; 37+ messages in thread From: zimoun @ 2020-05-31 17:58 UTC (permalink / raw) To: Vincent Legoll; +Cc: 41607, help-guix, Stephen Scheck Hi Vincent, On Sun, 31 May 2020 at 12:50, Vincent Legoll <vincent.legoll@gmail.com> wrote: > docker export <CONTAINER ID> | docker import - img_name I do not know if it really works here. Maybe I am doing it incorrectly... --8<---------------cut here---------------start------------->8--- $ docker images --format "{{.Size}}\t{{.Repository}}" 959MB 4reexport 960MB 3clean 960MB 2remove-hello 959MB 1install-hello 578MB 0new-fresh 1.06GB fresh 1.06GB singularsyntax/guix-bootstrap --8<---------------cut here---------------end--------------->8--- Well, and the interesting part is: --8<---------------cut here---------------start------------->8--- $ CONTAINER=`docker run --detach --tty --privileged 4reexport` $ docker exec --interactive --tty $CONTAINER /bin/sh / # /root/.config/guix/current/bin/guix gc --list-live | grep hello /root/.config/guix/current/bin/guix gc --list-live | grep hello finding garbage collector roots... determining live/dead paths... / # /root/.config/guix/current/bin/guix gc --list-dead | grep hello /root/.config/guix/current/bin/guix gc --list-dead | grep hello finding garbage collector roots... determining live/dead paths...
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 / # /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 /root/.config/guix/current/bin/guix gc --references /gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10 guix gc: error: path `/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10' is not valid / # exit --8<---------------cut here---------------end--------------->8--- Just for the record, the commands run: --8<---------------cut here---------------start------------->8--- $ CONTAINER=`docker run --detach --tty --privileged fresh` $ CMD='CMD "/root/.config/guix/current/bin/guix-daemon" "--build-users-group=guixbuild"' $ docker export $CONTAINER \ | docker import --change $CMD - 0new-fresh $ CONTAINER=`docker run --detach --tty --privileged 0new-fresh` $ docker exec --interactive --tty $CONTAINER /bin/sh / # /root/.config/guix/current/bin/guix install hello / # exit $ docker stop $CONTAINER $ HASH=`docker commit $CONTAINER` && docker tag $HASH 1install-hello $ CONTAINER=`docker run --detach --tty --privileged 1install-hello` $ docker exec --interactive --tty $CONTAINER /bin/sh / # /root/.config/guix/current/bin/guix remove hello / # exit $ docker stop $CONTAINER $ HASH=`docker commit $CONTAINER` && docker tag $HASH 2remove-hello $ CONTAINER=`docker run --detach --tty --privileged 2remove-hello` $ docker exec --interactive --tty $CONTAINER /bin/sh / # /root/.config/guix/current/bin/guix pull -d / # /root/.config/guix/current/bin/guix package -d / # /root/.config/guix/current/bin/guix gc / # exit $ docker stop $CONTAINER $ HASH=`docker commit $CONTAINER` && docker tag $HASH 3clean $ CONTAINER=`docker run --detach --tty --privileged 3clean` $ docker export $CONTAINER | docker import --change $CMD - 4reexport --8<---------------cut here---------------end--------------->8--- where I cheated with $CMD, which does not work as is: the full 'CMD...' has to be typed after '--change'.
All the best, simon ^ permalink raw reply [flat|nested] 37+ messages in thread
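A likely reason the unquoted $CMD in the session above does not work as written is ordinary shell word splitting: the instruction string is broken into several arguments, so '--change' only receives its first word. The sketch below shows the difference without calling Docker; `count_args` is a hypothetical helper that just reports how many arguments it was given.

```shell
#!/bin/sh
# The instruction string from the message above.
CMD='CMD "/root/.config/guix/current/bin/guix-daemon" "--build-users-group=guixbuild"'

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

count_args --change $CMD     # unquoted: the string is split on whitespace
count_args --change "$CMD"   # quoted: the whole instruction is one argument
```

With the variable quoted, the import step would presumably read `docker export "$CONTAINER" | docker import --change "$CMD" - 4reexport`, though that variant was not tested in the thread.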