* bug#55936: dockerd fails to start on boot
@ 2022-06-12 22:56 Luciano Laratelli
  2022-06-24  5:11 ` Maxim Cournoyer
  2022-07-02 10:41 ` bug#55936: [PATCH] services: docker: Fix race condition Oleg Pykhalov
From: Luciano Laratelli @ 2022-06-12 22:56 UTC
To: 55936

Hi, hope you are doing well.

I’m running Guix System and am seeing that `dockerd' fails to start on
boot due to not being able to find `containerd':

$ sudo tail /var/log/docker.log
2022-06-12 18:25:29 time="2022-06-12T18:25:29.969005384-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e eb10082295c7a53d882e36d93a8b5eb20e980a5950c4a67fa03444274448b232], retrying...."
2022-06-12 18:25:30 time="2022-06-12T18:25:30.068910364-04:00" level=info msg="Removing stale sandbox e35667a7ef1441bced213cf035efc9d6c71a0dce7f8941e3fbb63f5a27265bca (91314e5594f72585f9df121ba16cc8d67c4e1fcb91fc3c7b9b0660aed1b3054a)"
2022-06-12 18:25:30 time="2022-06-12T18:25:30.080685302-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e 825f4a6f68b1b81b24b2edc0b382deca116e72a75e6207036f24e18ba6434c81], retrying...."
2022-06-12 18:25:30 time="2022-06-12T18:25:30.143624227-04:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
2022-06-12 18:25:31 time="2022-06-12T18:25:31.400700443-04:00" level=info msg="Loading containers: done."
2022-06-12 18:25:31 time="2022-06-12T18:25:31.689183684-04:00" level=info msg="Docker daemon" commit=v19.03.15 graphdriver(s)=overlay2 version=19.03.15-ce
2022-06-12 18:25:31 time="2022-06-12T18:25:31.691171101-04:00" level=info msg="Daemon has completed initialization"
2022-06-12 18:25:31 time="2022-06-12T18:25:31.961049886-04:00" level=info msg="API listen on /var/run/docker.sock"
2022-06-12 18:43:43 time="2022-06-12T18:43:43.503118343-04:00" level=info msg="Starting up"
2022-06-12 18:43:43 failed to start containerd: exec: "containerd": executable file not found in $PATH

$ sudo docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

$ sudo herd status dockerd
Status of dockerd:
  It is stopped.
  It is enabled.
  Provides (dockerd).
  Requires (containerd dbus-system elogind file-system-/sys/fs/cgroup/blkio file-system-/sys/fs/cgroup/cpu file-system-/sys/fs/cgroup/cpuset file-system-/sys/fs/cgroup/devices file-system-/sys/fs/cgroup/memory file-system-/sys/fs/cgroup/pids networking udev).
  Conflicts with ().
  Will be respawned.

I can start it myself, though:

$ sudo herd start dockerd
Service dockerd has been started.

I found a [past issue] on this list with someone experiencing a similar
problem, but adding `kmod' as suggested did not resolve the issue on my
end.

Here’s my config.scm - I’ve redacted my host name and
file-system/swap-devices blocks, but everything else is verbatim what
the machine is running.

(use-modules (gnu)
             (nongnu packages linux)
             (nongnu system linux-initrd))
(use-service-modules desktop networking ssh xorg docker)

(operating-system
  (kernel linux)
  (firmware (list linux-firmware))
  (initrd microcode-initrd)
  (locale "en_US.utf8")
  (timezone "America/New_York")
  (keyboard-layout (keyboard-layout "us"))
  (users (cons* (user-account
                  (name "luciano")
                  (comment "Luciano Laratelli")
                  (group "users")
                  (home-directory "/home/luciano")
                  (supplementary-groups
                    '("wheel" "netdev" "audio" "video")))
                %base-user-accounts))
  (packages
    (append
      (list (specification->package "st")
            (specification->package "nss-certs")
            (specification->package "docker")
            (specification->package "docker-compose")
            (specification->package "containerd")
            (specification->package "kmod")
            (specification->package "vim")
            (specification->package "emacs-no-x-toolkit")
            (specification->package "parted"))
      %base-packages))
  (services
    (append
      (list (service openssh-service-type
                     (openssh-configuration
                       (password-authentication? #f)))
            (service network-manager-service-type)
            (service wpa-supplicant-service-type)
            (service docker-service-type)
            (elogind-service)
            (set-xorg-configuration
              (xorg-configuration
                (keyboard-layout keyboard-layout))))
      (modify-services %base-services
        (guix-service-type config =>
          (guix-configuration
            (inherit config)
            (substitute-urls
              (append (list "https://substitutes.nonguix.org")
                      %default-substitute-urls))
            (authorized-keys
              (append (list (plain-file "signing-key.pub" "
(public-key
 (ecc
  (curve Ed25519)
  (q #C1FD53E5D4CE971933EC50C9F307AE2171A2D3B52C804642A7A35F84F3A4EA98#)
  )
 )"))
                      %default-authorized-guix-keys)))))))
  (bootloader
    (bootloader-configuration
      (bootloader grub-efi-bootloader)
      (targets '("/boot/efi"))
      (keyboard-layout keyboard-layout))))

I’m not sure how to debug this issue any further and would appreciate
some pointers there.

Thank you,
Luciano

[past issue] <https://issues.guix.gnu.org/issue/34333>

* bug#55936: dockerd fails to start on boot
  2022-06-12 22:56 bug#55936: dockerd fails to start on boot Luciano Laratelli
@ 2022-06-24  5:11 ` Maxim Cournoyer
  2022-07-02 10:41 ` bug#55936: [PATCH] services: docker: Fix race condition Oleg Pykhalov
From: Maxim Cournoyer @ 2022-06-24  5:11 UTC
To: Luciano Laratelli; +Cc: 55936

Hello,

Luciano Laratelli <luciano@laratel.li> writes:

> Hi, hope you are doing well.
>
> I’m running Guix System and am seeing that `dockerd' fails to start on boot due to not being able to find `containerd':
>
> $ sudo tail /var/log/docker.log
> 2022-06-12 18:25:29 time="2022-06-12T18:25:29.969005384-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e eb10082295c7a53d882e36d93a8b5eb20e980a5950c4a67fa03444274448b232], retrying...."
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.068910364-04:00" level=info msg="Removing stale sandbox e35667a7ef1441bced213cf035efc9d6c71a0dce7f8941e3fbb63f5a27265bca (91314e5594f72585f9df121ba16cc8d67c4e1fcb91fc3c7b9b0660aed1b3054a)"
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.080685302-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e 825f4a6f68b1b81b24b2edc0b382deca116e72a75e6207036f24e18ba6434c81], retrying...."
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.143624227-04:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.400700443-04:00" level=info msg="Loading containers: done."
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.689183684-04:00" level=info msg="Docker daemon" commit=v19.03.15 graphdriver(s)=overlay2 version=19.03.15-ce
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.691171101-04:00" level=info msg="Daemon has completed initialization"
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.961049886-04:00" level=info msg="API listen on /var/run/docker.sock"
> 2022-06-12 18:43:43 time="2022-06-12T18:43:43.503118343-04:00" level=info msg="Starting up"
> 2022-06-12 18:43:43 failed to start containerd: exec: "containerd": executable file not found in $PATH

It seems there's a race condition between containerd and docker (the
latter starts before the former is done launching and it fails to see
it, aborting, if I understand).

We should see if we can migrate the dockerd-service-type to use the
newly introduced systemd-style constructor.

Thanks,

Maxim

* bug#55936: [PATCH] services: docker: Fix race condition.
  2022-06-12 22:56 bug#55936: dockerd fails to start on boot Luciano Laratelli
  2022-06-24  5:11 ` Maxim Cournoyer
@ 2022-07-02 10:41 ` Oleg Pykhalov
  2022-07-10  5:10   ` Maxim Cournoyer
From: Oleg Pykhalov @ 2022-07-02 10:41 UTC
To: 55936; +Cc: Oleg Pykhalov, Maxim Cournoyer

Fixes <https://issues.guix.gnu.org/38432>.

* gnu/packages/patches/containerd-create-pid-file.patch: New file.
* gnu/local.mk (dist_patch_DATA): Add this.
* gnu/packages/docker.scm (containerd)[source]: Add this patch.
* gnu/services/docker.scm
(containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
* gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
---
 gnu/local.mk                                  |  3 +-
 gnu/packages/docker.scm                       |  6 ++--
 .../patches/containerd-create-pid-file.patch  | 31 +++++++++++++++++++
 gnu/services/docker.scm                       |  5 ++-
 4 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100644 gnu/packages/patches/containerd-create-pid-file.patch

diff --git a/gnu/local.mk b/gnu/local.mk
index 3a56ad371d..5cd235286c 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -17,7 +17,7 @@
 # Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
 # Copyright © 2017, 2018, 2019 Gábor Boskovits <boskovits@gmail.com>
 # Copyright © 2018 Amirouche Boubekki <amirouche@hypermove.net>
-# Copyright © 2018, 2019, 2020, 2021 Oleg Pykhalov <go.wigust@gmail.com>
+# Copyright © 2018, 2019, 2020, 2021, 2022 Oleg Pykhalov <go.wigust@gmail.com>
 # Copyright © 2018 Stefan Stefanović <stefanx2ovic@gmail.com>
 # Copyright © 2018, 2020, 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 # Copyright © 2019, 2020, 2021, 2022 Guillaume Le Vaillant <glv@posteo.net>
@@ -965,6 +965,7 @@ dist_patch_DATA = \
   %D%/packages/patches/cmh-support-fplll.patch \
   %D%/packages/patches/coda-use-system-libs.patch \
   %D%/packages/patches/collectd-5.11.0-noinstallvar.patch \
+  %D%/packages/patches/containerd-create-pid-file.patch \
   %D%/packages/patches/combinatorial-blas-awpm.patch \
   %D%/packages/patches/combinatorial-blas-io-fix.patch \
   %D%/packages/patches/cool-retro-term-wctype.patch \
diff --git a/gnu/packages/docker.scm b/gnu/packages/docker.scm
index ae4ee419af..184280b38f 100644
--- a/gnu/packages/docker.scm
+++ b/gnu/packages/docker.scm
@@ -6,7 +6,7 @@
 ;;; Copyright © 2020 Michael Rohleder <mike@rohleder.de>
 ;;; Copyright © 2020 Katherine Cox-Buday <cox.katherine.e@gmail.com>
 ;;; Copyright © 2020 Jesse Dowell <jessedowell@gmail.com>
-;;; Copyright © 2021 Oleg Pykhalov <go.wigust@gmail.com>
+;;; Copyright © 2021, 2022 Oleg Pykhalov <go.wigust@gmail.com>
 ;;; Copyright © 2022 Pierre Langlois <pierre.langlois@gmx.com>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -184,7 +184,9 @@ (define-public containerd
                     (commit (string-append "v" version))))
               (file-name (git-file-name name version))
               (sha256
-               (base32 "1vsl747i3wyy68j4lp4nprwxadbyga8qxlrk892afcd2990zp5mr"))))
+               (base32 "1vsl747i3wyy68j4lp4nprwxadbyga8qxlrk892afcd2990zp5mr"))
+              (patches
+               (search-patches "containerd-create-pid-file.patch"))))
     (build-system go-build-system)
     (arguments
      (let ((make-flags #~(list (string-append "VERSION=" #$version)
diff --git a/gnu/packages/patches/containerd-create-pid-file.patch b/gnu/packages/patches/containerd-create-pid-file.patch
new file mode 100644
index 0000000000..668ffcd9e9
--- /dev/null
+++ b/gnu/packages/patches/containerd-create-pid-file.patch
@@ -0,0 +1,31 @@
+Copyright © 2022 Oleg Pykhalov <go.wigust@gmail.com>
+
+Create a PID file after containerd is ready to serve requests.
+
+Fixes <https://issues.guix.gnu.org/38432>.
+
+--- a/cmd/containerd/command/notify_linux.go	1970-01-01 03:00:01.000000000 +0300
++++ b/cmd/containerd/command/notify_linux.go	2022-07-02 04:42:35.553753495 +0300
+@@ -22,15 +22,22 @@
+ 	sd "github.com/coreos/go-systemd/v22/daemon"
+ 
+ 	"github.com/containerd/containerd/log"
++
++	"os"
++	"strconv"
+ )
+ 
+ // notifyReady notifies systemd that the daemon is ready to serve requests
+ func notifyReady(ctx context.Context) error {
++	pidFile, _ := os.Create("/run/containerd/containerd.pid")
++	defer pidFile.Close()
++	pidFile.WriteString(strconv.FormatInt(int64(os.Getpid()), 10))
+ 	return sdNotify(ctx, sd.SdNotifyReady)
+ }
+ 
+ // notifyStopping notifies systemd that the daemon is about to be stopped
+ func notifyStopping(ctx context.Context) error {
++	os.Remove("/run/containerd/containerd.pid")
+ 	return sdNotify(ctx, sd.SdNotifyStopping)
+ }
+ 
diff --git a/gnu/services/docker.scm b/gnu/services/docker.scm
index 846ebe8334..741bab5a8c 100644
--- a/gnu/services/docker.scm
+++ b/gnu/services/docker.scm
@@ -98,6 +98,8 @@ (define (containerd-shepherd-service config)
                    ;; For finding containerd-shim binary.
                    #:environment-variables
                    (list (string-append "PATH=" #$containerd "/bin"))
+                   #:pid-file "/run/containerd/containerd.pid"
+                   #:pid-file-timeout 300
                    #:log-file "/var/log/containerd.log"))
          (stop #~(make-kill-destructor)))))
 
@@ -135,7 +137,8 @@ (define (docker-shepherd-service config)
                                '("--userland-proxy=false"))
                            (if #$enable-iptables?
                                "--iptables"
-                               "--iptables=false"))
+                               "--iptables=false")
+                           "--containerd" "/run/containerd/containerd.sock")
                      #:environment-variables
                      (list #$@environment-variables)
                      #:pid-file "/var/run/docker.pid"
-- 
2.36.0

* bug#55936: [PATCH] services: docker: Fix race condition.
  2022-07-02 10:41 ` bug#55936: [PATCH] services: docker: Fix race condition Oleg Pykhalov
@ 2022-07-10  5:10   ` Maxim Cournoyer
  2022-07-13 21:06     ` bug#55936: dockerd fails to start on boot Maxim Cournoyer
From: Maxim Cournoyer @ 2022-07-10  5:10 UTC
To: Oleg Pykhalov; +Cc: 55936

Hi Oleg,

Oleg Pykhalov <go.wigust@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/38432>.
>
> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
> * gnu/local.mk (dist_patch_DATA): Add this.
> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
> * gnu/services/docker.scm
> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.

Thanks for this, it looks promising!

Before we go forward though, had you considered using
'make-systemd-constructor', now available in Shepherd 0.9+?  I
remember Docker supports systemd socket activation for synchronizing its
services; it could be a simpler, no-code solution.

Would you like to give it a try?

Thanks,

Maxim

* bug#55936: dockerd fails to start on boot
  2022-07-10  5:10   ` Maxim Cournoyer
@ 2022-07-13 21:06     ` Maxim Cournoyer
  2022-07-14  1:40       ` Maxim Cournoyer
  2022-07-15 10:20       ` Ludovic Courtès
From: Maxim Cournoyer @ 2022-07-13 21:06 UTC
To: Oleg Pykhalov; +Cc: 55936

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Oleg,
>
> Oleg Pykhalov <go.wigust@gmail.com> writes:
>
>> Fixes <https://issues.guix.gnu.org/38432>.
>>
>> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
>> * gnu/local.mk (dist_patch_DATA): Add this.
>> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
>> * gnu/services/docker.scm
>> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
>> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
>
> Thanks for this, it looks promising!
>
> Before we go forward though, had you considered using
> 'make-systemd-constructor', now available in Shepherd 0.9+?  I
> remember Docker supports systemd socket activation for synchronizing its
> services; it could be a simpler, no-code solution.

I've researched the topic further, and it appears that what I had in
mind is systemd's socket *notification* (what they call 'sdNotify')
rather than activation.  Activation is just for lazily starting
things... it probably wouldn't help here; rather, it seems it'd be a bad
idea, as realized elsewhere [0].

[0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515

All that to say that I shall be reviewing your patches shortly :-).

Thank you,

Maxim

* bug#55936: dockerd fails to start on boot
  2022-07-13 21:06     ` bug#55936: dockerd fails to start on boot Maxim Cournoyer
@ 2022-07-14  1:40       ` Maxim Cournoyer
  2022-07-15 10:20       ` Ludovic Courtès
From: Maxim Cournoyer @ 2022-07-14  1:40 UTC
To: Oleg Pykhalov; +Cc: 55936-done

Hi Oleg,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Oleg,
>>
>> Oleg Pykhalov <go.wigust@gmail.com> writes:
>>
>>> Fixes <https://issues.guix.gnu.org/38432>.
>>>
>>> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
>>> * gnu/local.mk (dist_patch_DATA): Add this.
>>> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
>>> * gnu/services/docker.scm
>>> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
>>> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
>>
>> Thanks for this, it looks promising!

[...]

> All that to say that I shall be reviewing your patches shortly :-).

Now done; it all looks good to me!  I've run the docker system test, and
installed it on my machine, rebooted, confirmed it was up, restarted
containerd a couple times and checked the PID content matched its actual
PID, and it seems to behave as expected!

Pushed as b33e1a183f6756514e6b6a3b84054a232dbddad4.

Thank you!

Maxim

* bug#55936: dockerd fails to start on boot
  2022-07-13 21:06     ` bug#55936: dockerd fails to start on boot Maxim Cournoyer
  2022-07-14  1:40       ` Maxim Cournoyer
@ 2022-07-15 10:20       ` Ludovic Courtès
  2022-07-16  1:55         ` Maxim Cournoyer
From: Ludovic Courtès @ 2022-07-15 10:20 UTC
To: Maxim Cournoyer; +Cc: Oleg Pykhalov, 55936

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> I've researched the topic further, and it appears that what I had in
> mind is systemd's socket *notification* (what they call 'sdNotify')
> rather than activation.  Activation is just for lazily starting
> things... it probably wouldn't help here; rather, it seems it'd be a bad
> idea, as realized elsewhere [0].
>
> [0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515

Currently the Shepherd implements activation as lazy start, but we
should add an option for “eager socket activation” where the daemon is
started right away.

Such activation is still useful as a synchronization mechanism: you can
tell the service is ready to serve requests as soon as the socket has
been created.

Thanks,
Ludo’.

* bug#55936: dockerd fails to start on boot
  2022-07-15 10:20       ` Ludovic Courtès
@ 2022-07-16  1:55         ` Maxim Cournoyer
From: Maxim Cournoyer @ 2022-07-16  1:55 UTC
To: Ludovic Courtès; +Cc: Oleg Pykhalov, 55936

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> I've researched the topic further, and it appears that what I had in
>> mind is systemd's socket *notification* (what they call 'sdNotify')
>> rather than activation.  Activation is just for lazily starting
>> things... it probably wouldn't help here; rather, it seems it'd be a bad
>> idea, as realized elsewhere [0].
>>
>> [0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515
>
> Currently the Shepherd implements activation as lazy start, but we
> should add an option for “eager socket activation” where the daemon is
> started right away.
>
> Such activation is still useful as a synchronization mechanism: you can
> tell the service is ready to serve requests as soon as the socket has
> been created.

But this relies on the application behaving that way (e.g., waiting for
the socket to be opened, rather than expecting things to be ready and
failing), right?

If I understand correctly, the sdNotify mechanism in systemd is a means
that lets the application notify systemd when it is ready, so that
systemd itself can enforce the ordering relationships.  So on systemd,
containerd would be marked as 'starting' until it reports readiness via
sdNotify, and docker.service would wait until containerd has started,
since it is ordered to start after it [0].

[0] https://github.com/moby/moby/blob/master/contrib/init/systemd/docker.service#L4

Thanks,

Maxim