* bug#58485: [shepherd] Restarting guix-publish fails
@ 2022-10-13 7:51 Lars-Dominik Braun
  2022-10-13 9:28 ` Liliana Marie Prikler
  2022-11-17 8:24 ` Ludovic Courtès
  0 siblings, 2 replies; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-10-13 7:51 UTC (permalink / raw)
  To: 58485; +Cc: Ludovic Courtès

Hi,

it seems that `herd restart guix-publish` stopped working after the
introduction of socket activation into shepherd. This is a problem,
because I restart guix-publish automatically after unattended-upgrades.
It fails with the following error for me:

---snip---
Backtrace:
           7 (primitive-load "/gnu/store/7xrg2sbb529ki6hv99n27svg0fi?")
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f8173184940 ?> ?)
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 7f817318ac80>)))
In ice-9/boot-9.scm:
   260:13  3 (for-each #<procedure restart-service (name)> _)
In gnu/services/herd.scm:
    168:4  2 (invoke-action guix-publish restart () #<procedure 7f81?>)
    176:7  1 (failure)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &action-exception-error:
      service: guix-publish
      action: start
      key: system-error
      args: ("bind" "~A" ("Address already in use") (98))
---snap---

Note that due to the socket activation you must visit the URL at least
once to start up the guix-publish process. Otherwise a restart will work
fine. It also works fine the second time I invoke `herd restart
guix-publish`, because `guix-publish` is dead by that time.

Looking at an strace shepherd is indeed trying to kill `guix-publish`
and re-bind to the same address:

---snip---
1 read(23, "(shepherd-command (version 0) (action restart) (service guix-publish) (arguments ()) (directory \"/root\"))", 1024) = 105
1 getpgid(18096) = 18096
1 getpgid(0) = 0
1 kill(-18096, SIGTERM) = 0
1 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2298, ...}, 0) = 0
1 write(17, "shepherd[1]: Service guix-publish has been stopped.\n", 52) = 52
1 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 36
1 setsockopt(36, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
1 bind(36, {sa_family=AF_INET, sin_port=htons(8082), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRINUSE (Address already in use)
1 write(23, "(reply (version 0) (result #f) (error (error (version 0) action-exception start guix-publish system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service guix-publish has been stopped.\")))", 208) = 208
1 close(23)
---snap---

The obvious explanation would be that stopping does not wait for the
process to actually exit. make-kill-destructor does not waitpid it seems
and 'running is set unconditionally to #f after 'stop has finished.

Cheers,
Lars

^ permalink raw reply [flat|nested] 15+ messages in thread
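The explanation offered at the end of this report (the stop action returns before the killed process has really exited, so the old listening socket can still be open when the address is bound again) can be illustrated outside the Shepherd. What follows is a minimal Python sketch, not Shepherd code: the child script, `stop_and_wait`, `rebind` and `restart_demo` are hypothetical names invented for the illustration, and `Popen.wait` stands in for the waitpid step the report says is missing.

```python
import signal
import socket
import subprocess
import sys

# Hypothetical stand-in for a daemon such as guix-publish: it listens on a
# free loopback port, reports the port number, then sleeps until killed.
CHILD = r"""
import socket, time
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
s.listen(1)
print(s.getsockname()[1], flush=True)
time.sleep(60)
"""


def stop_and_wait(proc, timeout=5.0):
    """Send SIGTERM, then block until the child has actually exited."""
    proc.send_signal(signal.SIGTERM)
    proc.wait(timeout=timeout)  # only now is the child's socket surely closed


def rebind(port):
    """Bind and listen on the port the child was using."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock


def restart_demo():
    """Start the child, stop it, wait for it, then take over its port."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    try:
        port = int(proc.stdout.readline())  # child reports its port
        stop_and_wait(proc)
        sock = rebind(port)
        ok = sock.getsockname()[1] == port
        sock.close()
        return ok
    finally:
        proc.stdout.close()
        if proc.poll() is None:  # clean up if anything above failed
            proc.kill()
            proc.wait()
```

Without the wait in `stop_and_wait`, the call to `rebind` would race against the child's exit, which matches the intermittent failure described in the report.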
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-13 7:51 bug#58485: [shepherd] Restarting guix-publish fails Lars-Dominik Braun
@ 2022-10-13 9:28 ` Liliana Marie Prikler
  2022-10-13 11:35   ` Lars-Dominik Braun
  2022-11-17 8:24 ` Ludovic Courtès
  1 sibling, 1 reply; 15+ messages in thread
From: Liliana Marie Prikler @ 2022-10-13 9:28 UTC (permalink / raw)
  To: Lars-Dominik Braun, 58485; +Cc: Ludovic Courtès

On Thursday, 13.10.2022 at 09:51 +0200, Lars-Dominik Braun wrote:
> The obvious explanation would be that stopping does not wait for the
> process to actually exit. make-kill-destructor does not waitpid it
> seems and 'running is set unconditionally to #f after 'stop has
> finished.
Shouldn't [1] address this very issue?

[1] http://git.savannah.gnu.org/cgit/guix.git/commit/?id=2a37f174becbafd70591f6eb1d98493c5c1df0e2

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-13 9:28 ` Liliana Marie Prikler
@ 2022-10-13 11:35 ` Lars-Dominik Braun
  2022-10-13 13:38   ` Liliana Marie Prikler
  0 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-10-13 11:35 UTC (permalink / raw)
  To: Liliana Marie Prikler; +Cc: Ludovic Courtès, 58485

Hi Liliana,

> Shouldn't [1] address this very issue?
> [1]
> http://git.savannah.gnu.org/cgit/guix.git/commit/?id=2a37f174becbafd70591f6eb1d98493c5c1df0e2
no, if the process is running make-systemd-destructor is just an alias
for make-kill-destructor. So it does not matter which one we use in this
case.

Lars

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-13 11:35 ` Lars-Dominik Braun
@ 2022-10-13 13:38 ` Liliana Marie Prikler
  2022-10-14 6:18   ` Lars-Dominik Braun
  0 siblings, 1 reply; 15+ messages in thread
From: Liliana Marie Prikler @ 2022-10-13 13:38 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: Ludovic Courtès, 58485

On Thursday, 13.10.2022 at 13:35 +0200, Lars-Dominik Braun wrote:
> Hi Liliana,
>
> > Shouldn't [1] address this very issue?
> > [1]
> > http://git.savannah.gnu.org/cgit/guix.git/commit/?id=2a37f174becbafd70591f6eb1d98493c5c1df0e2
> no, if the process is running make-systemd-destructor is just an
> alias for make-kill-destructor. So it does not matter which one we
> use in this case.
Ahh, so the issue is that shepherd waits neither for the process to be
actually killed nor for the socket to become available, isn't it?

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-13 13:38 ` Liliana Marie Prikler
@ 2022-10-14 6:18 ` Lars-Dominik Braun
  2022-10-14 6:57   ` Liliana Marie Prikler
  0 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-10-14 6:18 UTC (permalink / raw)
  To: Liliana Marie Prikler; +Cc: Ludovic Courtès, 58485

Hi,

> Ahh, so the issue is that shepherd waits neither for the process to be
> actually killed nor for the socket to become available, isn't it?
I would argue it’s the former, but having either of them would solve the
problem, I think.

Lars

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-14 6:18 ` Lars-Dominik Braun
@ 2022-10-14 6:57 ` Liliana Marie Prikler
  0 siblings, 0 replies; 15+ messages in thread
From: Liliana Marie Prikler @ 2022-10-14 6:57 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: Ludovic Courtès, 58485

On Friday, 14.10.2022 at 08:18 +0200, Lars-Dominik Braun wrote:
> Hi,
>
> > Ahh, so the issue is that shepherd waits neither for the process to
> > be
> > actually killed nor for the socket to become available, isn't it?
> I would argue it’s the former, but having either of them would solve
> the problem, I think.
I think you need both: if the process is killed, but the socket remains,
you need to clean it up. As far as I'm aware, that does not happen
automatically.

Cheers

^ permalink raw reply [flat|nested] 15+ messages in thread
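Whether a socket "remains" after its owner is gone depends on the socket family. For the TCP sockets in this report the kernel releases a listening socket once the last file descriptor referring to it is closed. For Unix-domain sockets, however, the file created by bind stays on disk and has to be removed explicitly. The following Python sketch shows only that second case; it is an illustration with hypothetical names (`unix_rebind_demo`, `demo.sock`), not something taken from the Shepherd.

```python
import errno
import os
import socket
import tempfile


def unix_rebind_demo():
    """Return [errno of the rebind attempt, whether rebinding worked after unlink]."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path

        first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        first.bind(path)
        first.close()  # closing the socket does not remove the file

        second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            second.bind(path)
            results.append(None)
        except OSError as err:
            results.append(err.errno)  # the stale file blocks the bind
        finally:
            second.close()

        os.unlink(path)  # explicit clean-up of the stale socket file

        third = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        third.bind(path)  # works once the file is gone
        results.append(os.path.exists(path))
        third.close()
    return results
```

On Linux the first entry of the result is `errno.EADDRINUSE`, the same error code (98) that appears in the backtrace above, although the cause here is the leftover file rather than a live listener.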
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-10-13 7:51 bug#58485: [shepherd] Restarting guix-publish fails Lars-Dominik Braun
  2022-10-13 9:28 ` Liliana Marie Prikler
@ 2022-11-17 8:24 ` Ludovic Courtès
  2022-11-17 10:19   ` Ludovic Courtès
  ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Ludovic Courtès @ 2022-11-17 8:24 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58485

Hi,

Lars-Dominik Braun <lars@6xq.net> wrote:

> 1 kill(-18096, SIGTERM) = 0
> 1 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2298, ...}, 0) = 0
> 1 write(17, "shepherd[1]: Service guix-publish has been stopped.\n", 52) = 52
> 1 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 36
> 1 setsockopt(36, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 1 bind(36, {sa_family=AF_INET, sin_port=htons(8082), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRINUSE (Address already in use)
> 1 write(23, "(reply (version 0) (result #f) (error (error (version 0) action-exception start guix-publish system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service guix-publish has been stopped.\")))", 208) = 208
> 1 close(23)
> ---snap---
>
> The obvious explanation would be that stopping does not wait for the
> process to actually exit. make-kill-destructor does not waitpid it seems
> and 'running is set unconditionally to #f after 'stop has finished.

Indeed. This is fixed by Shepherd commit
d97592f58603ff51cb280ae57d413c8731e601b3, which will be in the upcoming
0.9.3 release.

Thanks,
Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-11-17 8:24 ` Ludovic Courtès
@ 2022-11-17 10:19 ` Ludovic Courtès
  2023-02-07 8:39 ` Lars-Dominik Braun
  [not found] ` <Y+IM4IrO4V05o3V9@zpidnb93>
  2 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2022-11-17 10:19 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58485-done

Ludovic Courtès <ludo@gnu.org> wrote:

> Indeed. This is fixed by Shepherd commit
> d97592f58603ff51cb280ae57d413c8731e601b3, which will be in the upcoming
> 0.9.3 release.

The Shepherd 0.9.3 has landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2022-11-17 8:24 ` Ludovic Courtès
  2022-11-17 10:19 ` Ludovic Courtès
@ 2023-02-07 8:39 ` Lars-Dominik Braun
  [not found] ` <Y+IM4IrO4V05o3V9@zpidnb93>
  2 siblings, 0 replies; 15+ messages in thread
From: Lars-Dominik Braun @ 2023-02-07 8:39 UTC (permalink / raw)
  To: 58485

[-- Attachment #1: Type: text/plain, Size: 1979 bytes --]

Hi Ludo,

> Indeed. This is fixed by Shepherd commit
> d97592f58603ff51cb280ae57d413c8731e601b3, which will be in the upcoming
> 0.9.3 release.
I’m on 0.9.3 and it works fine with `herd restart` now. But ssh-daemon
has the same issue when being restarted by unattended-upgrades (which
is fatal, because unable to use SSH I have to restart the entire box):

---snip---
shepherd: Service nginx has been stopped.
shepherd: Service nginx has been started.
shepherd: Service collectd has been stopped.
shepherd: Service collectd has been started.
shepherd: Service ntpd has been stopped.
shepherd: Service ntpd has been started.
shepherd: Service guix-publish has been stopped.
shepherd: Service guix-publish has been started.
shepherd: Service ssh-daemon has been stopped.
Backtrace:
           7 (primitive-load "/gnu/store/ip5m1n8kb6p0rfglzpkk17k060a?")
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f89a11f3840 ?> ?)
  1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In ice-9/eval.scm:
    619:8  4 (_ #(#(#<directory (guile-user) 7f89a11ffc80>)))
In ice-9/boot-9.scm:
   260:13  3 (for-each #<procedure restart-service (name)> _)
In gnu/services/herd.scm:
    168:4  2 (invoke-action ssh-daemon restart () #<procedure 7f89a0?>)
    176:7  1 (failure)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
ERROR:
  1. &action-exception-error:
      service: ssh-daemon
      action: start
      key: system-error
      args: ("bind" "~A" ("Address already in use") (98)
---snap---

Maybe I can strace herd and see what happens exactly.

Thanks,
Lars

--
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <Y+IM4IrO4V05o3V9@zpidnb93>]
* bug#58485: [shepherd] Restarting guix-publish fails
  [not found] ` <Y+IM4IrO4V05o3V9@zpidnb93>
@ 2023-02-20 10:20 ` Ludovic Courtès
  2023-02-20 13:25   ` Lars-Dominik Braun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2023-02-20 10:20 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58485, Lars-Dominik Braun

Hi Lars,

Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

>> Indeed. This is fixed by Shepherd commit
>> d97592f58603ff51cb280ae57d413c8731e601b3, which will be in the upcoming
>> 0.9.3 release.
> I’m on 0.9.3 and it works fine with `herd restart` now. But ssh-daemon
> has the same issue when being restarted by unattended-upgrades (which
> is fatal, because unable to use SSH I have to restart the entire box):
>
> ---snip---
> shepherd: Service nginx has been stopped.
> shepherd: Service nginx has been started.
> shepherd: Service collectd has been stopped.
> shepherd: Service collectd has been started.
> shepherd: Service ntpd has been stopped.
> shepherd: Service ntpd has been started.
> shepherd: Service guix-publish has been stopped.
> shepherd: Service guix-publish has been started.
> shepherd: Service ssh-daemon has been stopped.
> Backtrace:
>            7 (primitive-load "/gnu/store/ip5m1n8kb6p0rfglzpkk17k060a?")
> In ice-9/boot-9.scm:
>     724:2  6 (call-with-prompt ("prompt") #<procedure 7f89a11f3840 ?> ?)
>   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
> In ice-9/eval.scm:
>     619:8  4 (_ #(#(#<directory (guile-user) 7f89a11ffc80>)))
> In ice-9/boot-9.scm:
>    260:13  3 (for-each #<procedure restart-service (name)> _)
> In gnu/services/herd.scm:
>     168:4  2 (invoke-action ssh-daemon restart () #<procedure 7f89a0?>)
>     176:7  1 (failure)
> In ice-9/boot-9.scm:
>   1685:16  0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> ERROR:
>   1. &action-exception-error:
>       service: ssh-daemon
>       action: start
>       key: system-error
>       args: ("bind" "~A" ("Address already in use") (98)
> ---snap---
>
> Maybe I can strace herd and see what happens exactly.

Can you confirm shepherd (PID 1) is 0.9.3?

‘sudo herd restart ssh-daemon’ works fine on my laptop FWIW.

Note that the situation is different from that of ‘guix publish’: here
it’s inetd style, as opposed to systemd style for ‘guix publish’.

Thanks,
Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2023-02-20 10:20 ` Ludovic Courtès
@ 2023-02-20 13:25 ` Lars-Dominik Braun
  2023-04-27 21:23   ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2023-02-20 13:25 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 58485, Lars-Dominik Braun

[-- Attachment #1: Type: text/plain, Size: 3732 bytes --]

Hi Ludo,

> Can you confirm shepherd (PID 1) is 0.9.3?
it is:

root 1 0.2 0.2 308148 76816 ? Sl Feb07 52:08 /gnu/store/kphp5d85rrb3q1rdc2lfqc1mdklwh3qp-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/4nw0zb4swga0cb8i35nvng3rg6z5qm8p-shepherd-0.9.3/bin/shepherd --config /gnu/store/cvrai6z8777jf7860rnvppfznl1lcxi1-shepherd.conf

> ‘sudo herd restart ssh-daemon’ works fine on my laptop FWIW.
This works fine too. Only unattended-upgrades seems to have this issue :/

The strace looks unsuspicious right now:

---snip---
1 14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
1 14:12:15.117254 close(27) = 0
1 14:12:15.117283 close(30) = 0
1 14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
1 14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
1 14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
1 14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
1 14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
1 14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
1 14:12:15.117754 close(21) = 0
---snap---

But nginx seems to have the same issue, except that it does not fail
entirely and succeeds after waiting a short period of time:

---snip---
2023/02/20 14:12:14 [notice] 7136#0: signal 15 (SIGTERM) received from 6644, exiting
2023/02/20 14:12:14 [notice] 7137#0: exiting
2023/02/20 14:12:14 [notice] 7137#0: exit
2023/02/20 14:12:14 [notice] 7136#0: signal 17 (SIGCHLD) received from 7137
2023/02/20 14:12:14 [notice] 7136#0: worker process 7137 exited with code 0
2023/02/20 14:12:14 [emerg] 6645#0: bind() to 0.0.0.0:443 failed (98: Address already in use)
2023/02/20 14:12:14 [emerg] 6645#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
2023/02/20 14:12:14 [emerg] 6645#0: bind() to [::]:80 failed (98: Address already in use)
2023/02/20 14:12:14 [notice] 7136#0: exit
2023/02/20 14:12:14 [notice] 6645#0: try again to bind() after 500ms
2023/02/20 14:12:14 [notice] 6645#0: using the "epoll" event method
2023/02/20 14:12:14 [notice] 6645#0: nginx/1.23.3
2023/02/20 14:12:14 [notice] 6645#0: OS: Linux 6.1.9
2023/02/20 14:12:14 [notice] 6645#0: getrlimit(RLIMIT_NOFILE): 1024:4096
2023/02/20 14:12:14 [notice] 6648#0: start worker processes
2023/02/20 14:12:14 [notice] 6648#0: start worker process 6649
2023/02/20 14:12:32 [info] 6649#0: epoll_wait() failed (4: Interrupted system call)
---snap---

I see we’re already using SO_REUSEADDR, so all of this is a bit of a
mystery to me.

Thanks,
Lars

--
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply [flat|nested] 15+ messages in thread
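On the SO_REUSEADDR puzzle raised in the message above: on Linux that option lets a TCP socket bind to an address with lingering connections (for example in TIME_WAIT), but it does not allow binding while another socket is still listening on the same address and port. Whether that is what happened on the reporter's machine is not established in this thread; the Python sketch below only demonstrates the kernel behaviour, and `bind_while_listening` is a hypothetical name.

```python
import errno
import socket


def bind_while_listening():
    """Try to bind a port that another socket is still listening on.

    Returns the errno of the failed bind, or None if the bind succeeded.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Same option as in the strace above; it does not help in this case.
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        second.bind(("127.0.0.1", port))
        return None
    except OSError as err:
        return err.errno
    finally:
        second.close()
        listener.close()
```

On Linux the function returns `errno.EADDRINUSE`. Other operating systems treat SO_REUSEADDR differently, so the result is not portable.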
* bug#58485: [shepherd] Restarting guix-publish fails
  2023-02-20 13:25 ` Lars-Dominik Braun
@ 2023-04-27 21:23 ` Ludovic Courtès
  2023-04-28 12:31   ` Lars-Dominik Braun
  2023-04-28 13:09   ` bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon Ludovic Courtès
  0 siblings, 2 replies; 15+ messages in thread
From: Ludovic Courtès @ 2023-04-27 21:23 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58485, Lars-Dominik Braun

Hi,

Sorry for the late reply. I’m going through Shepherd bug reports and I
remembered this discussion…

Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

>> Can you confirm shepherd (PID 1) is 0.9.3?
> it is:
>
> root 1 0.2 0.2 308148 76816 ? Sl Feb07 52:08 /gnu/store/kphp5d85rrb3q1rdc2lfqc1mdklwh3qp-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/4nw0zb4swga0cb8i35nvng3rg6z5qm8p-shepherd-0.9.3/bin/shepherd --config /gnu/store/cvrai6z8777jf7860rnvppfznl1lcxi1-shepherd.conf
>
>> ‘sudo herd restart ssh-daemon’ works fine on my laptop FWIW.
> This works fine too. Only unattended-upgrades seems to have this issue :/
>
> The strace looks unsuspicious right now:
>
> ---snip---
> 1 14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
> 1 14:12:15.117254 close(27) = 0
> 1 14:12:15.117283 close(30) = 0
> 1 14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
> 1 14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
> 1 14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
> 1 14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 1 14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> 1 14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
> 1 14:12:15.117754 close(21) = 0

This suggests ‘bind’ can return EADDRINUSE even though the sockets have
been closed before (presumably file descriptors 27 and 30 above).

Can you confirm nothing else is competing to bind port 2222 on that
machine?

I tried to reproduce it with something as brutal as:

  while sudo herd restart sshd ; do : ; done

… to no avail (I’m on current Shepherd ‘master’ though).

Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
does, as you wrote), though I’d like to understand under what conditions
we can get EADDRINUSE in the first place.

Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] Restarting guix-publish fails
  2023-04-27 21:23 ` Ludovic Courtès
@ 2023-04-28 12:31 ` Lars-Dominik Braun
  2023-04-28 13:09 ` bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Lars-Dominik Braun @ 2023-04-28 12:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 58485, Lars-Dominik Braun

[-- Attachment #1: Type: text/plain, Size: 669 bytes --]

Hi,

> Can you confirm nothing else is competing to bind port 2222 on that
> machine?
not sure how to confirm that with certainty (it’s hard to get an lsof in
the exact right moment), but according to the OS config only SSHd is
supposed to use port 2222, see [1].

[1] https://github.com/leibniz-psychology/psychnotebook-deploy/blob/master/src/zpid/machines/patna/os.scm

Cheers,
Lars

--
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply [flat|nested] 15+ messages in thread
* bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon
  2023-04-27 21:23 ` Ludovic Courtès
  2023-04-28 12:31 ` Lars-Dominik Braun
@ 2023-04-28 13:09 ` Ludovic Courtès
  2023-06-11 14:20   ` Ludovic Courtès
  1 sibling, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2023-04-28 13:09 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58485, Lars-Dominik Braun

Hi,

Ludovic Courtès <ludo@gnu.org> wrote:

>> 1 14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
>> 1 14:12:15.117254 close(27) = 0
>> 1 14:12:15.117283 close(30) = 0
>> 1 14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
>> 1 14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
>> 1 14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
>> 1 14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>> 1 14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
>> 1 14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
>> 1 14:12:15.117754 close(21) = 0

[...]

> Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
> does, as you wrote), though I’d like to understand under what conditions
> we can get EADDRINUSE in the first place.

Done:

  https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=41789ee8d0e164967f9ca196db4e9601400a462e

Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread
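The retry-upon-EADDRINUSE idea discussed in this message can be sketched as follows. This is a hedged Python illustration of the general technique, not the code of the Shepherd commit linked above: `bind_with_retry` and `retry_demo` are hypothetical names, and the default delay of 0.5 seconds merely mirrors the "try again to bind() after 500ms" line in the nginx log quoted earlier in the thread.

```python
import errno
import socket
import threading
import time


def bind_with_retry(address, attempts=5, delay=0.5):
    """Bind and listen on ADDRESS, retrying while it is still in use.

    Any error other than EADDRINUSE, or EADDRINUSE on the last attempt,
    is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(address)
            sock.listen(16)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRINUSE or attempt == attempts:
                raise
            time.sleep(delay)  # give the previous owner time to go away


def retry_demo():
    """Hold a port briefly, then check that the retry loop acquires it."""
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    blocker.bind(("127.0.0.1", 0))
    blocker.listen(1)
    address = blocker.getsockname()

    # Simulate the old service instance releasing its socket a bit later.
    threading.Timer(0.2, blocker.close).start()

    sock = bind_with_retry(address, attempts=20, delay=0.1)
    ok = sock.getsockname() == address
    sock.close()
    return ok
```

A bounded number of attempts keeps a genuine conflict (another program permanently holding the port) visible as an error instead of hiding it behind an endless loop.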
* bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon
  2023-04-28 13:09 ` bug#58485: [shepherd] EADDRINUSE while restarting ssh-daemon Ludovic Courtès
@ 2023-06-11 14:20 ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2023-06-11 14:20 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: Lars-Dominik Braun, 58485-done

Hi Lars,

Ludovic Courtès <ludo@gnu.org> wrote:

> Ludovic Courtès <ludo@gnu.org> wrote:
>
>>> 1 14:12:15.117035 read(21, "(shepherd-command (version 0) (action restart) (service ssh-daemon) (arguments ()) (directory \"/root\"))", 1024) = 103
>>> 1 14:12:15.117254 close(27) = 0
>>> 1 14:12:15.117283 close(30) = 0
>>> 1 14:12:15.117416 newfstatat(AT_FDCWD, "/etc/localtime", {st_dev=makedev(0x8, 0x2), st_ino=110100491, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2298, st_atime=1676898665 /* 2023-02-20T14:11:05.338746772+0100 */, st_atime_nsec=338746772, st_mtime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_mtime_nsec=874743456, st_ctime=1676898664 /* 2023-02-20T14:11:04.874743456+0100 */, st_ctime_nsec=874743456}, 0) = 0
>>> 1 14:12:15.117475 write(17, "shepherd[1]: Service ssh-daemon has been stopped.\n", 50) = 50
>>> 1 14:12:15.117524 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 26
>>> 1 14:12:15.117561 setsockopt(26, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>> 1 14:12:15.117598 bind(26, {sa_family=AF_INET, sin_port=htons(2222), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
>>> 1 14:12:15.117724 write(21, "(reply (version 0) (result #f) (error (error (version 0) action-exception start ssh-daemon system-error (\"bind\" \"~A\" (\"Address already in use\") (98)))) (messages (\"Service ssh-daemon has been stopped.\")))", 204) = 204
>>> 1 14:12:15.117754 close(21) = 0
>
> [...]
>
>> Maybe we should just have shepherd retry upon EADDRINUSE (like nginx
>> does, as you wrote), though I’d like to understand under what conditions
>> we can get EADDRINUSE in the first place.
>
> Done:
>
>   https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=41789ee8d0e164967f9ca196db4e9601400a462e

I’m assuming that this is fixed in Shepherd 0.10.x. Please reopen if you
stumble upon this issue again.

Ludo’.

^ permalink raw reply [flat|nested] 15+ messages in thread