From: "Ludovic Courtès" <ludo@gnu.org>
To: Mathieu Othacehe <othacehe@gnu.org>
Cc: 57827-done@debbugs.gnu.org
Subject: bug#57827: Shepherd 0.9.2 possible regressions
Date: Tue, 20 Sep 2022 19:30:53 +0200
Message-ID: <87r106lzqq.fsf@gnu.org>
In-Reply-To: <871qs76f3y.fsf@gnu.org> (Mathieu Othacehe's message of "Mon, 19 Sep 2022 08:41:05 +0200")
Hi,
Mathieu Othacehe <othacehe@gnu.org> wrote:
> Regarding those four, I was able to reproduce the issue this way:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon (list (number->string (getpid))))
Or from the shell:
herd stop guix-daemon
herd start guix-daemon $$
I was able to reproduce it using a bare-bones.tmpl VM.
> The latter command hangs and Shepherd becomes unresponsive. I collected
> an (attached) strace dump of Shepherd showing that there is no response
> on the socket when the service is started.
>
> Note that this works:
>
> $ guix repl
> (stop-service 'guix-daemon)
> (start-service 'guix-daemon)
>
> So the problem could be caused by the "container-excursion*" in the
> "fork+exec-command/container" procedure.
PID 1 gets stuck on read(16, …) forever, after reading the string “2866”
(a PID):
--8<---------------cut here---------------start------------->8---
[pid 2865] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fccfbe00a10) = 2866
strace: Process 2866 attached
[pid 2866] set_robust_list(0x7fccfbe00a20, 24) = 0
[pid 2866] close(3) = 0
[pid 2865] write(39, "2866", 4 <unfinished ...>
[pid 2866] close(4 <unfinished ...>
[pid 2865] <... write resumed>) = 4
[pid 2866] <... close resumed>) = 0
[pid 2866] pipe2( <unfinished ...>
[pid 2865] close(39 <unfinished ...>
[pid 2866] <... pipe2 resumed>[3, 4], O_CLOEXEC) = 0
[pid 2865] <... close resumed>) = 0
[pid 2865] exit_group(0) = ?
[pid 2866] rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER|SA_NOCLDSTOP, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid 2866] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid 2866] rt_sigaction(SIGHUP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid 2866] rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, {sa_handler=0x7fccfc427d50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fccfc304d80}, 8) = 0
[pid 2866] rt_sigprocmask(SIG_UNBLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0
[pid 2866] mkdir("/var", 0777) = -1 EEXIST (File exists)
[pid 2866] mkdir("/var/run", 0777) = -1 EEXIST (File exists)
[pid 2865] +++ exited with 0 +++
[pid 1] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 2865
[pid 1] close(39) = 0
[pid 2866] setsid( <unfinished ...>
[pid 1] read(16, <unfinished ...>
[pid 2866] <... setsid resumed>) = 2866
[pid 1] <... read resumed>"2866", 4096) = 4
[pid 2866] chdir("/") = 0
[pid 1] read(16, <unfinished ...>
[pid 2866] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0
[pid 2866] close(0) = 0
[pid 2866] openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0
[pid 2866] dup2(0, 0) = 0
[pid 2866] close(1) = 0
[pid 2866] close(2) = 0
[pid 2866] openat(AT_FDCWD, "/var/log/guix-daemon.log", O_WRONLY|O_CREAT|O_APPEND, 0640) = 1
[pid 2866] dup2(1, 1) = 1
[pid 2866] dup2(1, 2) = 2
[pid 2866] execve("/gnu/store/bxnkqnpbf4q4z6245b61wgpm8gkr9nj1-guix-1.3.0-29.9e46320/bin/guix-daemon", ["/gnu/store/bxnkqnpbf4q4z6245b61w"..., "--build-users-group", "guixbuild", "--max-silent-time", "0", "--timeout", "0", "--log-compression", "gzip", "--discover=yes", "--substitute-urls", "https://substitutes.nonguix.org "...], 0x7fccf71fa480 /* 3 vars */) = 0
--8<---------------cut here---------------end--------------->8---
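While PID 1 is blocked like this, one way to see which processes still hold the write end of the pipe is to scan /proc. On Linux, every descriptor that refers to a given pipe links to the same "pipe:[INODE]" target under /proc/PID/fd. The "flags:" line of /proc/PID/fdinfo/FD gives the access mode, which tells a write end from a read end. Here is a minimal Python sketch. It assumes Linux and enough privileges to inspect the processes (root, or their owner). The helper name pipe_writers is hypothetical and is not part of Shepherd or Guix.

```python
import os


def _access_mode(pid: str, fd: str) -> int:
    """Return the access mode (O_RDONLY, O_WRONLY or O_RDWR) of /proc/PID/fd/FD."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("flags:"):
                # Flags are printed in octal; the low two bits are the access mode.
                return int(line.split()[1], 8) & 0o3
    return -1


def pipe_writers(pipe_id: str) -> list:
    """Return the PIDs holding the write end of the pipe named PIPE_ID.

    PIPE_ID is a symlink target such as "pipe:[123456]", for instance
    the result of os.readlink("/proc/1/fd/16").
    """
    writers = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            fds = os.listdir(f"/proc/{pid}/fd")
        except OSError:                  # process gone, or permission denied
            continue
        for fd in fds:
            try:
                if (os.readlink(f"/proc/{pid}/fd/{fd}") == pipe_id
                        and _access_mode(pid, fd) == os.O_WRONLY):
                    writers.append(int(pid))
                    break
            except OSError:              # descriptor closed in the meantime
                continue
    return writers
```

In the situation above, pipe_writers(os.readlink("/proc/1/fd/16")) should list whichever processes still keep the write end of PID 1's pipe open.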
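This happens because the write end of the pipe (file descriptor 39 in
process 2865, the one it writes "2866" to) is inherited by process 2866.
That process never closes it: it just execs guix-daemon, and the
descriptor survives the exec.  As long as any process holds that write
end, PID 1's read(16, …) never returns end-of-file.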
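The failure mode can be reproduced outside Shepherd. The Python sketch below replays the process structure from the trace. A reader waits for end-of-file on a PID pipe. An intermediate process reports its child's PID and exits, and the grandchild execs a long-running program. It assumes Linux with `sleep` on PATH, which stands in for guix-daemon. The name pid_pipe_reaches_eof is hypothetical. This is not Shepherd's code. Creating the pipe with O_CLOEXEC is just one way to keep the write end from surviving the exec, not necessarily what the actual fix does.

```python
import os
import select
import signal


def pid_pipe_reaches_eof(cloexec: bool, timeout: float = 1.0) -> bool:
    """Replay the start-up sequence from the trace; True if the reader sees EOF.

    Roles: this process plays PID 1 (the reader), the intermediate child
    plays 2865, and the grandchild plays 2866, which execs "sleep" as a
    stand-in for guix-daemon.
    """
    r, w = os.pipe2(os.O_CLOEXEC if cloexec else 0)

    child = os.fork()
    if child == 0:                          # intermediate process (2865)
        status = 1
        try:
            os.close(r)
            grandchild = os.fork()          # inherits the write end w
            if grandchild == 0:             # "daemon" process (2866)
                try:
                    os.execvp("sleep", ["sleep", "5"])
                finally:
                    os._exit(127)           # reached only if exec fails
            os.write(w, str(grandchild).encode())
            status = 0
        finally:
            os._exit(status)                # exiting closes this copy of w

    os.close(w)                             # the reader keeps only the read end
    os.waitpid(child, 0)
    daemon_pid = int(os.read(r, 64))        # the "2866" string in the trace

    # Like PID 1, keep reading until EOF, but give up after `timeout` seconds.
    ready, _, _ = select.select([r], [], [], timeout)
    eof = bool(ready) and os.read(r, 64) == b""

    try:
        os.kill(daemon_pid, signal.SIGTERM)  # clean up the stand-in daemon
    except ProcessLookupError:
        pass
    os.close(r)
    return eof
```

With cloexec=False, the sleep process keeps the inherited write end open, so the reader gets no EOF until that process exits, just as PID 1 waits on guix-daemon. With cloexec=True, the exec closes the descriptor and the reader gets EOF right away.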
This is fixed by commit 6abdcef4a68e98f538ab69fde096adc5f5ca4ff4; its
commit log contains more details.
Thanks!
Ludo’.