From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: 57922@debbugs.gnu.org
Subject: bug#57922: Shepherd doesn't seem to correctly handle waitpid itself
Date: Mon, 19 Sep 2022 00:29:44 -0400
Message-ID: <874jx4q953.fsf@gmail.com>
Hi,
I've tried to determine why a workaround in the jami-service-type is
required in the 'stop' slot to avoid failures in 'herd restart jami',
and haven't quite found the culprit, but it appears to me that:
1. waitpid is only called in one place in Shepherd: the handle-SIGCHLD
procedure in (shepherd service). It does not wait for a specific PID
but rather calls (waitpid* WAIT_ANY WNOHANG), which is waitpid with some
special handling for the case where a system-error exception is thrown
with an ECHILD or EINTR error number.
This doesn't strike me as a strong guarantee that waitpid occurs when
stop is called, because:
1. The handler has to be installed as the SIGCHLD signal handler for
each process, with something like:
--8<---------------cut here---------------start------------->8---
(unless %sigchld-handler-installed?
  (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
  (set! %sigchld-handler-installed? #t))
--8<---------------cut here---------------end--------------->8---
This is done for fork+exec-command and make-inetd-forkexec-constructor,
but not for make-forkexec-constructor/container, AFAICT;
2. It has the WNOHANG flag, which means that stop simply does a kill,
and the signal handler then only weakly (because of WNOHANG) waits on
the process, so start may begin before the process has actually
terminated completely.
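To illustrate the difference between the two waitpid modes, here is a
minimal standalone Guile sketch. It is not Shepherd code: spawn-sleeper
is a made-up helper standing in for a daemon, and the script only relies
on Guile's core primitive-fork, kill and waitpid. A WNOHANG call on a
live child returns immediately with PID 0 and reaps nothing, whereas the
blocking call returns only once the child has terminated and been
reaped:

```scheme
;; Hypothetical helper: fork a child that just sleeps, standing in for
;; a daemon.  Returns the child's PID in the parent.
(define (spawn-sleeper)
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (begin (sleep 60) (primitive-exit 0))
        pid)))

(define pid (spawn-sleeper))

;; Non-blocking poll while the child is still alive: waitpid returns a
;; pair whose car is 0, meaning that no child was reaped.
(define polled (car (waitpid pid WNOHANG)))

(kill pid SIGKILL)

;; Blocking wait on this exact PID: it returns (PID . STATUS) only once
;; the child has terminated and been reaped.
(define reaped (waitpid pid))

(format #t "polled PID: ~a~%" polled)
(format #t "reaped our child: ~a~%" (= (car reaped) pid))
(format #t "killed by SIGKILL: ~a~%"
        (eqv? (status:term-sig (cdr reaped)) SIGKILL))
```

Only after the blocking call returns is it safe to assume that the old
process is gone, which is what the explicit (waitpid pid) in the
jami 'stop' workaround relies on.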
Here's a small reproducer to apply on our code base:
--8<---------------cut here---------------start------------->8---
modified gnu/services/telephony.scm
@@ -685,13 +685,7 @@ (define (archive-name->username archive)
;; Finally, return the PID of the daemon process.
daemon-pid))
- (stop
- #~(lambda (pid . args)
- (kill pid SIGKILL)
- ;; Wait for the process to exit; this prevents overlapping
- ;; processes when issuing 'herd restart'.
- (waitpid pid)
- #f))))))))
+ (stop #~(make-kill-destructor))))))))
(define jami-service-type
(service-type
--8<---------------cut here---------------end--------------->8---
Then run 'make check-system TESTS=jami-provisioning' to see the new
failures or, if you want to investigate the system manually:
--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix system vm --no-grafts --no-offload --no-graphic \
-e '(@@ (gnu tests telephony) %jami-os-provisioning)'
$ /gnu/store/rxi7c14hga62qslb0sr6nac9qnkxr0nn-run-vm.sh -m 1G -smp 4 \
-nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22
# Connect to the QEMU VM:
$ ssh root@localhost -p10022
root@jami ~# herd restart jami
Service jami has been stopped.
herd: exception caught while executing 'start' on service 'jami':
dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")
root@jami ~# herd status jami
Status of jami:
It is stopped.
It is enabled.
Provides (jami).
Requires (jami-dbus-session).
Conflicts with ().
Will be respawned.
root@jami ~# pgrep jami
--8<---------------cut here---------------end--------------->8---
Thanks,
Maxim