unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#57922: Shepherd doesn't seem to correctly handle waitpid itself
@ 2022-09-19  4:29 Maxim Cournoyer
  2022-09-20  7:31 ` Josselin Poiret via Bug reports for GNU Guix
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Cournoyer @ 2022-09-19  4:29 UTC (permalink / raw)
  To: 57922

Hi,

I've tried to determine why a workaround in the jami-service-type is
required in the 'stop' slot to avoid failures in 'herd restart jami',
and haven't quite found the culprit, but it appears to me that:

1. waipid is only called in one place in Shepherd, which is in the
handle-SIGCHLD procedure in (shepherd service), which does not
specifically wait for an exact PID but rather does:

(waitpid* WAIT_ANY WNOHANG), which is waitpid with some special handling
in the case a system-error exception is thrown with an ECHILD or EINTR
error number.

This doesn't strike me as a strong guarantee that waitpid occurs when
stop is called, because:

1. It requires to be installed in the signal handlers for each
processes, with something like:

--8<---------------cut here---------------start------------->8---
  (unless %sigchld-handler-installed?
    (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
    (set! %sigchld-handler-installed? #t))
--8<---------------cut here---------------end--------------->8---

Done for fork+exec-command and make-inetd-forkexec-constructor, but not
for make-forkexec-constructor/container, AFAICT;

2. it has the WNOHANG flag, which means the stop simply does a kill the
the signal handling weakly (because of WNOHANG) waits on it, which means
the start may begin before the process was actually completely
terminated.

Here's a small reproducer to apply on our code base:

--8<---------------cut here---------------start------------->8---
modified   gnu/services/telephony.scm
@@ -685,13 +685,7 @@ (define (archive-name->username archive)
 
                     ;; Finally, return the PID of the daemon process.
                     daemon-pid))
-               (stop
-                #~(lambda (pid . args)
-                    (kill pid SIGKILL)
-                    ;; Wait for the process to exit; this prevents overlapping
-                    ;; processes when issuing 'herd restart'.
-                    (waitpid pid)
-                    #f))))))))
+               (stop #~(make-kill-destructor))))))))
 
 (define jami-service-type
   (service-type
--8<---------------cut here---------------end--------------->8---

Then run 'make check-system TESTS=jami-provisioning' to see new
failures, or if you want to investigate manually the system:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix system vm --no-grafts --no-offload --no-graphic \
   -e '(@@ (gnu tests telephony) %jami-os-provisioning)'

$ /gnu/store/rxi7c14hga62qslb0sr6nac9qnkxr0nn-run-vm.sh -m 1G -smp 4 \
  -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22

# Connect to the QEMU VM:
$ ssh root@localhost -p10022

root@jami ~# herd restart jami
Service jami has been stopped.
herd: exception caught while executing 'start' on service 'jami':
dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")
root@jami ~# herd status jami
Status of jami:
  It is stopped.
  It is enabled.
  Provides (jami).
  Requires (jami-dbus-session).
  Conflicts with ().
  Will be respawned.
root@jami ~# pgrep jami
--8<---------------cut here---------------end--------------->8---

Thanks,

Maxim




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-26  0:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-19  4:29 bug#57922: Shepherd doesn't seem to correctly handle waitpid itself Maxim Cournoyer
2022-09-20  7:31 ` Josselin Poiret via Bug reports for GNU Guix
2022-09-23  6:33   ` Ludovic Courtès
2022-09-23 17:49     ` Maxim Cournoyer
2022-09-24  3:32     ` Maxim Cournoyer
2022-09-24  8:09       ` Josselin Poiret via Bug reports for GNU Guix
2022-09-24 16:30         ` Ludovic Courtès
2022-09-26  0:12           ` Maxim Cournoyer

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).