On 20-07-2022 23:39, Ludovic Courtès wrote:
> Hi!
>
> We’ve just had a bad experience with the nginx service on berlin, where
> ‘herd restart nginx’ would cause shepherd to get stuck forever in
> ‘waitpid’ on the process that was supposed to start nginx.
>
> The details are unclear, but one thing is clear is that using ‘waitpid’
> (either directly or indirectly with ‘system*’, which is what
> ‘nginx-service-type’ does) is not great:
>
>   1. In the best case, shepherd (as of 0.9.1) is stuck while ‘system*’
>      is in ‘waitpid’ waiting for child process completion (“stuck” as
>      in: doesn’t do anything, not even answering ‘herd’ requests or
>      inetd connections.)
>
>   2. I don’t think that can happen with ‘system*’ (because it’s in C),
>      but generally speaking, there’s a possibility that shepherd’s event
>      loop will handle child process termination before some other
>      user-made ‘waitpid’ call does.
>
> Anyway, that’s a bad situation.
>
> So I can think of several ways to address it:
>
>   1. Change the nginx service ‘stop’ method to just
>      (make-kill-destructor), which should work just as well as invoking
>      “nginx -s stop”.
>
>   2. Have Shepherd provide a replacement for ‘system*’.

Why Shepherd and not Guile Fibers?  Is this a Shepherd-specific problem?

> Thoughts?

3. Make waitpid (or a variant that does what we need) interact well with
   Guile Fibers, like how 'accept' doesn't inhibit switching to another
   fiber.  There is some Linux API involving signal handlers or pid fds
   or such that might be useful here, though I don't recall the name.
   Presumably something similar can be done for the Hurd, though some C
   glue may be needed to access the right Hurd APIs if the signal
   handler API isn't portable.

Alternatively:

4. Do the waitpid in a separate thread (this needs a work-around for the
   multi-threaded fork problem, probably C things?  Or modifying Guile
   and maybe glibc to avoid async-unsafe things or make more things
   async-safe or whatever the appropriate ...-safe is here.)
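
To make option 3 a bit more concrete, here is a rough sketch of the idea,
written in Python rather than Guile purely to show the OS mechanism.  It
assumes the Linux API in question is something like pidfd_open(2), which
may or may not be the one meant above.  The names run_without_blocking
and on_tick are made up for illustration; they are not Shepherd or Fibers
procedures.  Where pid fds are unavailable, the sketch falls back to
polling with a non-blocking wait.

```python
import os
import selectors
import subprocess
import sys
import time


def run_without_blocking(argv, on_tick, tick=0.05):
    """Run ARGV and return its exit status, calling ON_TICK while waiting.

    ON_TICK stands in for "the event loop keeps serving other work";
    it is a hypothetical callback, not part of any real API.
    """
    child = subprocess.Popen(argv)
    try:
        # Linux >= 5.3 and Python >= 3.9: a file descriptor that becomes
        # readable once the child has terminated.
        pidfd = os.pidfd_open(child.pid)
    except (AttributeError, OSError):
        pidfd = None

    if pidfd is None:
        # Fallback (assumption: no pid fd support): non-blocking polling.
        while child.poll() is None:
            time.sleep(tick)
            on_tick()
        return child.returncode

    sel = selectors.DefaultSelector()
    try:
        sel.register(pidfd, selectors.EVENT_READ)
        # select() returns an empty list on timeout, so the loop keeps
        # running other work until the pid fd reports termination.
        while not sel.select(timeout=tick):
            on_tick()
        # The child has already exited, so this wait returns at once.
        return child.wait()
    finally:
        sel.close()
        os.close(pidfd)


if __name__ == "__main__":
    ticks = []
    status = run_without_blocking(
        [sys.executable, "-c", "import time; time.sleep(0.3)"],
        lambda: ticks.append(time.monotonic()))
    print("exit status:", status, "ticks while waiting:", len(ticks))
```

In a real event loop the pid fd would simply be registered next to the
sockets the loop already watches, so child termination becomes one more
readable-fd event instead of a blocking waitpid call.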

If this is not a Guile Fibers interaction problem, then the asynchronous
signal handler API might still be useful.

Greetings,
Maxime