Hey Ludo,

On Fri, Mar 02 2018, Ludovic Courtès wrote:

>> I am doing that. The problem is that when a service dies
>> (crashes, quits, etc.) the `respawn?` option cannot be honoured
>> because shepherd is not notified that the process has
>> terminated (because it never receives a SIGCHLD for the forked
>> pid). My patch polls for the processes we expect, to make up
>> for the lack of notification.
>
> I see.
>
> Actually, thinking more about it, we should be using
> PR_SET_CHILD_SUBREAPER from prctl(2), which is designed exactly
> for that.

Excellent! This is exactly what I've been looking for, but I didn't
know enough to find it myself. Thanks!

> So what about this plan:
>
> 1. Add FFI bindings in (shepherd system) for prctl(2). We
>    should arrange for it to throw to 'system-error when the
>    ‘prctl’ symbol is missing, as is the case on GNU/Hurd.

Are we okay with this simply not working on GNU/Hurd (or on Linux
kernels older than 3.4, according to the prctl man page)? Could we
fall back to polling when prctl isn't available? I don't like the
idea of respawning working on some kernels but not others, given that
process supervision is one of shepherd's main jobs.

> 2. Use prctl/PR_SET_CHILD_SUBREAPER in ‘exec-command’. Here we
>    must ‘catch-system-error’ around that call to cater to
>    GNU/Hurd.

Why would we need to set it in exec-command? It looks like prctl
modifies the state of the calling process, which means we'd want to
set it once in the shepherd process itself, not in each of the child
processes.

> That would address the main issue without having to resort to
> polling. Respawning will work only when #:pid-file is used
> though, but that’s already an improvement.
>
> Thoughts?

I'll try to get this working in the next few days. Hopefully you'll
see a patch from me soon.

Carlo
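For reference, here is a minimal C sketch of the mechanism, assuming Linux >= 3.4 and glibc's <sys/prctl.h>. The helper name reap_orphaned_grandchild is made up for illustration and is not shepherd code; in shepherd the call would presumably go through the FFI binding from step 1. The process marks itself as a subreaper, then double-forks the way a daemon that writes a PID file does, and can still waitpid() the orphaned grandchild. On failure it returns -1, which is where a supervisor would need some fallback such as polling. GNU/Hurd has no prctl at all, so this is Linux-only.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the exit status of a double-forked
   grandchild reaped through the subreaper mechanism, or -1 on failure
   (e.g. prctl rejected with EINVAL on kernels older than 3.4). */
int reap_orphaned_grandchild(void)
{
    /* If SIGCHLD were SIG_IGN, children would be auto-reaped and
       waitpid would fail with ECHILD; restore the default. */
    signal(SIGCHLD, SIG_DFL);

    /* Mark the *calling* process as a child subreaper. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) == -1) {
        fprintf(stderr, "prctl: %s\n", strerror(errno));
        return -1;
    }

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Intermediate process: fork the "daemon", then exit at once. */
        pid_t daemon_pid = fork();
        if (daemon_pid == 0) {
            /* Daemon: report its PID (like writing a PID file), then exit. */
            pid_t self = getpid();
            close(fds[0]);
            _exit(write(fds[1], &self, sizeof self) == (ssize_t) sizeof self
                  ? 42 : 1);
        }
        _exit(daemon_pid == -1 ? 1 : 0);
    }

    close(fds[1]);

    /* Reap the intermediate process, our direct child. When it exited,
       the daemon was reparented to us because we are a subreaper. */
    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;

    pid_t daemon_pid;
    ssize_t n = read(fds[0], &daemon_pid, sizeof daemon_pid);
    close(fds[0]);
    if (n != (ssize_t) sizeof daemon_pid)
        return -1;

    /* Without the prctl call this would fail with ECHILD. */
    if (waitpid(daemon_pid, &status, 0) != daemon_pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the prctl call, the orphan is reparented to PID 1 (or another subreaper ancestor), so the second waitpid() fails with ECHILD and the parent never learns that the daemon died, which is the missing notification described above.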