Subject: [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
From: ludo@gnu.org (Ludovic Courtès)
References: <878tbe9jvx.fsf@zancanaro.id.au> <87y3jcu5v5.fsf@gnu.org> <87d10nwhfl.fsf@zancanaro.id.au> <87r2p2izgz.fsf@gnu.org> <87371ihjj2.fsf@zancanaro.id.au>
Date: Fri, 02 Mar 2018 13:42:41 +0100
In-Reply-To: <87371ihjj2.fsf@zancanaro.id.au> (Carlo Zancanaro's message of "Fri, 02 Mar 2018 21:13:53 +1100")
Message-ID: <87po4mhcn2.fsf@gnu.org>
To: Carlo Zancanaro
Cc: 30637@debbugs.gnu.org

Hello!

Carlo Zancanaro skribis:

> On Fri, Mar 02 2018, Ludovic Courtès wrote:

[...]

>> So what about this plan:
>>
>>   1. Add FFI bindings in (shepherd system) for prctl(2).  We should
>>      arrange for it to throw to 'system-error when the ‘prctl’ symbol
>>      is missing, as is the case on GNU/Hurd.
>
> Are we okay with having this just not work on GNU/Hurd (or kernels
> older than 3.4, according to the prctl manpage)? We could fall back to
> a polling approach if prctl isn't available? I don't really like the
> idea of this working on some kernels but not others, given that
> process supervision is one of the main jobs of shepherd.

Yeah, I agree.

The ‘prctl’ procedure itself should simply throw to 'system-error on
GNU/Hurd.  But then, in (shepherd), we could add the polling thing when
(not (string-contains %host-type "linux")).  WDYT?

>>   2. Use prctl/PR_SET_CHILD_SUBREAPER in ‘exec-command’.  Here we
>>      must ‘catch-system-error’ around that call to cater to GNU/Hurd.

Actually this should be done in ‘fork+exec-command’ in the child process.

> Why would we need to set it in exec-command? It looks like it modifies
> the state of the calling process, which means we'd want to set it in
> the shepherd service, not in each of the child processes.

We want to set the “reaper” of child processes to Shepherd itself, so we
must do that in child processes.  The shepherd process cannot be its own
reaper I suppose.

BTW, we should do PR_SET_CHILD_SUBREAPER only when (not (= 1 (getpid))).

> I'll try to get this working in the next few days. Hopefully you'll
> see a patch from me soon.

Awesome, thank you!

Ludo’.
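A minimal Guile sketch of the plan discussed above (a prctl binding that throws to 'system-error, the PID 1 check, and a polling fallback), assuming Guile's (system foreign) FFI and its pointer->procedure #:return-errno? option. The names %prctl, prctl, and setup-orphan-reaping! are hypothetical and not Shepherd's actual API; 36 is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>. Since the prctl(2) man page documents PR_SET_CHILD_SUBREAPER as setting an attribute of the calling process, this sketch has the supervising process mark itself.

```scheme
(use-modules (system foreign))

;; PR_SET_CHILD_SUBREAPER from <linux/prctl.h> (available since Linux 3.4).
(define PR_SET_CHILD_SUBREAPER 36)

;; Raw binding to the C library's prctl, or #f when the symbol cannot be
;; found (e.g., on GNU/Hurd).  #:return-errno? makes the procedure return
;; two values: the C return value and the value of errno.
(define %prctl
  (false-if-exception
   (pointer->procedure int
                       (dynamic-func "prctl" (dynamic-link))
                       (list int unsigned-long unsigned-long
                             unsigned-long unsigned-long)
                       #:return-errno? #t)))

(define (prctl option arg)
  "Call prctl(2) with OPTION and ARG.  Throw to 'system-error on failure,
using ENOSYS when prctl is not available at all (hypothetical helper)."
  (define (fail errno)
    (throw 'system-error "prctl" "~A" (list (strerror errno)) (list errno)))
  (unless %prctl
    (fail ENOSYS))
  (call-with-values (lambda () (%prctl option arg 0 0 0))
    (lambda (result errno)
      (if (= result -1)
          (fail errno)
          result))))

(define (setup-orphan-reaping!)
  "Arrange for orphaned service processes to be reparented to the calling
process.  Return 'init, 'subreaper, or 'poll (meaning the caller must fall
back to polling).  Hypothetical helper."
  (cond ((= 1 (getpid))
         'init)                         ;PID 1 adopts all orphans anyway
        ((catch 'system-error
           (lambda ()
             (prctl PR_SET_CHILD_SUBREAPER 1)
             #t)
           (lambda _ #f))               ;GNU/Hurd (ENOSYS), Linux < 3.4 (EINVAL)
         'subreaper)
        (else
         'poll)))
```

Rather than testing (string-contains %host-type "linux"), this variant falls back to polling whenever the prctl call fails, which would also cover the pre-3.4 Linux kernels mentioned above.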