Subject: [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
From: ludo@gnu.org (Ludovic Courtès)
To: Carlo Zancanaro
Cc: 30637@debbugs.gnu.org
Date: Sun, 04 Mar 2018 23:49:06 +0100
Message-ID: <87o9k33199.fsf@gnu.org>
In-Reply-To: <871sgzpiy9.fsf@zancanaro.id.au> (Carlo Zancanaro's message of
 "Mon, 05 Mar 2018 09:35:58 +1100")
References: <878tbe9jvx.fsf@zancanaro.id.au> <87y3jcu5v5.fsf@gnu.org>
 <87d10nwhfl.fsf@zancanaro.id.au> <87r2p2izgz.fsf@gnu.org>
 <87371ihjj2.fsf@zancanaro.id.au> <87po4mhcn2.fsf@gnu.org>
 <87inadr3np.fsf@zancanaro.id.au> <87h8px9od8.fsf@gnu.org>
 <87woys9961.fsf@zancanaro.id.au> <87371f4hkf.fsf@gnu.org>
 <871sgzpiy9.fsf@zancanaro.id.au>

Carlo Zancanaro wrote:

> On Sun, Mar 04 2018, Ludovic Courtès wrote:
>> Good catch.
>> We could add this in gnu-build-system.scm in
>> core-updates, though it’s no big deal anyway since these are
>> throw-away environments.
>>
>> Thoughts?
>
> The current forking-service.sh test fails in that environment, so we
> won't be able to build shepherd on Hurd, or systems with Linux pre
> 3.4. This is already the case without my third commit, though, because
> the prctl fallback logic isn't in place yet.
>
> I think we should add it in core-updates. It does affect the behaviour
> of processes within the build environment, and can lead to test
> failures if people rely on pid 1 to reap zombie processes (which, from
> what I understand, they should be able to). This could even be leading
> to test failures in other packages which we have just disabled.

Yeah, makes sense.

>>> +        (match (select (list sock) (list) (list) 0.5)
>>> +          (((sock) _ _)
>>> +           (read-from sock))
>>> +          (_
>>> +           #f))
>>> +        (poll-services)
>>
>> Here everyone ends up paying some overhead (the 0.5 second timeout),
>> which isn’t great.
>>
>> How about something like:
>>
>>   (define poll-services
>>     (and (not (= 1 (getpid)))
>>          …))
>>
>>   (match (select (list sock) '() '() (if poll-services 0.5 0))
>>     …)
>
> The wait for 0.5 seconds is only an upper-bound for the
> timeout. Changing it to a 0 would actually be worse, because it would
> spend longer polling for running services. The `select` procedure
> waits for `sock` to be ready to read from. When it's ready it returns
> immediately, but if `sock` takes more than 0.5 seconds to be ready
> then it will return anyway (and take the second branch in the match,
> which does nothing).

Sorry, I didn’t mean 0 but rather #f (indefinite wait).

My point is: we shouldn’t wake up every 0.5 seconds for no reason.  IOW,
we should wake up periodically only in the non-pid-1-no-prctl case.

Does that make sense?

Ludo’.
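
To make the suggestion concrete, here is a minimal sketch of the idea, not the actual shepherd code. The names `need-polling?`, `select-timeout` and `wait-for-input`, and the `subreaper?` argument (standing for "the prctl child-subreaper setup succeeded"), are made up for illustration. The only Guile facts relied on are that `select` accepts a real number of seconds as its timeout and waits indefinitely when the timeout is #f.

```scheme
(use-modules (ice-9 match))

;; Hypothetical predicate: periodic polling is needed only when we are
;; not PID 1 and could not become a child subreaper via prctl.
(define (need-polling? pid subreaper?)
  (and (not (= 1 pid))
       (not subreaper?)))

;; Timeout to pass to 'select': wake up every 0.5 s when polling,
;; otherwise #f, meaning "block until the socket is readable".
(define (select-timeout polling?)
  (if polling? 0.5 #f))

;; Wait for PORT to become readable.  Return #t if it did, or #f if
;; TIMEOUT seconds elapsed first (never #f when TIMEOUT is #f).
(define (wait-for-input port timeout)
  (match (select (list port) '() '() timeout)
    (((_) _ _) #t)
    (_ #f)))
```

In the main loop this would read roughly as `(wait-for-input sock (select-timeout (need-polling? (getpid) subreaper?)))`, with the call to `poll-services` made only when polling is needed. As PID 1, or with the subreaper in place, the loop then sleeps until a client actually connects.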