From: Carlo Zancanaro
To: Ludovic Courtès
Cc: 30637@debbugs.gnu.org
Subject: [bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services
Date: Fri, 02 Mar 2018 09:37:50 +1100
Message-ID: <87d10nwhfl.fsf@zancanaro.id.au>
In-reply-to: <87y3jcu5v5.fsf@gnu.org>
References: <878tbe9jvx.fsf@zancanaro.id.au> <87y3jcu5v5.fsf@gnu.org>

Hey Ludo,

On Wed, Feb 28 2018, Ludovic Courtès wrote:
>> The problem is that shepherd, when run as a user process, can "lose"
>> services which fork away. Shepherd can still kill them, but a SIGCHLD
>> won't be delivered if they die, so shepherd can't restart/disable
>> them. My prime example is emacs, which I run with --daemon. If I then
>> kill emacs, shepherd will still think that it is running.
>
> There are two issues here, I think.
>
>   1. shepherd cannot lose SIGCHLD: if a process dies immediately once
>      it’s been spawned, as is the case with “emacs --daemon” or any
>      other daemon-style program, it should receive SIGCHLD and process
>      it.

Yeah, that's true, but the problem is that shepherd only processes
the SIGCHLD if there is a service with its `running` slot set to
the pid. When emacs forks, the original process may have its
SIGCHLD handled, but that doesn't affect shepherd's service state
(as it shouldn't, because it's using #:pid-file to track the
forked process).

>   2. shepherd currently can’t do much with real daemons. So what we do
>      in GuixSD is to either start programs in non-daemon mode, when
>      that’s an option, or pass #:pid-file to retrieve the forked
>      process PID. I think you should do one of these as well.

I am doing that. The problem is that when a service dies (crashes,
quits, etc.) the `respawn?` option cannot be honoured because
shepherd is not notified that the process has terminated (because
it never receives a SIGCHLD for the forked pid). My patch polls
for the processes we expect, to make up for the lack of
notification. I would much rather it receive an event/signal to
notify that the forked process has died, but I don't know how to
do that in a robust, portable way, so I chose to poll instead.

If you look at my test case in tests/respawn-service.sh (which can
be read in its entirety in the diff attached to my previous email)
you can see the problem that this patch solves. The test will fail
without the rest of my patch, but will pass with it (guix build
container issue notwithstanding).
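
To illustrate the polling idea outside of shepherd: one common way to
check whether a process that is not our child still exists is to send
it signal 0, which performs the existence and permission checks
without delivering anything. The following is only a minimal Python
sketch of that technique, not the code in the patch; the names
`pid_alive` and `find_dead` are made up for illustration, and a real
supervisor would call something like `find_dead` on a timer (every
0.5s in the case of this patch).

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        # Signal 0 delivers nothing; the kernel only checks that the
        # target exists and that we would be allowed to signal it.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True   # EPERM: it exists but belongs to someone else
    return True


def find_dead(tracked):
    """Given a {service name: pid} mapping (a hypothetical stand-in for
    the supervisor's service table), return the names whose recorded
    PID no longer exists."""
    return [name for name, pid in tracked.items() if not pid_alive(pid)]
```

Two caveats apply to this approach in general: a zombie that has not
been reaped yet still counts as existing, and a PID can be reused by
an unrelated process, so a check like this can report a dead service
as alive.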

Carlo