From: Jelle Licht
Subject: Re: Improving Shepherd
Date: Sat, 10 Feb 2018 14:34:21 +0100
To: guix-devel
In-Reply-To: <871shzeg8m.fsf@gnu.org>
References: <871si8bc5g.fsf@zancanaro.id.au> <877errn23f.fsf@zancanaro.id.au> <871shzeg8m.fsf@gnu.org>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hey all,

2018-02-05 14:08 GMT+01:00 Ludovic Courtès <ludo@gnu.org>:
> Hello!
>
> [...]
>
> Currently shepherd monitors SIGCHLD, and it’s not supposed to miss
> those; in some cases it might handle them later than you’d expect, which
> means that in the meantime you see a zombie process, but otherwise it
> seems to work.
>
> ISTR you reported an issue when using ‘shepherd --daemonize’, right?
> Perhaps the issue is limited to that mode?

Playing around with signalfd(2) for a bit, it seems that implementations
are allowed to coalesce several pending instances of the same signal into
one. In the case of SIGCHLD, this means the parent process might be
notified only once even though *multiple* children terminated around the
same time. Could it have something to do with this problem as well?
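
A rough sketch of that coalescing, in Python rather than Guile. Python's
standard library does not wrap signalfd(2), so signal.sigtimedwait stands
in for reading from the signalfd here; the function names are made up for
illustration, and the os.waitid call is only there to make the demo
deterministic. The point is that one notification may stand for several
dead children, so reaping has to loop over waitpid with WNOHANG:

```python
import os
import signal


def spawn_children(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once, skipping parent cleanup
        pids.append(pid)
    return pids


def reap_all():
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # some children remain, but none has exited yet
        reaped.append(pid)
    return reaped


def demo(count=3):
    """Return (notifications, reaped PIDs, spawned PIDs)."""
    # Block SIGCHLD so it stays pending instead of running a handler,
    # as one would before creating a signalfd.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pids = spawn_children(count)
        # Only to make the demo deterministic: wait until each child has
        # exited, but leave it unreaped (WNOWAIT).
        for pid in pids:
            os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
        # Count pending SIGCHLDs; timeout 0 means "poll, do not block".
        notifications = 0
        while signal.sigtimedwait({signal.SIGCHLD}, 0) is not None:
            notifications += 1
        # One notification may stand for many exits, so reap in a loop.
        reaped = reap_all()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return notifications, sorted(reaped), sorted(pids)
```

Calling demo(3) on Linux should reap all three children while seeing at
most one pending SIGCHLD, which is why a handler that calls waitpid only
once per notification can leave zombies behind.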

> > Concurrency/parallelism - I think Jelle was planning to work on this,
> > but I might be wrong about that. Maybe I volunteered? We're keen to
> > see Shepherd starting services in parallel, where possible. This will
> > require some changes to the way we start/stop services (because at the
> > moment we just send a "start" signal to a single service to start it,
> > which makes it hard to be parallel), and will require us to actually
> > build some sort of real dependency resolution. Longer-term our goal
> > should be to bring fibers into Shepherd, but Efraim mentioned that
> > fibers doesn't compile on ARM at the moment, so we'll have to get that
> > working first at least.
>
> I’d really like to see that happen.  I’ve become more familiar with
> Fibers, and I think it’ll be perfect for the Shepherd (and we’ll fix the
> ARM build issue, no doubt.)
>
> One thing I’d like to do is to handle SIGCHLD via signalfd(2) instead of
> an actual signal handler like we do now.  That would make it easy to
> have signal handling part of the main event loop and thus, it would
> integrate well with Fibers.
>
> It seems that signalfd(2) is Linux-only though, which is a bummer.  The
> solution might be to get over it and have it implemented on GNU/Hurd…
> (I saw this discussion:
> <https://www.gnu.org/software/hurd/glibc/signal/signal_thread.html>; I
> suspect it’s within reach.)

Good news: signalfd seems to work as far as I can see. I am not quite sure
how to make it work consistently with Guile ports yet, though.

To make use of signalfd, one normally masks signals so that these can be
handled via signalfd instead of the default signal handlers; any forked
process starts out with the same signal mask, so we would need to make
sure to reset the signal mask for spawned processes.
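
A minimal sketch of the mask inheritance, again in Python rather than
Guile and with a made-up function name. It blocks SIGCHLD in the parent,
forks, and has the child report over a pipe whether SIGCHLD is still
blocked, with and without resetting the mask first. A real service would
call exec at that point, and the mask survives exec as well, which is why
the reset has to happen in the child before that:

```python
import os
import signal


def child_blocks_sigchld(reset_mask):
    """Fork with SIGCHLD blocked; report whether the child still blocks it."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    read_end, write_end = os.pipe()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: this is where a service would normally be exec'd.
            try:
                if reset_mask:
                    # Unblock everything before handing over to the service.
                    signal.pthread_sigmask(signal.SIG_SETMASK, set())
                # Blocking nothing extra simply returns the current mask.
                current = signal.pthread_sigmask(signal.SIG_BLOCK, set())
                blocked = signal.SIGCHLD in current
                os.write(write_end, b"1" if blocked else b"0")
            finally:
                os._exit(0)  # never fall back into the parent's code
        os.close(write_end)
        answer = os.read(read_end, 1)
        os.close(read_end)
        os.waitpid(pid, 0)  # reap the helper child
        return answer == b"1"
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Without the reset the child inherits the blocked SIGCHLD, so a spawned
daemon that relies on SIGCHLD itself would never see it.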

> [...]
>
> Ludo’.


Jelle
