unofficial mirror of guix-devel@gnu.org 
From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: A Critique of Shepherd Design
Date: Sun, 21 Mar 2021 00:22:09 +0000
Message-ID: <jbtNChDnTkYrPwwsIrdfvBGRTenkS_iuSJQXERarnI--EYEEet5GF6EOx0hecxa59D42QObhuborYQ9xL15yZPQlnJFPzVMmaK7UCcwhUHY=@protonmail.com>
In-Reply-To: <87a6qx7of5.fsf@gnu.org>

Hello Ludo',

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com skribis:
>
> > Now, let us combine this with the second feature (really a bug): GNU
> > shepherd is a simple, single-threaded Scheme program. That means that
> > if the single thread enters an infinite loop (because of a Shepherd
> > service description that entered an infinite loop), then Shepherd
> > itself hangs.
>
> You’re right that it’s an issue; in practice, it’s okay because we pay
> attention to the code we run there, but obviously, mistakes could lead
> to the situation you describe.
>
> It’s a known problem and there are plans to address it, discussed on
> this list a few times before. The Shepherd “recently” switched to
> ‘signalfd’ for signal handling in the main loop, with an eye on making
> the whole loop event-driven:
>
> https://issues.guix.gnu.org/41507
>
> This will address this issue and unlock things like “socket activation”.
>
> That said, let’s not lie to ourselves: the Shepherd’s design is
> simplistic. I think that’s okay though because there’s a way to address
> the main issues while keeping it simple.


I'm not sure you can afford to keep it simple.  Consider: https://issues.guix.gnu.org/47253

In that issue, the `networking` provision comes up potentially *before* the network is, in fact, up.  This means that other daemons that require `networking` could potentially be started before the network connection is up.

One example of such a daemon is `transmission-daemon`.  This daemon binds to port 9091 so you can control it.  Unfortunately, if it gets started while the network is down, it will be unable to bind to 9091 (so you can't control it) but will still keep running.  On my system that means that on reboot I have to manually `sudo herd restart transmission-daemon`.

In another example, I have a custom daemon that I have set up to use the Tor proxy over 127.0.0.1:9050.  It requires both `networking` and `tor`.  When it starts after `networking` comes up but before the actual network does, it dies because it can't access the proxy at 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).  Then shepherd respawns it, then it dies again (network still not up) enough times that it gets disabled.  This means that on reboot I have to manually `sudo herd enable raid5atemyhomework-custom-daemon` and `sudo herd restart raid5atemyhomework-custom-daemon`.
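
To make the failure mode concrete, here is a minimal simulation of a supervisor that respawns a daemon which exits whenever the network is still down.  This is only a sketch: the function name `supervise`, the limit of 5 respawns within 7 seconds, and the 0.5-second crash time are assumptions picked for illustration, not values taken from Shepherd's source.

```python
from collections import deque


def supervise(network_up_at, max_respawns=5, window=7.0, crash_after=0.5):
    """Simulate respawning a daemon that dies while the network is down.

    All parameters are hypothetical, chosen for illustration only.
    Returns ("running", t) if the daemon is started at a simulated time
    t >= network_up_at, or ("disabled", t) if it crashed too often within
    `window` seconds and the supervisor gave up.
    """
    now = 0.0
    respawns = deque()  # simulated timestamps of recent respawns
    while True:
        if now >= network_up_at:
            # The network is up, so this start attempt succeeds.
            return ("running", now)
        # The daemon starts, cannot reach what it needs, and exits.
        now += crash_after
        # Forget respawns that fall outside the sliding window.
        while respawns and now - respawns[0] > window:
            respawns.popleft()
        if len(respawns) >= max_respawns:
            # Too many respawns in a short time: disable the service.
            return ("disabled", now)
        respawns.append(now)


if __name__ == "__main__":
    print(supervise(network_up_at=30.0))  # network slow to come up
    print(supervise(network_up_at=1.0))   # network up almost at once
```

Under these assumed numbers, a network that takes 30 seconds to come up gets the daemon disabled after about 3 simulated seconds, while a network that is up within a second lets it survive.  The outcome depends on boot timing that the service description has no control over.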

On systemd-based systems, there's a `NetworkManager-wait-online.service` which just calls `nm-online -s -q --timeout=30`.  This delays network-requiring daemons until the network is actually up.

However, in https://issues.guix.gnu.org/47253#1 Mark points out that this is undesirable in the Guix case, since it could stall the (single-threaded) bootup process for up to 30 seconds if the network is physically disconnected.  That is bad UX for desktop and laptop users (who might still want to run `transmission-daemon`, BTW), because it potentially blocks the initialization of X and makes the computer unusable for up to 30 seconds after boot.  I note that I experienced such issues in some very old Ubuntu installations, as well.

systemd can afford to *always* have `nm-online -s -q --timeout=30` because it's concurrent.  That service will block, but other services like X that don't ***need*** the network will continue booting.  So the user can still get to a usable system even if the boot isn't complete because the network isn't up yet due to factors beyond the control of the operating system.
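
The difference can be sketched with a toy model that computes when each service becomes ready under a sequential start versus a dependency-aware concurrent start.  The service names and durations below are hypothetical (a 30-second network wait, 2 seconds for X, 1 second for `transmission-daemon`), and this is not how either init system is actually implemented.

```python
def sequential_ready_times(services):
    """Start services strictly one after another, in list order.

    `services` is a list of (name, duration_seconds, dependencies).
    Returns a dict mapping each name to the time it becomes ready.
    """
    clock = 0.0
    ready = {}
    for name, duration, _deps in services:
        clock += duration  # nothing else can start while this one runs
        ready[name] = clock
    return ready


def concurrent_ready_times(services):
    """Start each service as soon as all of its dependencies are ready.

    Assumes every dependency appears earlier in the list than its users.
    """
    ready = {}
    for name, duration, deps in services:
        start = max((ready[dep] for dep in deps), default=0.0)
        ready[name] = start + duration
    return ready


# Hypothetical boot: the network wait hits its full 30-second timeout.
BOOT = [
    ("network-online", 30.0, []),
    ("xorg", 2.0, []),
    ("transmission-daemon", 1.0, ["network-online"]),
]

if __name__ == "__main__":
    print("sequential:", sequential_ready_times(BOOT))
    print("concurrent:", concurrent_ready_times(BOOT))
```

In this model X is ready after 32 seconds in the sequential case but after 2 seconds in the concurrent case, while `transmission-daemon` still waits for the network either way.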


Switching to a concurrent design for Shepherd (*any* concurrent design) is probably best done sooner rather than later, because the switch risks strongly affecting customized `configuration.scm`s like mine, which has almost half a dozen custom Shepherd daemons.


Thanks
raid5atemyhomework


Thread overview: 13+ messages
2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
2021-03-19 21:49 ` Maxime Devos
2021-03-20  2:42   ` raid5atemyhomework
2021-03-20  4:02 ` Maxim Cournoyer
2021-03-20  5:07 ` Mark H Weaver
2021-03-20 11:10   ` raid5atemyhomework
2021-03-20 16:58 ` Ludovic Courtès
2021-03-21  0:22   ` raid5atemyhomework [this message]
2021-03-22 17:02     ` Ludovic Courtès
2021-03-24 14:29       ` raid5atemyhomework
2021-03-24 14:48       ` raid5atemyhomework
2021-03-22 13:42 ` raingloom
2021-03-22 17:50   ` Martin
