Re: A Critique of Shepherd Design

unofficial mirror of guix-devel@gnu.org 

From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: Mark H Weaver <mhw@netris.org>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: A Critique of Shepherd Design
Date: Sat, 20 Mar 2021 11:10:53 +0000
Message-ID: <tE54mZQSSBudziwsiLfC3ceNqV5cpnpRsdxR6PJoL-JX1GrECLAZSJwcPG4tWqOT77SemiQ4hBtJj1j-n7db5F9Ss9RD46ITQrmaWYT0PmU=@protonmail.com>
In-Reply-To: <87y2eil8eb.fsf@netris.org>

Good morning Mark,

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com writes:
>
> > GNU Shepherd is the `init` system used by GNU Guix. It features:
> >
> > -   A rich full Scheme language to describe actions.
> > -   A simple core that is easy to maintain.
> >
> > However, in this critique, I contend that these features are bugs.
> > The Shepherd language for describing actions on Shepherd daemons is a
> > Turing-complete Guile language. Turing completeness runs afoul of the
> > Principle of Least Power. In principle, all that actions have to do
> > is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.
>
> These 4 calls are already enough to run "sleep 100000000000" and wait
> for it to finish, or to rebuild your Guix system with an extra patch
> added to glibc.

I agree.  But this mechanism is intended to prevent stupid mistakes like the one I made, not to protect against an attacker who is capable of invoking `guix system reconfigure` on arbitrary Scheme code (and can easily wrap anything nefarious in any `unsafe-turing-complete` or `without-static-analysis` escape mechanism).  Seatbelts, not steel walls.

>
> > Yet the language is a full Turing-complete language, including the
> > major weakness of Turing-completeness: the inability to solve the
> > halting problem.
> > The fact that the halting problem is unsolved in the language means it
> > is possible to trivially write an infinite loop in the language. In
> > the context of an `init` system, the possibility of an infinite loop
> > is dangerous, as it means the system may never complete bootup.
>
> Limiting ourselves to strictly total functions wouldn't help much here,
> because for all practical purposes, computing 10^100 digits of Pi is
> just as bad as an infinite loop.

Indeed.  Again, seatbelts, not steel walls.  It's fairly difficult to accidentally write a program that computes 10^100 digits of pi, but not so difficult to have a brain fart and use `(- count 1)` instead of `(+ count 1)` because you were idly wondering whether an increment or a decrement loop would be more Schemey, or whether both are equally Schemey.

What I propose would protect against the latter (a much more likely mistake): in context, the recursive loop would be flagged, because the recursive call is a call to a function that is not on the whitelist.  Hopefully, getting recursive loops flagged would make the sysad writing `configuration.scm` look for the "proper" way to wait for an event to become true, and lead them to the (hopefully extant) documentation on whatever domain-specific language we have for waiting on events, instead of rolling their own.
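
As a rough illustration of the whitelist idea, here is a minimal sketch in Guile.  It is not Shepherd code: `allowed-operators` and `flagged-calls` are hypothetical names, and the whitelist contents are assumed for the example.  The check treats every list headed by a symbol as a call, so it does not understand binding forms such as `let`.

```scheme
;; Hypothetical sketch, not part of Shepherd: walk a quoted action body
;; and collect every call whose operator is not on a whitelist.
(use-modules (srfi srfi-1))            ; for append-map

(define allowed-operators
  ;; Assumed whitelist, for illustration only.
  '(begin if and or not + - = < > fork exec kill waitpid))

(define (flagged-calls expr)
  "Return the list of operator symbols in EXPR that are not whitelisted."
  (cond
   ((and (pair? expr) (symbol? (car expr)))
    ;; A list headed by a symbol is treated as a call.
    (let ((rest (append-map flagged-calls (cdr expr))))
      (if (memq (car expr) allowed-operators)
          rest
          (cons (car expr) rest))))
   ((pair? expr) (append-map flagged-calls expr))
   (else '())))

;; A hand-rolled wait loop with the mistaken decrement, as data:
(flagged-calls '(if (= count 10) #t (wait-until-ready (- count 1))))
;; => (wait-until-ready)
```

The recursive call to the (hypothetical) `wait-until-ready` is reported simply because that name is not whitelisted; no attempt is made to prove termination.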

> That said, I certainly agree that Shepherd could use improvement, and
> I'm glad that you've started this discussion.
>
> At a glance, your idea of having Shepherd do more within subprocesses
> looks promising to me, although this is not my area of expertise.

An issue here is that we sometimes pass data across Shepherd actions using environment variables, and a change made to the environment in a subprocess is not visible to the parent or to sibling processes.  See the `set-http-proxy` action of `guix-daemon`; the environment variable is used as a global namespace that is accessible from both the `set-http-proxy` and `start` actions.

On the other hand, arguably the environment variable table is a global resource shared amongst multiple shepherd daemons.  This technique in general may not scale well for large numbers of daemons; environment variable name conflicts may cause subtle problems later.  I think it would be better if, in addition to the "value" (typically the PID), each Shepherd service also had a `settings` field that can be read and modified by each action.  It could hold anything that satisfies `(lambda (x) (equal? x (read (print x))))`, so that it can be easily serialized across each subprocess launched by each action; by convention it could be an association list.  Then the `set-http-proxy` action would update this `settings` field for the shepherd service, then queue up a `restart` action.

This would also persist the `http_proxy` setting, BTW --- currently if you `herd set-http-proxy guix-daemon <whatever>` and then `herd restart guix-daemon` later, the HTTP proxy is lost (since the environment variable is cleared after `set-http-proxy` restarts the `guix-daemon`).  In short, this `set-http-proxy` example looks like a fairly brittle hack anyway, and maybe worth avoiding as a pattern.
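
The serializability condition on `settings` can be sketched in Guile as follows.  This is only an illustration, not Shepherd code: `serializable?` and `example-settings` are hypothetical names, the proxy URL is made up, and Guile's `write` stands in for the `print` used in the predicate above.

```scheme
;; Hypothetical sketch, not part of Shepherd.
(define (serializable? x)
  "Return #t if X is unchanged by a write/read round trip, else #f."
  (false-if-exception
   (equal? x
           (call-with-input-string
            (call-with-output-string
             (lambda (port) (write x port)))
            read))))

;; An association list of plain data qualifies.  The URL is made up.
(define example-settings
  '((http-proxy . "http://proxy.example:8080")))

(serializable? example-settings)   ; => #t
(serializable? (list car))         ; => #f, a procedure cannot be read back
```

The `false-if-exception` wrapper is there because reading back the printed form of a procedure raises a read error rather than returning a different value.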

Then there are actions that invoke other actions.  From a cursory glance at the Guix code it looks like only Ganeti and Guix-Daemon have actions that invoke actions, and they only invoke actions on their own Shepherd services.  It seems safe to me for an action invoked from another action of the same service to *not* spawn a new process, but to execute in the same process.  I am not sure how safe it would be to allow one shepherd service to invoke an action on another shepherd service.  But then the `start` action of any service may cause other services it requires to be started as well, so we still need to figure out which subprocesses to launch or not launch.

Or maybe each Shepherd service has its own subprocess that runs its own mainloop, and the "main" Shepherd process mainloop "just" serves as a switching center that forwards commands to each service's mainloop subprocess.  It would also monitor per-service mainloop subprocesses that are not responding fast enough (and possibly decide to kill such a mainloop and all its children, then disable that service).  This would make each service's environment variables a persistent but local store that is specific to that service, which makes their use in `guix-daemon` safe; `set-http-proxy` would simply not clear the env vars, so that the setting persists.  It would also allow Shepherd to remain responsive at all times even if some action of some Shepherd service enters an infloop or a 10^100-digits-of-pi condition; it could even have `herd status` report the number of pending unhandled commands for each service to inform the sysad about possible problems with specific services.
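
The "switching center" arrangement can be sketched in Guile roughly as below.  This is a sketch under stated assumptions, not how Shepherd works: `spawn-service-loop` and `forward-command` are hypothetical names, commands are assumed to be plain s-expressions sent over a pipe, and the monitoring and killing of unresponsive mainloops is left out.

```scheme
;; Hypothetical sketch, not part of Shepherd: one child process per
;; service, with the parent forwarding commands to it over a pipe.

(define (spawn-service-loop handle-command)
  "Fork a child that calls HANDLE-COMMAND on each datum read from a pipe.
Return a pair (PID . OUTPUT-PORT) for the parent to use."
  (let* ((ends (pipe))                 ; (read-port . write-port)
         (in   (car ends))
         (out  (cdr ends))
         (pid  (primitive-fork)))
    (cond
     ((zero? pid)
      ;; Child: this is the service's own main loop.
      (close-port out)
      (let loop ((command (read in)))
        (unless (eof-object? command)
          (handle-command command)
          (loop (read in))))
      (primitive-exit 0))
     (else
      ;; Parent: keep only the write end.
      (close-port in)
      (cons pid out)))))

(define (forward-command service command)
  "Send COMMAND to SERVICE, a pair returned by `spawn-service-loop'."
  (let ((port (cdr service)))
    (write command port)
    (newline port)
    (force-output port)))
```

An infinite loop inside `handle-command` then stalls only that child.  Note that a write to the pipe can still block once the pipe buffer fills up, so a real implementation would need non-blocking I/O or a bounded queue in the parent.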

Thanks
raid5atemyhomework
