Re: A Critique of Shepherd Design

unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: Maxime Devos <maximedevos@telenet.be>
Cc: "guix-devel@gnu.org" <guix-devel@gnu.org>
Subject: Re: A Critique of Shepherd Design
Date: Sat, 20 Mar 2021 02:42:38 +0000	[thread overview]
Message-ID: <i2RYnzzEDl6LOUMEsZlhEAuqjNObTyJM7gpg2en5tQEAvqCjIBM2JkocbhWoNxSIukHZNCT3ATP2roxZvyQ6J8w4LPDw0F-qkLq3ZW4f2Ws=@protonmail.com> (raw)
In-Reply-To: <6286d7101ae8219a539bc34437ab46bd48d38476.camel@telenet.be>

Hello Maxime,

> Multi-threading seems complicated (but can be worth it). What work would you put
> on which thread (please give us some concrete to reason about, ‘thread X does A,
> and is created on $EVENT ...’)? A complication is that "fork" is ‘unsafe’ in a
> process with multiple threads (I don't know the details). Maybe that could be
> worked around. (E.g. by using system* on a helper binary that closes unneeded
> file descriptors and sets the umask .... and eventually uses exec to run the
> service binary.)

Perhaps the point is not so much to be *multi-threaded* but to be *concurrent*, and whether to be multi-*thread* or multi-*process* or multi-*fiber* is a lower-level detail.

For example, we can consider that for 99% of shepherd services, the `start` action returns a numeric PID, and the remaining ones return `#t`.  Rather than call the `start` action in the same thread/process/fiber that implements the Shepherd main loop:

* Create a pipe.
* `fork`.
  * On the child:
    * Close the pipe output end, make the input end cloexec.
    * Call the `start` action.
    * If an error is thrown, print out `(error <>)` with the serialization of the error to the pipe.
    * Otherwise, print out `(return <>)` with the return value of the `start` action.
  * On the parent:
    * Close the pipe input end, and add the output end and the pid of the forked child to one of the things we will poll in the mainloop.
    * Return to the mainloop.
* In the mainloop:
  * `poll` (or `select`, or whatever) the action pipes in addition to the `herd` command socket(s).
  * If an action pipe is ready, read it in:
    * If it EOFed then the sub-process died or execed without returning anything, treat as an error.
    * Similarly if unable to `read` from it.
    * Otherwise deserialize it and react appropriately as to whether it is an error or a return of some value.
  * If an action pipe has been around too long, kill the pid (and all child pids) and close the pipe and treat as an error.

I mean --- single-threaded/process/fibered, non-concurrent `init` systems have fallen by the wayside because of their brittleness.  SystemD is getting wide support precisely because it's an excellent concurrent `init` mechanism that is fairly tolerant of mistakes; a problem in one service unit is generally isolated in that service unit and its dependents and will not cause problems with the other service units.

Presumably, `fork`ing within a `fork`ed process has no issues since that is what is normally done in Unix anyway, it's threads that are weird.


> That's a bit of an over-simplification. At least in a singly-threaded Shepherd,
> waitpid could not be used by an individual service, as it
> (1) returns not often
> enough (if a process of another service exits and this action was waiting on
> a single pid) (2) too often (likewise,
> and this action was ‘just waiting’
> (3) potentially never (imagine every process has been started and an action
> was waiting for one to exit,
> and now the user tries to reboot. Then shepherd
> daemon would never accept the connection from the user!).

Yes, I agree it is an oversimplification, but in any case, there should be some smaller subset of things that can be (typically) done.

Then any escape hatch like `unsafe-turing-complete` or `without-static-analysis` can retain the full power of the language for those times when you *really* do need it.

>
> And in a multi-threaded Shepherd, "fork" is unsafe. (Anyway, "fork" captures
> too much state, and no action should ever call "exec" without "fork".) Also,
> "exec" replaces too little state (e.g. open file descriptors, umask, root
> directory, current working directory ...).
>
> But perhaps that's just a call for a different set of primitives (which could
> be implemented on top of these system calls by shepherd, or directly on top
> of the relevant Hurd RPCs when running on GNU/Hurd.).

Yes.

> > Yet the language is a full Turing-complete language, including the major
> > weakness of Turing-completeness: the inability to solve the halting problem.
>
> IIUC, there are plans to perform the network configuration inside shepherd
> (search for guile-netlink). So arbitrary code still needs to be allowed.
> This is the Procedural (extensible) I'll referring to later.
>
> > The fact that the halting problem is unsolved in the language means it is
> > possible to trivially write an infinite loop in the language. In the context
> > of an `init` system, the possibility of an infinite loop is dangerous, as it
> > means the system may never complete bootup.
>
> I'm assuming the hypothetical infinite loop is in, say, the code for starting
> guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
> (If there's an infinite loop is in e.g. udev-service-type then there's indeed
> a problem, then that infinite loop might as well be in eudev itself, so I'm
> only speaking of non-essential services blocking the boot process.)
>
> I'm not convinced we need Turing-incompleteness here, at least if the
> start / do-a-dance action / stop code of each service definition is run in
> a separate thread (not currently the case).
>
> > I have experienced this firsthand since I wrote a Shepherd service to launch
> > a daemon, and I needed to wait for the daemon initialization to complete.
> > My implementation of this had a bug that caused an infinite loop to be entered,
> > but would only tickle this bug when a very particular (persistent on-disk) state
> > existed.
> > I wrote this code about a month or so ago, and never got to test it until last week,
> > when the bug was tickled. Unfortunately, by then I had removed older system versions
> > that predated the bug. When I looked at a backup copy of the `configuration.scm`, I
> > discovered the bug soon afterwards.
> > But the amount of code verbiage needed here had apparently overwhelmed me at the time I
> > wrote the code to do the waiting, and the bug got into the code and broke my system.
> > I had to completely reinstall Guix (fortunately the backup copy of `configuration.scm`
> > was helpful in recovering most of the system, also ZFS rocks).
>
> Ok, shepherd should be more robust (at least for non-essential services like guix-publish).
>
> > Yes, I made a mistake. I'm only human. It should be easy to recover from mistakes.
>
> I would need to know what the mistake and what ‘recovering’ would be exactly in this
> case, before agreeing or disagreeing whether this is something shepherd should help
> with.

It's what I described in the previous paragraph.  I wrote code to wait for startup, the code to wait for startup had a bug that caused it to enter an infinite loop under a certain edge case.

The edge case was not reached for a long time (almost a month), so I only found it a week ago when it triggered and broke my system.  In order to recover I had to reinstall Guix System completely, and none of the saved system generations had a configuration that predated the introduction of the bug.

If you want specifics --- as a sketch, what I did was something similar to this:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (- count 1)))) ;;; <--- (fuck)

The daemon died (this was the persistent on-disk edge case, arguably a bug in `daemon-started?` as well) causing `(daemon-started?)` to return `#f` permanently, and then I started counting *in the wrong direction*.


> > The full Turing-completeness of the language invites mistakes,
>
> Agreed (though ‘invites’ seems a little too strong here).

Come on.  Something as simple as "Wait for some condition to be true" needs code like the above to implement fully in the Shepherd action-description language (i.e. Scheme).  And because I was debating on "just put `(count 30)` in the `let` and decrement" versus "Naaa just count up like a normal person" and got my wires crossed, I got into the above completely dumb mistake.

Indeed, not only should `-` be changed to `+` in the above, but it's also better to do something like this as well:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((not (zero? (car (waitpid daemon-pid WNOHANG))))
         (format #t "daemon died!~%")
         #f)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (+ count 1))))

Because if the daemon died, waiting for 30 seconds would be fairly dumb as well.

That is a fair bit of verbiage, and much harder to read through for bugs.

As per information theory: the more possible things I can talk about, the more bits I have to push through a communication channel to convey it.  Turing-complete languages can do more things than more restricted  domain-specific languages, so on average it requires more code to implement an arbitrary thing in the Turing-complete language (thus requiring more careful review) than the restricted domain-specific language.

> -   Section 3: A nicely behaving API.
>
> > So what can we do?
> > For one, a Turing-complete language is a strict superset of non-Turing-complete
> > languages. So one thing we can do is to define a more dedicated language for Shepherd
> > actions, strongly encourage the use of that sub-language, and, at some point, require
> > that truly Turing-complete actions need to be wrapped in a `(unsafe-turing-complete ...)` form.
> > For example, in addition to the current existing `make-forkexec-constructor` and
> > `make-kill-destructor`, let us also add these syntax forms:
>
> > `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a numeric `pid`, that does: Check if evaluating `<form>` in the lexical context results in `#f`. If so, wait one second and re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds has passed, or if the `pid` has died. If `<form>` evaluates to non-`#f` then return it immediately.
> > `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed process group?). The new thread evaluates `<form> ...` in sequence. If the thread completes or throws in `<timeout>` seconds, return the result or throw the exception in the main thread. IF the `<timeout>` is reached or the
> > given `pid` has died, kill the thread and any processes it may have spawned, then return `#f`.
> > `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that does: Check if the `pid` has died, if so, return #t. Else sleep 1 second and try again, and if `<timeout>` is reached, return `#f`.
>
> Hard to say if these syntaxes are useful in advance.

In my particular case I could have just written `(wait-for-condition 30 (daemon-started?))`.  I imagine a fair number of wait-for-daemon-startup things can use that or `(timed-action 30 (invoke #$waiter-program))`.

>
> In any case, I don't like numeric pids, as they are reused.
> Could we use something like pidfds (Linux) or whatever the Hurd has instead?
> (Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
> a predicate task-dead?, a wait-for-task-dead operation (if we use guile-fibers).)

Certainly, but that may require breaking changes to existing specific shepherd actions.


> Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 5)))
> == wait until task is dead OR 5 seconds have passed. Operations in guile-fibers
> combine nicely!
>
> Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
> what timed-action is for). If some process is taking long to start,
> then by killing it it will take forever to start.

Sprinkling timeouts makes the halting problem trivial: for any code that is wrapped in a timeout, that code will halt (either by itself, or by hitting the timeout).

Indeed, you can make a practical total functional language simply by requiring that recursions have a decrementing counter that will abort computation when it reaches 0.  It could use mostly partial code patterns with the addition of that decrementing counter, and it becomes total.  In many ways the programmer is just assuring the compiler "this operation will not take more than N recursions".


>
> > The above forms should also report as well what they are doing (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console and/or syslog.
>
> Something like that would be useful, yes. But printing <form> itself seems a bit
> low-level to me.

Yes, but if it's something that the end-user wrote in their `configuration.scm`, it seems fine to me.

>
> > In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which is
> > a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> > `#:pre-kill`, which is a procedure that accepts a `pid`. Possibly we need to
> > add combinators for the above actions as well. For example a `sub-actions`
> > procedural form that accepts any number of functions of the above `pid -> bool`
> > type and creates a single combined `pid -> bool`.
>
> > So for example, in my case, I would have written a `make-forkexec-constructor`
> > that accepted a `#:post-fork` that had a `wait-for-condition` on the code that
> > hecked if the daemon initialization completed correctly.
> > I think it is a common enough pattern that:
> >
> > -   We have to spawn a daemon.
> > -   We have to wait for the daemon to be properly initialized (`#:post-fork`)
> > -   When shutting down the daemon, it's best to at least try to politely ask it
> >     to finish using whatever means the daemon has (`#:pre-kill`).
> >
> > -   If the daemon doesn't exit soon after we politely asked it to, be less polite and start sending signals.
> >
> > So I think the above should cover a good number of necessities.
>
> Agreed. My own proposal, which I like better (-: :
> (this is the ‘procedural’ (extensible) interface)
>
> -   Multi-threading using guile-fibers (we'll use condition variables & channels).
> -   Instead of make-forkexec-constructor, we have
>     make-process-constructor, as fork + exec as-is is troublesome in
>     a multi-threaded application (see POSIX).
>
>     (It still accepts #:user, #:group & the like)
>
>     make-process-constructor returns a thunk as usual,
>     but this thunk THUNK works differently:
>
>     THUNK : () -> <process-task>.
>
>
> <process-task> is a GOOPS class representing processes (parent: <shepherd-task>),
> that do not neccessarily exist yet / anymore. <process-task>
> has some fields:
>
> * stopped?-cond
> (condition variable, only signalled after exit-value is set)
> * exit-value
> (an atomic box, value #f or an integer,
> only read after stopped?-cond is signalled
> or whenever the user is feeling impatient and asking shepherd
> how it is going.)
> * stop!-cond (condition variable)
> * stop-forcefully!-cond (condition variable)
> * started-cond (condition variable, only signalled when ‘pid’ has been set)
> * pid (an atomic box)
> * maybe some other condition variables?
> * other atomic boxes?
>
> THUNK also starts a fiber associated with the returned <process-task>,
> which tries to start the service binary in the background (could also be used
> for other things than the service binary). When the binary has been started,
> the fiber signals started-cond.
>
> The fiber will wait for the process to exit, and sets exit-value and
> signals stopped?-cond when it does. It will also wait for a polite request
> to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
> (stop-forcefully!-cond).
>
> (Does shepherd even have an equivalent to stop-forcefully! ?)
>
> kill-destructor is adjusted to not work with pids directly, but rather
> with <process-task>. It signals stop!-cond or stop-forcefully!-cond.
>
> (So process creation & destruction happens asynchronuously, but there
> still is an easy way to wait until they are actually started.)
>
> -   #:pre-kill & #:post-fork seem rather ad-hoc to me. I advise
>     still wrapping make-process-constructor & kill-destructor.
>     However, the thunk still needs to return a <shepherd-task> (or subclass
>     of course).
>
>     If the thunk does anything that could block / loop (e.g. if it's a HTTP
>     server, then the wrapper might want to wait until the process is actually
>     listening at port 80), then this blocking / looping must be done in a fiber.
>
>     Likewise if the killer does anything that could block / loop (e.g.
>     first asking the process to exit nicely before killing it after a time-out).

Sure, this design could work better as well.  But *some* restricted dialect should exist, and we should discourage people from using the full Guile language in Shepherd actions they write bespoke in their `configuration.scm`, not tout it as a feature.

> -   Section 4: Static analysis
>
> > Then, we can define a strict subset of Guile, containing a set of forms we know are
> > total (i.e. we have a proof they will always halt for all inputs). Then any Shepherd
> > actions, being Lisp code and therefore homoiconic, can be analyzed. Every list must
> > have a `car` that is a symbol naming a syntax or procedure that is known to be safe
> > --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote`
> > (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk combinators,
> > as well as the domain-specific `make-forkexec-constructor`, `make-kill-destructor`,
> > `wait-for-condition`, `timed-action`, and probably some of the `(guix build utils)`
> > like `invoke`, `invoke/quiet`, [...]
>
> `invoke' has no place in your subset, as the invoked program can take arbitrarily long.
> (Shouldn't be a problem when multi-threading, but then there's less need for this kind
> of static analysis.)

Well, the thing is --- I can probably write a shell script and test it **outside of Shepherd**.  In fact, in my mistake above, on a previous non-Guix system, I ***did*** use a shell script to wait for daemon startup.  But when I ported it over to Guix, I decided to rewrite it as Scheme code to "really dive into the Guix system".

Before, when writing it as a shell script I was able to test it and indeed found problems and fixed them.  But as Scheme code, well, it's a bit harder, since I have to go import a bunch of things (that I have to go search for, and I have to figure out Guile path settings for the bits that are provided by Guix, etc.), and then afterwards I have to copy back the tested code into my `configuration.scm`.  This was enough of an annoyance to me that it discouraged me from testing it, which could have caught the above bug.

Since `invoke` calls an external program, which runs isolated in its own process (and can be `kill`ed in order to unstuck Shepherd), and can more easily be independently tested outside the `configuration.scm`, it's substantially safer than writing the same logic directly in the Shepherd language, so it gets a pass.

Many other daemons that have some kind of "wait-for-daemon-startup" program will often include tests for the "wait" program as well in its test suite, so using `invoke` is substantially safer than writing bespoke code in Scheme that performs the same action.

> > `mkdir-p` etc.
>
> I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some procedure
> (prepare-directory
> `("/var/stuff #:bits #o700 #:user ... #:group
> ("blabla" #:bits ...
> #:atomic-generate-contents ,(lambda (port) (display 'stuff port))))
> ...))
>
> ... automatically taking care of symlinks & detecting whether there are conflicts
> with different services.
>
> > Sub-forms (or the entire form for an action) can be wrapped in `(unsafe-turing-complete ...)`
> > to skip this analysis for the sub-form,
>
> An escape hatch seems good to me, though I would rather call it
> `(without-static-analysis (totality) ...)'. The forms are not neccesarily ‘unsafe’,
> only potentially so. We could also define a checkers for other issues this way.
>
> (I'm thinking of potential TOCTTOU (time of check to time of use) problems involving
> symbolic links.)
>
> > but otherwise, by default, the specific subset must be used, and users have to
> > explicitly put `(unsafe-turing-complete ...)` so they are aware that they can
> > seriously break their system if they make a mistake here. Ideally, as much of
> > the code for any Shepherd action should be outside an ``unsafe-turing-complete`, and only parts of the code that really need the full Guile language to implement should be rapped in`unsafe-turing-complete`.
>
> I'm getting Rusty vibes here. Sounds sound to me.
>
> > (`lambda` and `let-syntax` and `let`, because they can rebind the meanings of symbols,
> > would need to be in `unsafe-turing-complete` --- otherwise the analysis routine would
> > require a full syntax expander as well)
>
> No. The analysis routine should not directly work on Scheme code, but rather on
> the 'tree-il' (or maybe on the CPS code, I dunno).
> Try (macro-expand '(let ((a 0)) a)) from a Guile REPL.
>
> > Turing-completeness is a weakness, not a strength, and restricting languages to be
> > the Least Powerful necessary is better. The `unsafe-turing-complete` form allows
> > an escape, but by default Shepherd code should be in the restricted non-Turing-complete
> > subset, to reduce the scope for catastrophic mistakes.
>
> I'm assuming make-fork+exec-constructor/container would be defined in Scheme
> and be whitelisted in raid5atemyhomework-scheme?

Yes.

>
> -   Section 5 -- Declarative
>
>     Something you don't seem to have considered: defining services in a declarative
>     language! Hypothetical example à la SystemD, written in something vaguely
>     resembling Wisp (Wisp is an alternative read syntax for Scheme with less
>     parentheses):

I prefer SRFI 110 myself.

> (I'm a bit rusty with the details on defining shepherd services
>     in Guix:
>
>     shepherd-service
>     start `I-need-to-think-of-a-name-constructor #:binary #$(file-append package "/bin/stuff") #:arguments stuff #:umask suff #:user stuff #:group stuff ,@(if linux`(#:allowed-syscalls ',(x y z ...)))
>     #:friendly-shutdown
>     thunk
>     invoke #$(file-append package "/bin/stop-stuff-please")
>     #:polling-test-started?
>     thunk
>     if . file-exists? "/run/i-am-really-started"
>     #true
>
>     ;; tell I-need-to-think-of.... to wait 4 secs
>     ;; before trying again
>     seconds 4
>     ;; how much time starting may take
>     ;; until we consider things failed
>     ;; & stop the process
>     #:startup-timeout?
>     thunk
>     if spinning-disk?
>     plenty
>     little
>     #:actions etcetera
>     ...
>
>     Well, that's technically still procedural, but this starts to look
>     like a SystemD configuration file due to the many keyword arguments!
>
>     (Here (thunk X ...) == (lambda () X ...), and
>     I-need-to-think-of-a-name-constructor
>     is a procedure with very many options, that
>     ‘should be enough for everyone. If it doesn't support
>     enough arguments,
>     look in SystemD for inspiration.)
>
>     That seems easier to analyse. Although it's bit kitchen-sinky,
>     no kitchen is complete without a sink, so maybe that's ok ...
>     (I assume the kitchen sink I-need-to-think-of-a-name-constructor
>     would be defined in the guix source code.)


Well, yes ---- but then we can start arguing that using SystemD would be better, as it is more battle-tested and already exists and all the kitchen-sink features are already there and a lot more people can hack its language than can hack the specific dialect of scheme (augmented with Guixy things like `make-forkexec-constructor`) used by GNU Shepherd.

Thanks
raid5atemyhomework

next prev parent reply	other threads:[~2021-03-20  2:43 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
2021-03-19 21:49 ` Maxime Devos
2021-03-20  2:42   ` raid5atemyhomework [this message]
2021-03-20  4:02 ` Maxim Cournoyer
2021-03-20  5:07 ` Mark H Weaver
2021-03-20 11:10   ` raid5atemyhomework
2021-03-20 16:58 ` Ludovic Courtès
2021-03-21  0:22   ` raid5atemyhomework
2021-03-22 17:02     ` Ludovic Courtès
2021-03-24 14:29       ` raid5atemyhomework
2021-03-24 14:48       ` raid5atemyhomework
2021-03-22 13:42 ` raingloom
2021-03-22 17:50   ` Martin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='i2RYnzzEDl6LOUMEsZlhEAuqjNObTyJM7gpg2en5tQEAvqCjIBM2JkocbhWoNxSIukHZNCT3ATP2roxZvyQ6J8w4LPDw0F-qkLq3ZW4f2Ws=@protonmail.com' \
    --to=raid5atemyhomework@protonmail.com \
    --cc=guix-devel@gnu.org \
    --cc=maximedevos@telenet.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).