A Critique of Shepherd Design

unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed

* A Critique of Shepherd Design
@ 2021-03-19 17:33 raid5atemyhomework
  2021-03-19 21:49 ` Maxime Devos
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-19 17:33 UTC (permalink / raw)
  To: guix-devel@gnu.org

GNU Shepherd is the `init` system used by GNU Guix.  It features:

* A rich full Scheme language to describe actions.
* A simple core that is easy to maintain.

However, in this critique, I contend that these features are bugs.

The Shepherd language for describing actions on Shepherd daemons is a Turing-complete Guile language.  Turing completeness runs afoul of the Principle of Least Power.  In principle, all that actions have to do is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.  Yet the language is a full Turing-complete language, including the major weakness of Turing-completeness: the inability to solve the halting problem.

The fact that the halting problem is unsolved in the language means it is possible to trivially write an infinite loop in the language.  In the context of an `init` system, the possibility of an infinite loop is dangerous, as it means the system may never complete bootup.

Now, let us combine this with the second feature (really a bug): GNU shepherd is a simple, single-threaded Scheme program.  That means that if the single thread enters an infinite loop (because of a Shepherd service description that entered an infinite loop), then Shepherd itself hangs.  This means that the system is severely broken.  You cannot `sudo reboot` because that communicates with the Shepherd daemon.  You cannot even `kill 1` because signals are handled in the mainloop, which never get entered if the `start` action of some Shepherd daemon managed to enter an infinite loop.

I have experienced this firsthand since I wrote a Shepherd service to launch a daemon, and I needed to wait for the daemon initialization to complete.  My implementation of this had a bug that caused an infinite loop to be entered, but would only tickle this bug when a very particular (persistent on-disk) state existed.  I wrote this code about a month or so ago, and never got to test it until last week, when the bug was tickled.  Unfortunately, by then I had removed older system versions that predated the bug.  When I looked at a backup copy of the `configuration.scm`, I discovered the bug soon afterwards.  But the amount of code verbiage needed here had apparently overwhelmed me at the time I wrote the code to do the waiting, and the bug got into the code and broke my system.  I had to completely reinstall Guix (fortunately the backup copy of `configuration.scm` was helpful in recovering most of the system, also ZFS rocks).

Yes, I made a mistake.  I'm only human.  It should be easy to recover from mistakes.  The full Turing-completeness of the language invites mistakes, and the single-threadedness makes the infinite-loop mistakes that Turing-completeness invites, into catastrophic system breakages.

So what can we do?

For one, a Turing-complete language is a strict *superset* of non-Turing-complete languages.  So one thing we can do is to define a more dedicated language for Shepherd actions, strongly encourage the use of that sub-language, and, at some point, require that truly Turing-complete actions need to be wrapped in a `(unsafe-turing-complete ...)` form.

For example, in addition to the current existing `make-forkexec-constructor` and `make-kill-destructor`, let us also add these syntax forms:

`(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a numeric `pid`, that does: Check if evaluating `<form>` in the lexical context results in `#f`.  If so, wait one second and re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds has passed, or if the `pid` has died.  If `<form>` evaluates to non-`#f` then return it immediately.

`(timed-action <timeout> <form> ...)` - Return a procedure that accepts a numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed process group?).  The new thread evaluates `<form> ...` in sequence.  If the thread completes or throws in `<timeout>` seconds, return the result or throw the exception in the main thread.  IF the `<timeout>` is reached or the given `pid` has died, kill the thread and any processes it may have spawned, then return `#f`.

`(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that does: Check if the `pid` has died, if so, return #t.  Else sleep 1 second and try again, and if `<timeout>` is reached, return `#f`.

The above forms should also report as well what they are doing (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console and/or syslog.

In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which is a procedure that accepts a `pid`, and `make-kill-destructor` should accept a `#:pre-kill`, which is a procedure that accepts a `pid`.  Possibly we need to add combinators for the above actions as well.  For example a `sub-actions` procedural form that accepts any number of functions of the above `pid -> bool` type and creates a single combined `pid -> bool`.

So for example, in my case, I would have written a `make-forkexec-constructor` that accepted a `#:post-fork` that had a `wait-for-condition` on the code that checked if the daemon initialization completed correctly.

I think it is a common enough pattern that:

* We have to spawn a daemon.
* We have to wait for the daemon to be properly initialized (`#:post-fork`)
* When shutting down the daemon, it's best to at least try to politely ask it to finish using whatever means the daemon has (`#:pre-kill`).
* If the daemon doesn't exit soon after we politely asked it to, be less polite and start sending signals.

So I think the above should cover a good number of necessities.

Then, we can define a strict subset of Guile, containing a set of forms we know are total (i.e. we have a proof they will always halt for all inputs).  Then any Shepherd actions, being Lisp code and therefore homoiconic, can be analyzed.  Every list must have a `car` that is a symbol naming a syntax or procedure that is known to be safe --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote` (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk combinators, as well as the domain-specific `make-forkexec-constructor`, `make-kill-destructor`, `wait-for-condition`, `timed-action`, and probably some of the `(guix build utils)` like `invoke`, `invoke/quiet`, `mkdir-p` etc.

Sub-forms (or the entire form for an action) can be wrapped in `(unsafe-turing-complete ...)` to skip this analysis for the sub-form, but otherwise, by default, the specific subset must be used, and users have to explicitly put `(unsafe-turing-complete ...)` so they are aware that they can seriously break their system if they make a mistake here.  Ideally, as much of the code for any Shepherd action should be *outside* an ``unsafe-turing-complete`, and only parts of the code that really need the full Guile language to implement should be rapped in `unsafe-turing-complete`.

(`lambda` and `let-syntax` and `let`, because they can rebind the meanings of symbols, would need to be in `unsafe-turing-complete` --- otherwise the analysis routine would require a full syntax expander as well)

Turing-completeness is a weakness, not a strength, and restricting languages to be the Least Powerful necessary is better.  The `unsafe-turing-complete` form allows an escape, but by default Shepherd code should be in the restricted non-Turing-complete subset, to reduce the scope for catastrophic mistakes.

Thanks
raid5atemyhomework

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
@ 2021-03-19 21:49 ` Maxime Devos
  2021-03-20  2:42   ` raid5atemyhomework
  2021-03-20  4:02 ` Maxim Cournoyer
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Maxime Devos @ 2021-03-19 21:49 UTC (permalink / raw)
  To: raid5atemyhomework, guix-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 20805 bytes --]

On Fri, 2021-03-19 at 17:33 +0000, raid5atemyhomework wrote:
> GNU Shepherd is the `init` system used by GNU Guix.  It features:
> 
> * A rich full Scheme language to describe actions.
> * A simple core that is easy to maintain.
> 
> However, in this critique, I contend that these features are bugs.

A lot of unpack here!  Some agreements, some disagreements,
some alternative proposals, maybe some misunderstandings:

* Section 1: Singly-threading (simple) or multi-threading (robust if done well)

> Now, let us combine this with the second feature (really a bug): GNU shepherd
> is a simple, single-threaded Scheme program.  That means that if the single
> thread enters an infinite loop (because of a Shepherd service description
> that entered an infinite loop), then Shepherd itself hangs.  This means that
> the system is severely broken.  You cannot `sudo reboot` because that communicates
> with the Shepherd daemon.  You cannot even `kill 1` because signals are handled in
> the mainloop, which never get entered if the `start` action of some Shepherd daemon
> managed to enter an infinite loop.

I don't think simplicity is in the ‘design goals’ of shepherd -- it's primarily
used within Guix System (though it can be used outside Guix), which I would not
call simple (-:.

Multi-threading seems complicated (but can be worth it).  What work would you put
on which thread (please give us some concrete to reason about, ‘thread X does A,
and is created on $EVENT ...’)?  A complication is that "fork" is ‘unsafe’ in a
process with multiple threads (I don't know the details).  Maybe that could be
worked around.  (E.g. by using system* on a helper binary that closes unneeded
file descriptors and sets the umask .... and eventually uses exec to run the
service binary.)

* Section 2: Scheme too powerful?

> The Shepherd language for describing actions on Shepherd daemons is a Turing-complete Guile language.

There isn't really a ‘Shepherd language’, it's ‘just’ Scheme code and shepherd is ‘merely’ a library.
But I guess that's your point.

>   Turing completeness runs afoul of the Principle of Least Power.
>   In principle, all that actions have to do is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.

That's a bit of an over-simplification.  At least in a singly-threaded Shepherd,
waitpid could not be used by an individual service, as it
(1) returns not often
enough (if a process of another service exits and this action was waiting on
a single pid) (2) too often (likewise,
and this action was ‘just waiting’
(3) potentially never (imagine every process has been started and an action
was waiting for one to exit,
and now the user tries to reboot.  Then shepherd
daemon would never accept the connection from the user!).

And in a multi-threaded Shepherd, "fork" is unsafe.  (Anyway, "fork" captures
too much state, and no action should ever call "exec" without "fork".)  Also,
"exec" replaces *too little* state (e.g. open file descriptors, umask, root
directory, current working directory ...).

But perhaps that's just a call for a different set of primitives (which could
be implemented on top of these system calls by shepherd, or directly on top
of the relevant Hurd RPCs when running on GNU/Hurd.).

>   Yet the language is a full Turing-complete language, including the major
>   weakness of Turing-completeness: the inability to solve the halting problem.

IIUC, there are plans to perform the network configuration inside shepherd
(search for guile-netlink).  So arbitrary code still needs to be allowed.
This is the Procedural (extensible) I'll referring to later.

>  The fact that the halting problem is unsolved in the language means it is
>  possible to trivially write an infinite loop in the language.  In the context
>  of an `init` system, the possibility of an infinite loop is dangerous, as it
>  means the system may never complete bootup.

I'm assuming the hypothetical infinite loop is in, say, the code for starting
guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
(If there's an infinite loop is in e.g. udev-service-type then there's indeed
a problem, then that infinite loop might as well be in eudev itself, so I'm
only speaking of *non-essential* services blocking the boot process.)

I'm not convinced we need Turing-incompleteness here, at least if the
start / do-a-dance action / stop code of each service definition is run in
a separate thread (not currently the case).

> I have experienced this firsthand since I wrote a Shepherd service to launch
> a daemon, and I needed to wait for the daemon initialization to complete.
> My implementation of this had a bug that caused an infinite loop to be entered,
> but would only tickle this bug when a very particular (persistent on-disk) state
> existed.
>
> I wrote this code about a month or so ago, and never got to test it until last week,
> when the bug was tickled.  Unfortunately, by then I had removed older system versions
> that predated the bug.  When I looked at a backup copy of the `configuration.scm`, I
> discovered the bug soon afterwards.
>
> But the amount of code verbiage needed here had apparently overwhelmed me at the time I
> wrote the code to do the waiting, and the bug got into the code and broke my system.
> I had to completely reinstall Guix (fortunately the backup copy of `configuration.scm`
> was helpful in recovering most of the system, also ZFS rocks).

Ok, shepherd should be more robust (at least for non-essential services like guix-publish).

> Yes, I made a mistake.  I'm only human.  It should be easy to recover from mistakes.
I would need to know what the mistake and what ‘recovering’ would be exactly in this
case, before agreeing or disagreeing whether this is something shepherd should help
with.

> The full Turing-completeness of the language invites mistakes,
Agreed (though ‘invites’ seems a little too strong here).
> and the single-threadedness makes the infinite-loop mistakes that Turing-completeness
> invites, into catastrophic system breakages.

Seems like an argument towards making shepherd multi-threaded (or using guile-fibers).

* Section 3: A nicely behaving API.

> So what can we do?
> 
> For one, a Turing-complete language is a strict *superset* of non-Turing-complete
> languages.  So one thing we can do is to define a more dedicated language for Shepherd
> actions, strongly encourage the use of that sub-language, and, at some point, require
> that truly Turing-complete actions need to be wrapped in a `(unsafe-turing-complete ...)` form.
> 
> For example, in addition to the current existing `make-forkexec-constructor` and
>  `make-kill-destructor`, let us also add these syntax forms:

> `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a numeric `pid`, that does: Check if evaluating `<form>` in the lexical context results in `#f`.  If so, wait one second and re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds has passed, or if the `pid` has died.  If `<form>` evaluates to non-`#f` then return it immediately.
> 
> `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed process group?).  The new thread evaluates `<form> ...` in sequence.  If the thread completes or throws in `<timeout>` seconds, return the result or throw the exception in the main thread.  IF the `<timeout>` is reached or the
> given `pid` has died, kill the thread and any processes it may have spawned, then return `#f`.
> 
> `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that does: Check if the `pid` has died, if so, return #t.  Else sleep 1 second and try again, and if `<timeout>` is reached, return `#f`.

Hard to say if these syntaxes are useful in advance.

In any case, I don't like numeric pids, as they are reused.
Could we use something like pidfds (Linux) or whatever the Hurd has instead?
(Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
a predicate task-dead?, a wait-for-task-dead operation (if we use guile-fibers).)

Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 5)))
   == wait until task is dead OR 5 seconds have passed.  Operations in guile-fibers
   combine nicely!

Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
what timed-action is for).  If some process is taking long to start,
then by killing it it will take forever to start.

> The above forms should also report as well what they are doing (e.g. `Have been
> waiting 10 of 30 seconds for <form>`) on the console and/or syslog.

Something like that would be useful, yes.  But printing <form> itself seems a bit
low-level to me.

> In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which is
>  a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> `#:pre-kill`, which is a procedure that accepts a `pid`.  Possibly we need to
>  add combinators for the above actions as well.  For example a `sub-actions`
> procedural form that accepts any number of functions of the above `pid -> bool`
> type and creates a single combined `pid -> bool`.

> So for example, in my case, I would have written a `make-forkexec-constructor`
> that accepted a `#:post-fork` that had a `wait-for-condition` on the code that
>  hecked if the daemon initialization completed correctly.
> 
> I think it is a common enough pattern that:
> 
> * We have to spawn a daemon.
> * We have to wait for the daemon to be properly initialized (`#:post-fork`)
> * When shutting down the daemon, it's best to at least try to politely ask it
>   to finish using whatever means the daemon has (`#:pre-kill`).
> * If the daemon doesn't exit soon after we politely asked it to, be less polite and start sending signals.
> 
> So I think the above should cover a good number of necessities.

Agreed.  My own proposal, which I like better (-: :
(this is the ‘procedural’ (extensible) interface)

+ Multi-threading using guile-fibers (we'll use condition variables & channels).
+ Instead of make-forkexec-constructor, we have
  make-process-constructor, as fork + exec as-is is troublesome in
  a multi-threaded application (see POSIX).

  (It still accepts #:user, #:group & the like)

  make-process-constructor returns a thunk as usual,
  but this thunk THUNK works differently:

  THUNK : () -> <process-task>.

  <process-task> is a GOOPS class representing processes (parent: <shepherd-task>),
  that do not neccessarily exist yet / anymore.  <process-task>
  has some fields:

    * stopped?-cond
      (condition variable, only signalled after exit-value is set)
    * exit-value
      (an atomic box, value #f or an integer,
       only read after stopped?-cond is signalled
       or whenever the user is feeling impatient and asking shepherd
       how it is going.)
    * stop!-cond (condition variable)
    * stop-forcefully!-cond (condition variable)
    * started-cond (condition variable, only signalled when ‘pid’ has been set)
    * pid (an atomic box)
    * maybe some other condition variables?
    * other atomic boxes?

  THUNK also starts a fiber associated with the returned <process-task>,
  which tries to start the service binary in the background (could also be used
  for other things than the service binary).  When the binary has been started,
  the fiber signals started-cond.

  The fiber will wait for the process to exit, and sets exit-value and
  signals stopped?-cond when it does.  It will also wait for a polite request
  to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
  (stop-forcefully!-cond).

  (Does shepherd even have an equivalent to stop-forcefully! ?)

  kill-destructor is adjusted to not work with pids directly, but rather
  with <process-task>.  It signals stop!-cond or stop-forcefully!-cond.

  (So process creation & destruction happens asynchronuously, but there
  still is an easy way to wait until they are actually started.)

+ #:pre-kill & #:post-fork seem rather ad-hoc to me.  I advise
  still wrapping make-process-constructor & kill-destructor.
  However, the thunk still needs to return a <shepherd-task> (or subclass
  of course).

  If the thunk does anything that could block / loop (e.g. if it's a HTTP
  server, then the wrapper might want to wait until the process is actually
  listening at port 80), then this blocking / looping must be done in a fiber.

  Likewise if the killer does anything that could block / loop (e.g.
  first asking the process to exit nicely before killing it after a time-out).

+ It is probably simplest if shepherd's logic for deciding which service to
  start / kill / which dependencies ... ran in a single fiber (though the
  actual starting & killing happens asynchronuously as described).

  To known when a service has been started / stopped, the following GOOPS
  generics must be defined on every <shepherd-task> (not the same as <service>!):

  Method (wait-until-stopped-operation (task <shepherd-task>))
    Make a guile-fibers operation for waiting until @var{task} has been stopped.
    This operation returns no values.

  Method (wait-until-task-started-operation (task <shepherd-task>))
    Make a guile-fibers operation for waiting until @var{task} has been started.
    This operation returns no value.

  Method (task-stopped? (task <shepherd-task>))
    Test whether @var{task} has been stopped, returning a generalized boolean.
    It is possible for a task to be stopped even if it has never been started,
    e.g. if the service binary could not be found.

  Method (task-started? (task <shepherd-task>))
    Test whether @var{task} has been started, returning a generalized boolean.
    Callers probably should also check whether the task has been stopped.

  Method (stop-task-asynchronuously! (task <shepherd-task>))
    Request that the task @var{task} be stopped, returning no values.
    This must succeed even if no fiber is handling @var{task} anymore
    (e.g. by signalling a condition variable).  This procedure is
    idempotent: stopping a task twice is the same as stopping a task
    once.

    There is no task start-task-asynchronuously! method, as dead
    tasks cannot be resurrected.

  This ‘service logic fiber’ starts the ‘let's create a process’
  and ‘wait until the process is *really* started’ fibers by calling
  the thunks returned by make-process-constructor & variants.

+ About

> The above forms should also report as well what they are doing (e.g. `Have been
> waiting 10 of 30 seconds for <form>`) on the console and/or syslog.

  The thunks could write to (current-task-info-port) (an output port to be defined
  by Shepherd).  task-info-port is a ‘parameter’ (see guile documentation)
  parameterized by Shepherd when running the thunk.  Or something, I'm not really
  sure about this part of my proposal)

* Section 4: Static analysis

> Then, we can define a strict subset of Guile, containing a set of forms we know are
> total (i.e. we have a proof they will always halt for all inputs).  Then any Shepherd
> actions, being Lisp code and therefore homoiconic, can be analyzed.  Every list must
> have a `car` that is a symbol naming a syntax or procedure that is known to be safe
> --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote`
> (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk combinators,
> as well as the domain-specific `make-forkexec-constructor`, `make-kill-destructor`,
> `wait-for-condition`, `timed-action`, and probably some of the `(guix build utils)`
> like `invoke`, `invoke/quiet`, [...]

`invoke' has no place in your subset, as the invoked program can take arbitrarily long.
(Shouldn't be a problem when multi-threading, but then there's less need for this kind
of static analysis.)

is not ‘safe’ as the invoked program can 

> `mkdir-p` etc.

I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some procedure
(prepare-directory
  `("/var/stuff #:bits #o700 #:user ... #:group
     ("blabla" #:bits ...
               #:atomic-generate-contents ,(lambda (port) (display 'stuff port))))
     ...))

... automatically taking care of symlinks & detecting whether there are conflicts
with different services.

> Sub-forms (or the entire form for an action) can be wrapped in `(unsafe-turing-complete ...)`
> to skip this analysis for the sub-form,
An escape hatch seems good to me, though I would rather call it
`(without-static-analysis (totality) ...)'.  The forms are not neccesarily ‘unsafe’,
only potentially so.  We could also define a checkers for other issues this way.

(I'm thinking of potential TOCTTOU (time of check to time of use) problems involving
symbolic links.)

> but otherwise, by default, the specific subset must be used, and users have to
> explicitly put `(unsafe-turing-complete ...)` so they are aware that they can
> seriously break their system if they make a mistake here.  Ideally, as much of
> the code for any Shepherd action should be *outside* an ``unsafe-turing-complete`,
> and only parts of the code that really need the full Guile language to implement
> should be rapped in `unsafe-turing-complete`.

I'm getting Rusty vibes here.  Sounds sound to me.

> (`lambda` and `let-syntax` and `let`, because they can rebind the meanings of symbols,
>  would need to be in `unsafe-turing-complete` --- otherwise the analysis routine would
>  require a full syntax expander as well)

No.  The analysis routine should not directly work on Scheme code, but rather on
the 'tree-il' (or maybe on the CPS code, I dunno).
Try (macro-expand '(let ((a 0)) a)) from a Guile REPL.

> Turing-completeness is a weakness, not a strength, and restricting languages to be
> the Least Powerful necessary is better.  The `unsafe-turing-complete` form allows
> an escape, but by default Shepherd code should be in the restricted non-Turing-complete
> subset, to reduce the scope for catastrophic mistakes.

I'm assuming make-fork+exec-constructor/container would be defined in Scheme
and be whitelisted in raid5atemyhomework-scheme?

* Section 5 -- Declarative

Something you don't seem to have considered: defining services in a declarative
language!  Hypothetical example à la SystemD, written in something vaguely
resembling Wisp  (Wisp is an alternative read syntax for Scheme with less
parentheses): (I'm a bit rusty with the details on defining shepherd services
in Guix:

shepherd-service
  start `
    I-need-to-think-of-a-name-constructor
      #:binary #$(file-append package "/bin/stuff")
      #:arguments stuff
      #:umask suff
      #:user stuff
      #:group stuff
      ,@(if linux `(#:allowed-syscalls ',(x y z ...)))
      #:friendly-shutdown
        thunk
          invoke #$(file-append package "/bin/stop-stuff-please")
      #:polling-test-started?
        thunk
          if . file-exists? "/run/i-am-really-started"
            #true

            ;; tell I-need-to-think-of.... to wait 4 secs
            ;; before trying again
            seconds 4
      ;; how much time starting may take
      ;; until we consider things failed
      ;; & stop the process
      #:startup-timeout?
        thunk
          if spinning-disk?
             plenty
             little
      #:actions etcetera
      ...

Well, that's technically still procedural, but this starts to look
like a SystemD configuration file due to the many keyword arguments!

(Here (thunk X ...) == (lambda () X ...), and
    I-need-to-think-of-a-name-constructor
is a procedure with very many options, that
‘should be enough for everyone.  If it doesn't support
enough arguments,
look in SystemD for inspiration.)

That seems easier to analyse.  Although it's bit kitchen-sinky,
no kitchen is complete without a sink, so maybe that's ok ...
(I assume the kitchen sink I-need-to-think-of-a-name-constructor
would be defined in the guix source code.)

* Epilogue

Disclaimer: I'm working on other things than shepherd currently,
though I have an unsubmitted ‘fd-mappings’ patch that needs some
more tests & needs to be checked on GNU/Hurd.  This issue is most
likely to move forward is someone actually codes something,
though any discussion is still informative.

> Thanks
> raid5atemyhomework

Greetings,
Maxime

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 21:49 ` Maxime Devos
@ 2021-03-20  2:42   ` raid5atemyhomework
  0 siblings, 0 replies; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-20  2:42 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guix-devel@gnu.org

Hello Maxime,

> Multi-threading seems complicated (but can be worth it). What work would you put
> on which thread (please give us some concrete to reason about, ‘thread X does A,
> and is created on $EVENT ...’)? A complication is that "fork" is ‘unsafe’ in a
> process with multiple threads (I don't know the details). Maybe that could be
> worked around. (E.g. by using system* on a helper binary that closes unneeded
> file descriptors and sets the umask .... and eventually uses exec to run the
> service binary.)

Perhaps the point is not so much to be *multi-threaded* but to be *concurrent*, and whether to be multi-*thread* or multi-*process* or multi-*fiber* is a lower-level detail.

For example, we can consider that for 99% of shepherd services, the `start` action returns a numeric PID, and the remaining ones return `#t`.  Rather than call the `start` action in the same thread/process/fiber that implements the Shepherd main loop:

* Create a pipe.
* `fork`.
  * On the child:
    * Close the pipe output end, make the input end cloexec.
    * Call the `start` action.
    * If an error is thrown, print out `(error <>)` with the serialization of the error to the pipe.
    * Otherwise, print out `(return <>)` with the return value of the `start` action.
  * On the parent:
    * Close the pipe input end, and add the output end and the pid of the forked child to one of the things we will poll in the mainloop.
    * Return to the mainloop.
* In the mainloop:
  * `poll` (or `select`, or whatever) the action pipes in addition to the `herd` command socket(s).
  * If an action pipe is ready, read it in:
    * If it EOFed then the sub-process died or execed without returning anything, treat as an error.
    * Similarly if unable to `read` from it.
    * Otherwise deserialize it and react appropriately as to whether it is an error or a return of some value.
  * If an action pipe has been around too long, kill the pid (and all child pids) and close the pipe and treat as an error.

I mean --- single-threaded/process/fibered, non-concurrent `init` systems have fallen by the wayside because of their brittleness.  SystemD is getting wide support precisely because it's an excellent concurrent `init` mechanism that is fairly tolerant of mistakes; a problem in one service unit is generally isolated in that service unit and its dependents and will not cause problems with the other service units.

Presumably, `fork`ing within a `fork`ed process has no issues since that is what is normally done in Unix anyway, it's threads that are weird.


> That's a bit of an over-simplification. At least in a singly-threaded Shepherd,
> waitpid could not be used by an individual service, as it
> (1) returns not often
> enough (if a process of another service exits and this action was waiting on
> a single pid) (2) too often (likewise,
> and this action was ‘just waiting’
> (3) potentially never (imagine every process has been started and an action
> was waiting for one to exit,
> and now the user tries to reboot. Then shepherd
> daemon would never accept the connection from the user!).

Yes, I agree it is an oversimplification, but in any case, there should be some smaller subset of things that can be (typically) done.

Then any escape hatch like `unsafe-turing-complete` or `without-static-analysis` can retain the full power of the language for those times when you *really* do need it.

>
> And in a multi-threaded Shepherd, "fork" is unsafe. (Anyway, "fork" captures
> too much state, and no action should ever call "exec" without "fork".) Also,
> "exec" replaces too little state (e.g. open file descriptors, umask, root
> directory, current working directory ...).
>
> But perhaps that's just a call for a different set of primitives (which could
> be implemented on top of these system calls by shepherd, or directly on top
> of the relevant Hurd RPCs when running on GNU/Hurd.).

Yes.

> > Yet the language is a full Turing-complete language, including the major
> > weakness of Turing-completeness: the inability to solve the halting problem.
>
> IIUC, there are plans to perform the network configuration inside shepherd
> (search for guile-netlink). So arbitrary code still needs to be allowed.
> This is the Procedural (extensible) I'll referring to later.
>
> > The fact that the halting problem is unsolved in the language means it is
> > possible to trivially write an infinite loop in the language. In the context
> > of an `init` system, the possibility of an infinite loop is dangerous, as it
> > means the system may never complete bootup.
>
> I'm assuming the hypothetical infinite loop is in, say, the code for starting
> guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
> (If there's an infinite loop is in e.g. udev-service-type then there's indeed
> a problem, then that infinite loop might as well be in eudev itself, so I'm
> only speaking of non-essential services blocking the boot process.)
>
> I'm not convinced we need Turing-incompleteness here, at least if the
> start / do-a-dance action / stop code of each service definition is run in
> a separate thread (not currently the case).
>
> > I have experienced this firsthand since I wrote a Shepherd service to launch
> > a daemon, and I needed to wait for the daemon initialization to complete.
> > My implementation of this had a bug that caused an infinite loop to be entered,
> > but would only tickle this bug when a very particular (persistent on-disk) state
> > existed.
> > I wrote this code about a month or so ago, and never got to test it until last week,
> > when the bug was tickled. Unfortunately, by then I had removed older system versions
> > that predated the bug. When I looked at a backup copy of the `configuration.scm`, I
> > discovered the bug soon afterwards.
> > But the amount of code verbiage needed here had apparently overwhelmed me at the time I
> > wrote the code to do the waiting, and the bug got into the code and broke my system.
> > I had to completely reinstall Guix (fortunately the backup copy of `configuration.scm`
> > was helpful in recovering most of the system, also ZFS rocks).
>
> Ok, shepherd should be more robust (at least for non-essential services like guix-publish).
>
> > Yes, I made a mistake. I'm only human. It should be easy to recover from mistakes.
>
> I would need to know what the mistake and what ‘recovering’ would be exactly in this
> case, before agreeing or disagreeing whether this is something shepherd should help
> with.

It's what I described in the previous paragraph.  I wrote code to wait for startup, the code to wait for startup had a bug that caused it to enter an infinite loop under a certain edge case.

The edge case was not reached for a long time (almost a month), so I only found it a week ago when it triggered and broke my system.  In order to recover I had to reinstall Guix System completely, and none of the saved system generations had a configuration that predated the introduction of the bug.

If you want specifics --- as a sketch, what I did was something similar to this:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (- count 1)))) ;;; <--- (fuck)

The daemon died (this was the persistent on-disk edge case, arguably a bug in `daemon-started?` as well) causing `(daemon-started?)` to return `#f` permanently, and then I started counting *in the wrong direction*.


> > The full Turing-completeness of the language invites mistakes,
>
> Agreed (though ‘invites’ seems a little too strong here).

Come on.  Something as simple as "Wait for some condition to be true" needs code like the above to implement fully in the Shepherd action-description language (i.e. Scheme).  And because I was debating on "just put `(count 30)` in the `let` and decrement" versus "Naaa just count up like a normal person" and got my wires crossed, I got into the above completely dumb mistake.

Indeed, not only should `-` be changed to `+` in the above, but it's also better to do something like this as well:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((not (zero? (car (waitpid daemon-pid WNOHANG))))
         (format #t "daemon died!~%")
         #f)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f))
        (else
         (sleep 1)
         (wait-for-startup (+ count 1))))

Because if the daemon died, waiting for 30 seconds would be fairly dumb as well.

That is a fair bit of verbiage, and much harder to read through for bugs.

As per information theory: the more possible things I can talk about, the more bits I have to push through a communication channel to convey it.  Turing-complete languages can do more things than more restricted  domain-specific languages, so on average it requires more code to implement an arbitrary thing in the Turing-complete language (thus requiring more careful review) than the restricted domain-specific language.

> -   Section 3: A nicely behaving API.
>
> > So what can we do?
> > For one, a Turing-complete language is a strict superset of non-Turing-complete
> > languages. So one thing we can do is to define a more dedicated language for Shepherd
> > actions, strongly encourage the use of that sub-language, and, at some point, require
> > that truly Turing-complete actions need to be wrapped in a `(unsafe-turing-complete ...)` form.
> > For example, in addition to the current existing `make-forkexec-constructor` and
> > `make-kill-destructor`, let us also add these syntax forms:
>
> > `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a numeric `pid`, that does: Check if evaluating `<form>` in the lexical context results in `#f`. If so, wait one second and re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds has passed, or if the `pid` has died. If `<form>` evaluates to non-`#f` then return it immediately.
> > `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed process group?). The new thread evaluates `<form> ...` in sequence. If the thread completes or throws in `<timeout>` seconds, return the result or throw the exception in the main thread. IF the `<timeout>` is reached or the
> > given `pid` has died, kill the thread and any processes it may have spawned, then return `#f`.
> > `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that does: Check if the `pid` has died, if so, return #t. Else sleep 1 second and try again, and if `<timeout>` is reached, return `#f`.
>
> Hard to say if these syntaxes are useful in advance.

In my particular case I could have just written `(wait-for-condition 30 (daemon-started?))`.  I imagine a fair number of wait-for-daemon-startup things can use that or `(timed-action 30 (invoke #$waiter-program))`.

>
> In any case, I don't like numeric pids, as they are reused.
> Could we use something like pidfds (Linux) or whatever the Hurd has instead?
> (Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
> a predicate task-dead?, a wait-for-task-dead operation (if we use guile-fibers).)

Certainly, but that may require breaking changes to existing specific shepherd actions.


> Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 5)))
> == wait until task is dead OR 5 seconds have passed. Operations in guile-fibers
> combine nicely!
>
> Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
> what timed-action is for). If some process is taking long to start,
> then by killing it it will take forever to start.

Sprinkling timeouts makes the halting problem trivial: for any code that is wrapped in a timeout, that code will halt (either by itself, or by hitting the timeout).

Indeed, you can make a practical total functional language simply by requiring that recursions have a decrementing counter that will abort computation when it reaches 0.  It could use mostly partial code patterns with the addition of that decrementing counter, and it becomes total.  In many ways the programmer is just assuring the compiler "this operation will not take more than N recursions".


>
> > The above forms should also report as well what they are doing (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console and/or syslog.
>
> Something like that would be useful, yes. But printing <form> itself seems a bit
> low-level to me.

Yes, but if it's something that the end-user wrote in their `configuration.scm`, it seems fine to me.

>
> > In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which is
> > a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> > `#:pre-kill`, which is a procedure that accepts a `pid`. Possibly we need to
> > add combinators for the above actions as well. For example a `sub-actions`
> > procedural form that accepts any number of functions of the above `pid -> bool`
> > type and creates a single combined `pid -> bool`.
>
> > So for example, in my case, I would have written a `make-forkexec-constructor`
> > that accepted a `#:post-fork` that had a `wait-for-condition` on the code that
> > hecked if the daemon initialization completed correctly.
> > I think it is a common enough pattern that:
> >
> > -   We have to spawn a daemon.
> > -   We have to wait for the daemon to be properly initialized (`#:post-fork`)
> > -   When shutting down the daemon, it's best to at least try to politely ask it
> >     to finish using whatever means the daemon has (`#:pre-kill`).
> >
> > -   If the daemon doesn't exit soon after we politely asked it to, be less polite and start sending signals.
> >
> > So I think the above should cover a good number of necessities.
>
> Agreed. My own proposal, which I like better (-: :
> (this is the ‘procedural’ (extensible) interface)
>
> -   Multi-threading using guile-fibers (we'll use condition variables & channels).
> -   Instead of make-forkexec-constructor, we have
>     make-process-constructor, as fork + exec as-is is troublesome in
>     a multi-threaded application (see POSIX).
>
>     (It still accepts #:user, #:group & the like)
>
>     make-process-constructor returns a thunk as usual,
>     but this thunk THUNK works differently:
>
>     THUNK : () -> <process-task>.
>
>
> <process-task> is a GOOPS class representing processes (parent: <shepherd-task>),
> that do not neccessarily exist yet / anymore. <process-task>
> has some fields:
>
> * stopped?-cond
> (condition variable, only signalled after exit-value is set)
> * exit-value
> (an atomic box, value #f or an integer,
> only read after stopped?-cond is signalled
> or whenever the user is feeling impatient and asking shepherd
> how it is going.)
> * stop!-cond (condition variable)
> * stop-forcefully!-cond (condition variable)
> * started-cond (condition variable, only signalled when ‘pid’ has been set)
> * pid (an atomic box)
> * maybe some other condition variables?
> * other atomic boxes?
>
> THUNK also starts a fiber associated with the returned <process-task>,
> which tries to start the service binary in the background (could also be used
> for other things than the service binary). When the binary has been started,
> the fiber signals started-cond.
>
> The fiber will wait for the process to exit, and sets exit-value and
> signals stopped?-cond when it does. It will also wait for a polite request
> to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
> (stop-forcefully!-cond).
>
> (Does shepherd even have an equivalent to stop-forcefully! ?)
>
> kill-destructor is adjusted to not work with pids directly, but rather
> with <process-task>. It signals stop!-cond or stop-forcefully!-cond.
>
> (So process creation & destruction happens asynchronuously, but there
> still is an easy way to wait until they are actually started.)
>
> -   #:pre-kill & #:post-fork seem rather ad-hoc to me. I advise
>     still wrapping make-process-constructor & kill-destructor.
>     However, the thunk still needs to return a <shepherd-task> (or subclass
>     of course).
>
>     If the thunk does anything that could block / loop (e.g. if it's a HTTP
>     server, then the wrapper might want to wait until the process is actually
>     listening at port 80), then this blocking / looping must be done in a fiber.
>
>     Likewise if the killer does anything that could block / loop (e.g.
>     first asking the process to exit nicely before killing it after a time-out).

Sure, this design could work better as well.  But *some* restricted dialect should exist, and we should discourage people from using the full Guile language in Shepherd actions they write bespoke in their `configuration.scm`, not tout it as a feature.

> -   Section 4: Static analysis
>
> > Then, we can define a strict subset of Guile, containing a set of forms we know are
> > total (i.e. we have a proof they will always halt for all inputs). Then any Shepherd
> > actions, being Lisp code and therefore homoiconic, can be analyzed. Every list must
> > have a `car` that is a symbol naming a syntax or procedure that is known to be safe
> > --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote`
> > (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk combinators,
> > as well as the domain-specific `make-forkexec-constructor`, `make-kill-destructor`,
> > `wait-for-condition`, `timed-action`, and probably some of the `(guix build utils)`
> > like `invoke`, `invoke/quiet`, [...]
>
> `invoke' has no place in your subset, as the invoked program can take arbitrarily long.
> (Shouldn't be a problem when multi-threading, but then there's less need for this kind
> of static analysis.)

Well, the thing is --- I can probably write a shell script and test it **outside of Shepherd**.  In fact, in my mistake above, on a previous non-Guix system, I ***did*** use a shell script to wait for daemon startup.  But when I ported it over to Guix, I decided to rewrite it as Scheme code to "really dive into the Guix system".

Before, when writing it as a shell script I was able to test it and indeed found problems and fixed them.  But as Scheme code, well, it's a bit harder, since I have to go import a bunch of things (that I have to go search for, and I have to figure out Guile path settings for the bits that are provided by Guix, etc.), and then afterwards I have to copy back the tested code into my `configuration.scm`.  This was enough of an annoyance to me that it discouraged me from testing it, which could have caught the above bug.

Since `invoke` calls an external program, which runs isolated in its own process (and can be `kill`ed in order to unstuck Shepherd), and can more easily be independently tested outside the `configuration.scm`, it's substantially safer than writing the same logic directly in the Shepherd language, so it gets a pass.

Many other daemons that have some kind of "wait-for-daemon-startup" program will often include tests for the "wait" program as well in its test suite, so using `invoke` is substantially safer than writing bespoke code in Scheme that performs the same action.

> > `mkdir-p` etc.
>
> I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some procedure
> (prepare-directory
> `("/var/stuff #:bits #o700 #:user ... #:group
> ("blabla" #:bits ...
> #:atomic-generate-contents ,(lambda (port) (display 'stuff port))))
> ...))
>
> ... automatically taking care of symlinks & detecting whether there are conflicts
> with different services.
>
> > Sub-forms (or the entire form for an action) can be wrapped in `(unsafe-turing-complete ...)`
> > to skip this analysis for the sub-form,
>
> An escape hatch seems good to me, though I would rather call it
> `(without-static-analysis (totality) ...)'. The forms are not neccesarily ‘unsafe’,
> only potentially so. We could also define a checkers for other issues this way.
>
> (I'm thinking of potential TOCTTOU (time of check to time of use) problems involving
> symbolic links.)
>
> > but otherwise, by default, the specific subset must be used, and users have to
> > explicitly put `(unsafe-turing-complete ...)` so they are aware that they can
> > seriously break their system if they make a mistake here. Ideally, as much of
> > the code for any Shepherd action should be outside an ``unsafe-turing-complete`, and only parts of the code that really need the full Guile language to implement should be rapped in`unsafe-turing-complete`.
>
> I'm getting Rusty vibes here. Sounds sound to me.
>
> > (`lambda` and `let-syntax` and `let`, because they can rebind the meanings of symbols,
> > would need to be in `unsafe-turing-complete` --- otherwise the analysis routine would
> > require a full syntax expander as well)
>
> No. The analysis routine should not directly work on Scheme code, but rather on
> the 'tree-il' (or maybe on the CPS code, I dunno).
> Try (macro-expand '(let ((a 0)) a)) from a Guile REPL.
>
> > Turing-completeness is a weakness, not a strength, and restricting languages to be
> > the Least Powerful necessary is better. The `unsafe-turing-complete` form allows
> > an escape, but by default Shepherd code should be in the restricted non-Turing-complete
> > subset, to reduce the scope for catastrophic mistakes.
>
> I'm assuming make-fork+exec-constructor/container would be defined in Scheme
> and be whitelisted in raid5atemyhomework-scheme?

Yes.

>
> -   Section 5 -- Declarative
>
>     Something you don't seem to have considered: defining services in a declarative
>     language! Hypothetical example à la SystemD, written in something vaguely
>     resembling Wisp (Wisp is an alternative read syntax for Scheme with less
>     parentheses):

I prefer SRFI 110 myself.

> (I'm a bit rusty with the details on defining shepherd services
>     in Guix:
>
>     shepherd-service
>     start `I-need-to-think-of-a-name-constructor #:binary #$(file-append package "/bin/stuff") #:arguments stuff #:umask suff #:user stuff #:group stuff ,@(if linux`(#:allowed-syscalls ',(x y z ...)))
>     #:friendly-shutdown
>     thunk
>     invoke #$(file-append package "/bin/stop-stuff-please")
>     #:polling-test-started?
>     thunk
>     if . file-exists? "/run/i-am-really-started"
>     #true
>
>     ;; tell I-need-to-think-of.... to wait 4 secs
>     ;; before trying again
>     seconds 4
>     ;; how much time starting may take
>     ;; until we consider things failed
>     ;; & stop the process
>     #:startup-timeout?
>     thunk
>     if spinning-disk?
>     plenty
>     little
>     #:actions etcetera
>     ...
>
>     Well, that's technically still procedural, but this starts to look
>     like a SystemD configuration file due to the many keyword arguments!
>
>     (Here (thunk X ...) == (lambda () X ...), and
>     I-need-to-think-of-a-name-constructor
>     is a procedure with very many options, that
>     ‘should be enough for everyone. If it doesn't support
>     enough arguments,
>     look in SystemD for inspiration.)
>
>     That seems easier to analyse. Although it's bit kitchen-sinky,
>     no kitchen is complete without a sink, so maybe that's ok ...
>     (I assume the kitchen sink I-need-to-think-of-a-name-constructor
>     would be defined in the guix source code.)


Well, yes ---- but then we can start arguing that using SystemD would be better, as it is more battle-tested and already exists and all the kitchen-sink features are already there and a lot more people can hack its language than can hack the specific dialect of scheme (augmented with Guixy things like `make-forkexec-constructor`) used by GNU Shepherd.

Thanks
raid5atemyhomework


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
  2021-03-19 21:49 ` Maxime Devos
@ 2021-03-20  4:02 ` Maxim Cournoyer
  2021-03-20  5:07 ` Mark H Weaver
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Maxim Cournoyer @ 2021-03-20  4:02 UTC (permalink / raw)
  To: raid5atemyhomework; +Cc: guix-devel@gnu.org

Hi,

raid5atemyhomework <raid5atemyhomework@protonmail.com> writes:

> GNU Shepherd is the `init` system used by GNU Guix.  It features:
>
> * A rich full Scheme language to describe actions.
> * A simple core that is easy to maintain.
>
> However, in this critique, I contend that these features are bugs.
>
> The Shepherd language for describing actions on Shepherd daemons is a
> Turing-complete Guile language.  Turing completeness runs afoul of the
> Principle of Least Power.  In principle, all that actions have to do
> is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.  Yet the
> language is a full Turing-complete language, including the major
> weakness of Turing-completeness: the inability to solve the halting
> problem.
>
> The fact that the halting problem is unsolved in the language means it
> is possible to trivially write an infinite loop in the language.  In
> the context of an `init` system, the possibility of an infinite loop
> is dangerous, as it means the system may never complete bootup.
>
> Now, let us combine this with the second feature (really a bug): GNU
> shepherd is a simple, single-threaded Scheme program.  That means that
> if the single thread enters an infinite loop (because of a Shepherd
> service description that entered an infinite loop), then Shepherd
> itself hangs.  This means that the system is severely broken.  You
> cannot `sudo reboot` because that communicates with the Shepherd
> daemon.  You cannot even `kill 1` because signals are handled in the
> mainloop, which never get entered if the `start` action of some
> Shepherd daemon managed to enter an infinite loop.
>
> I have experienced this firsthand since I wrote a Shepherd service to
> launch a daemon, and I needed to wait for the daemon initialization to
> complete.  My implementation of this had a bug that caused an infinite
> loop to be entered, but would only tickle this bug when a very
> particular (persistent on-disk) state existed.  I wrote this code
> about a month or so ago, and never got to test it until last week,
> when the bug was tickled.  Unfortunately, by then I had removed older
> system versions that predated the bug.  When I looked at a backup copy
> of the `configuration.scm`, I discovered the bug soon afterwards.  But
> the amount of code verbiage needed here had apparently overwhelmed me
> at the time I wrote the code to do the waiting, and the bug got into
> the code and broke my system.  I had to completely reinstall Guix
> (fortunately the backup copy of `configuration.scm` was helpful in
> recovering most of the system, also ZFS rocks).
>
> Yes, I made a mistake.  I'm only human.  It should be easy to recover
> from mistakes.  The full Turing-completeness of the language invites
> mistakes, and the single-threadedness makes the infinite-loop mistakes
> that Turing-completeness invites, into catastrophic system breakages.
>
> So what can we do?
>
> For one, a Turing-complete language is a strict *superset* of
> non-Turing-complete languages.  So one thing we can do is to define a
> more dedicated language for Shepherd actions, strongly encourage the
> use of that sub-language, and, at some point, require that truly
> Turing-complete actions need to be wrapped in a
> `(unsafe-turing-complete ...)` form.
>
> For example, in addition to the current existing
> `make-forkexec-constructor` and `make-kill-destructor`, let us also
> add these syntax forms:
>
> `(wait-for-condition <timeout> <form>)` - Return a procedure that
> accepts a numeric `pid`, that does: Check if evaluating `<form>` in
> the lexical context results in `#f`.  If so, wait one second and
> re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds
> has passed, or if the `pid` has died.  If `<form>` evaluates to
> non-`#f` then return it immediately.
>
> `(timed-action <timeout> <form> ...)` - Return a procedure that
> accepts a numeric `pid`, that does: Spawn a new thread (or maybe
> better a `fork`ed process group?).  The new thread evaluates `<form>
> ...` in sequence.  If the thread completes or throws in `<timeout>`
> seconds, return the result or throw the exception in the main thread.
> IF the `<timeout>` is reached or the given `pid` has died, kill the
> thread and any processes it may have spawned, then return `#f`.
>
> `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid`
> that does: Check if the `pid` has died, if so, return #t.  Else sleep
> 1 second and try again, and if `<timeout>` is reached, return `#f`.
>
> The above forms should also report as well what they are doing
> (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console
> and/or syslog.
>
> In addition, `make-forkexec-constructor` should accept a
> `#:post-fork`, which is a procedure that accepts a `pid`, and
> `make-kill-destructor` should accept a `#:pre-kill`, which is a
> procedure that accepts a `pid`.  Possibly we need to add combinators
> for the above actions as well.  For example a `sub-actions` procedural
> form that accepts any number of functions of the above `pid -> bool`
> type and creates a single combined `pid -> bool`.
>
> So for example, in my case, I would have written a
> `make-forkexec-constructor` that accepted a `#:post-fork` that had a
> `wait-for-condition` on the code that checked if the daemon
> initialization completed correctly.
>
> I think it is a common enough pattern that:
>
> * We have to spawn a daemon.
> * We have to wait for the daemon to be properly initialized (`#:post-fork`)
> * When shutting down the daemon, it's best to at least try to politely
> ask it to finish using whatever means the daemon has (`#:pre-kill`).
> * If the daemon doesn't exit soon after we politely asked it to, be
> less polite and start sending signals.
>
> So I think the above should cover a good number of necessities.
>
>
> Then, we can define a strict subset of Guile, containing a set of
> forms we know are total (i.e. we have a proof they will always halt
> for all inputs).  Then any Shepherd actions, being Lisp code and
> therefore homoiconic, can be analyzed.  Every list must have a `car`
> that is a symbol naming a syntax or procedure that is known to be safe
> --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`,
> `quote` (which also prevents analysis of sub-lists), `and`, `or`,
> `begin`, thunk combinators, as well as the domain-specific
> `make-forkexec-constructor`, `make-kill-destructor`,
> `wait-for-condition`, `timed-action`, and probably some of the `(guix
> build utils)` like `invoke`, `invoke/quiet`, `mkdir-p` etc.
>
> Sub-forms (or the entire form for an action) can be wrapped in
> `(unsafe-turing-complete ...)` to skip this analysis for the sub-form,
> but otherwise, by default, the specific subset must be used, and users
> have to explicitly put `(unsafe-turing-complete ...)` so they are
> aware that they can seriously break their system if they make a
> mistake here.  Ideally, as much of the code for any Shepherd action
> should be *outside* an ``unsafe-turing-complete`, and only parts of
> the code that really need the full Guile language to implement should
> be rapped in `unsafe-turing-complete`.
>
> (`lambda` and `let-syntax` and `let`, because they can rebind the
> meanings of symbols, would need to be in `unsafe-turing-complete` ---
> otherwise the analysis routine would require a full syntax expander as
> well)
>
>
> Turing-completeness is a weakness, not a strength, and restricting
> languages to be the Least Powerful necessary is better.  The
> `unsafe-turing-complete` form allows an escape, but by default
> Shepherd code should be in the restricted non-Turing-complete subset,
> to reduce the scope for catastrophic mistakes.

Thanks for the well written critique.  Turing-completeness is a two
edged sword; with great power comes great responsibility :-).  The same
could be said of our use of Guile for writing package definitions, but
in a context where mistakes are easier to catch.

Your proposal is also interesting... I'm not sure if I like it yet but
I'll keep thinking about it.  The reason I'm somewhat reluctant is for
the same reason I think using Guile to write package definitions is
great: maximum expressiveness and flexibility.  If we were to try and
start restricting that core feature, it wouldn't really be Shepherd
anymore.  Other init systems such as systemd keep their descriptive
language minimal but accommodate running pre or post actions (typically
scripts) very easily (but with restrictions about how long they can run
in the case of systemd IIRC).

Thanks for having shared your analysis!

Maxim


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
  2021-03-19 21:49 ` Maxime Devos
  2021-03-20  4:02 ` Maxim Cournoyer
@ 2021-03-20  5:07 ` Mark H Weaver
  2021-03-20 11:10   ` raid5atemyhomework
  2021-03-20 16:58 ` Ludovic Courtès
  2021-03-22 13:42 ` raingloom
  4 siblings, 1 reply; 13+ messages in thread
From: Mark H Weaver @ 2021-03-20  5:07 UTC (permalink / raw)
  To: raid5atemyhomework, guix-devel@gnu.org

Hi,

raid5atemyhomework <raid5atemyhomework@protonmail.com> writes:

> GNU Shepherd is the `init` system used by GNU Guix.  It features:
>
> * A rich full Scheme language to describe actions.
> * A simple core that is easy to maintain.
>
> However, in this critique, I contend that these features are bugs.
>
> The Shepherd language for describing actions on Shepherd daemons is a
> Turing-complete Guile language.  Turing completeness runs afoul of the
> Principle of Least Power.  In principle, all that actions have to do
> is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.

These 4 calls are already enough to run "sleep 100000000000" and wait
for it to finish, or to rebuild your Guix system with an extra patch
added to glibc.

> Yet the language is a full Turing-complete language, including the
> major weakness of Turing-completeness: the inability to solve the
> halting problem.
>
> The fact that the halting problem is unsolved in the language means it
> is possible to trivially write an infinite loop in the language.  In
> the context of an `init` system, the possibility of an infinite loop
> is dangerous, as it means the system may never complete bootup.

Limiting ourselves to strictly total functions wouldn't help much here,
because for all practical purposes, computing 10^100 digits of Pi is
just as bad as an infinite loop.

That said, I certainly agree that Shepherd could use improvement, and
I'm glad that you've started this discussion.

At a glance, your idea of having Shepherd do more within subprocesses
looks promising to me, although this is not my area of expertise.

     Thanks,
       Mark


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-20  5:07 ` Mark H Weaver
@ 2021-03-20 11:10   ` raid5atemyhomework
  0 siblings, 0 replies; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-20 11:10 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guix-devel@gnu.org

Good rmoning Mark,

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com writes:
>
> > GNU Shepherd is the `init` system used by GNU Guix. It features:
> >
> > -   A rich full Scheme language to describe actions.
> > -   A simple core that is easy to maintain.
> >
> > However, in this critique, I contend that these features are bugs.
> > The Shepherd language for describing actions on Shepherd daemons is a
> > Turing-complete Guile language. Turing completeness runs afoul of the
> > Principle of Least Power. In principle, all that actions have to do
> > is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.
>
> These 4 calls are already enough to run "sleep 100000000000" and wait
> for it to finish, or to rebuild your Guix system with an extra patch
> added to glibc.

I agree.  But this mechanism is intended to avoid stupid mistakes like what I committed, not protect against an attacker who is capable of invoking `guix system reconfigure` on arbitrary Scheme code (and can easily wrap anything nefarious in any `unsafe-turing-complete` or `without-static-analysis` escape mechanism).  Seatbelts, not steel walls.

>
> > Yet the language is a full Turing-complete language, including the
> > major weakness of Turing-completeness: the inability to solve the
> > halting problem.
> > The fact that the halting problem is unsolved in the language means it
> > is possible to trivially write an infinite loop in the language. In
> > the context of an `init` system, the possibility of an infinite loop
> > is dangerous, as it means the system may never complete bootup.
>
> Limiting ourselves to strictly total functions wouldn't help much here,
> because for all practical purposes, computing 10^100 digits of Pi is
> just as bad as an infinite loop.

Indeed.  Again, seatbelts, not steel walls.  It's fairly difficult to commit a mistake that causes you to accidentally write a program that computes 10^100 digits of pi, not so difficult to have a brain fart and use `(- count 1)` instead of `(+ count 1)` because you were wondering idly whether an increment or a decrement loop would be more Scemey or if both are just as Schemey as the other.

What I propose would protect against the latter (a much more likely mistake), as in-context the recursive loop would be flagged since the recursion would be flagged due to being a call to a function that is not a member of a whitelist.  Hopefully getting recursive loops flagged would make the sysad writing `configuration.scm` look for the "proper" way to wait for an event to be true, and hopefully lead to them discovering the (hopefully extant) documentation on whatever domain-specific language we have for waiting for the event to be true instead of rolling their own.

> That said, I certainly agree that Shepherd could use improvement, and
> I'm glad that you've started this discussion.
>
> At a glance, your idea of having Shepherd do more within subprocesses
> looks promising to me, although this is not my area of expertise.

An issue here is that we sometimes pass data across Shepherd actions using environment variables, which do not cross process boundaries.  Xref. the `set-http-proxy` of `guix-daemon`; the environment variable is used as a global namespace that is accessible from both the `set-http-proxy` and `start` actions.

On the other hand, arguably the environment variable table is a global resource shared amongst multiple shepherd daemons.  This technique in general may not scale well for large numbers of daemons; environment variable name conflicts may cause subtle problems later.  I think it would be better if in addition to the "value" (typically the PID) each Shepherd service also had a `settings` (which can be used to contain anything that satisfies `(lambda (x) (equal? x (read (print x))))` so that it can be easily serialized across each subprocess launched by each action) that can be read and modified by each action.  Then the `set-http-proxy` action would update this `settings` field for the shepherd service, then queue up a `restart` action.  It could by convention be an association list.

This would also persist the `http_proxy` setting, BTW --- currently if you `herd set-http-proxy guix-daemon <whatever>` and then `herd restart guix-daemon` later, the HTTP proxy is lost (since the environment variable is cleared after `set-http-proxy` restarts the `guix-daemon`).  In short, this `set-http-proxy` example looks like a fairly brittle hack anyway, and maybe worth avoiding as a pattern.

Then there's actions that invoke other actions.  From a cursory glance at the Guix code it looks like only Ganeti and Guix-Daemon have actions that invoke actions, and they only invoke actions on their own Shepherd services.  It seems to me safe for an action invoked in another action of the same service to *not* spawn a new process, but to execute as the same process.  Not sure how safe it would be to allow one shepherd service to invoke an action on another shepherd service --- but then the `start` action of any service may cause other services it requires to be started as well, so we still do need to figure out what subprocesses to launch or not launch.

Or maybe each Shepherd service has its own subprocess that is its own mainloop, and the "main" Shepherd process mainloop "just" serves as a switching center to forward commands to each service's mainloop-subprocess, and also incidentally monitors per-service mainloop-subprocess that are not responding fast enough (and possibly decide to kill those mainloops and all its children, then disable that service).  This would make each service's environment variables a persistent but local store that is specific to each service and makes its use in `guix-daemon` safe, and the `set-http-proxy` would simply not clear the env vars so that the setting persists.  This allows Shepherd to remain responsive at all times even if some action of some Shepherd service enters an infloop or 10^100 pi digits condition; it could even have `herd status` report the number of pending unhandled commands for each service to inform the sysad about possible problems with specific services.

Thanks
raid5atemyhomework

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
                   ` (2 preceding siblings ...)
  2021-03-20  5:07 ` Mark H Weaver
@ 2021-03-20 16:58 ` Ludovic Courtès
  2021-03-21  0:22   ` raid5atemyhomework
  2021-03-22 13:42 ` raingloom
  4 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2021-03-20 16:58 UTC (permalink / raw)
  To: raid5atemyhomework; +Cc: guix-devel@gnu.org

Hi,

raid5atemyhomework <raid5atemyhomework@protonmail.com> skribis:

> Now, let us combine this with the second feature (really a bug): GNU
> shepherd is a simple, single-threaded Scheme program.  That means that
> if the single thread enters an infinite loop (because of a Shepherd
> service description that entered an infinite loop), then Shepherd
> itself hangs.

You’re right that it’s an issue; in practice, it’s okay because we pay
attention to the code we run there, but obviously, mistakes could lead
to the situation you describe.

It’s a known problem and there are plans to address it, discussed on
this list a few times before.  The Shepherd “recently” switched to
‘signalfd’ for signal handling in the main loop, with an eye on making
the whole loop event-driven:

  https://issues.guix.gnu.org/41507

This will address this issue and unlock things like “socket activation”.

That said, let’s not lie to ourselves: the Shepherd’s design is
simplistic.  I think that’s okay though because there’s a way to address
the main issues while keeping it simple.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-20 16:58 ` Ludovic Courtès
@ 2021-03-21  0:22   ` raid5atemyhomework
  2021-03-22 17:02     ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-21  0:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel@gnu.org

Hello Ludo',

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com skribis:
>
> > Now, let us combine this with the second feature (really a bug): GNU
> > shepherd is a simple, single-threaded Scheme program. That means that
> > if the single thread enters an infinite loop (because of a Shepherd
> > service description that entered an infinite loop), then Shepherd
> > itself hangs.
>
> You’re right that it’s an issue; in practice, it’s okay because we pay
> attention to the code we run there, but obviously, mistakes could lead
> to the situation you describe.
>
> It’s a known problem and there are plans to address it, discussed on
> this list a few times before. The Shepherd “recently” switched to
> ‘signalfd’ for signal handling in the main loop, with an eye on making
> the whole loop event-driven:
>
> https://issues.guix.gnu.org/41507
>
> This will address this issue and unlock things like “socket activation”.
>
> That said, let’s not lie to ourselves: the Shepherd’s design is
> simplistic. I think that’s okay though because there’s a way to address
> the main issues while keeping it simple.

I'm not sure you can afford to keep it simple.  Consider: https://issues.guix.gnu.org/47253

In that issue, the `networking` provision comes up potentially *before* the network is, in fact, up.  This means that other daemons that require `networking` could potentially be started before the network connection is up.

One example of such a daemon is `transmission-daemon`.  This daemon will bind itself to port 9091 so you can control it.  Unfortunately, if it gets started while network is down, it will be unable to bind to 9091 (so you can't control it) but still keep running.  On my system that means that on reboot I have to manually `sudo herd restart trannsmission-daemon`.

In another example, I have a custom daemon that I have set up to use the Tor proxy over 127.0.0.1:9050.  It requires both `networking` and `tor`.  When it starts after `networking` comes up but before the actual network does, it dies because it can't access the proxy at 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).  Then shepherd respawns it, then it dies again (network still not up) enough times that it gets disabled.  This means that on reboot I have to manually `sudo herd enable raid5atemyhomework-custom-daemon` and `sudo herd restart raid5atemyhomework-custom-daemon`.

On SystemD-based systems, there's a `NetworkManager-network-online.service` which just calls `nm-online -s -q --timeout=30`.  This delays network-requiring daemons until after the network is in fact actually up.

However in https://issues.guix.gnu.org/47253#1 Mark points out this is undesirable in the Guix case since it could potentially stall the (single-threaded) bootup process for up to 30 seconds if the network is physically disconnected, a bad UX for desktop and laptop users (who might still want to run `transmission-daemon`, BTW) because it potentially blocks the initialization of X and make the computer unusable for such users for up to 30 seconds after boot.  I note that I experienced such issues in some very old Ubuntu installations, as well.

SystemD can afford to *always* have `nm-online -s -q --timeout=30` because it's concurrent.  The `network-online.service` will block, but other services like X that don't ***need*** the network will continue booting.  So the user can still get to a usable system even if the boot isn't complete because the network isn't up yet due to factors beyond the control of the operating system.

Switching to a concurrent design for Shepherd --- *any* concurrent design --- is probably best done sooner rather than later, because it risks strongly affecting customized `configuration.scm`s like mine that have almost a half dozen custom Shepherd daemons.

Thanks
raid5atemyhomework

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
                   ` (3 preceding siblings ...)
  2021-03-20 16:58 ` Ludovic Courtès
@ 2021-03-22 13:42 ` raingloom
  2021-03-22 17:50   ` Martin
  4 siblings, 1 reply; 13+ messages in thread
From: raingloom @ 2021-03-22 13:42 UTC (permalink / raw)
  To: guix-devel@gnu.org

On Fri, 19 Mar 2021 17:33:57 +0000
raid5atemyhomework <raid5atemyhomework@protonmail.com> wrote:

> The Shepherd language for describing actions on Shepherd daemons is a
> Turing-complete Guile language.  Turing completeness runs afoul of
> the Principle of Least Power.

Erlang is turing complete and yet it is famously excellent at managing
services in a robust way.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-21  0:22   ` raid5atemyhomework
@ 2021-03-22 17:02     ` Ludovic Courtès
  2021-03-24 14:29       ` raid5atemyhomework
  2021-03-24 14:48       ` raid5atemyhomework
  0 siblings, 2 replies; 13+ messages in thread
From: Ludovic Courtès @ 2021-03-22 17:02 UTC (permalink / raw)
  To: raid5atemyhomework; +Cc: guix-devel@gnu.org

Hi,

raid5atemyhomework <raid5atemyhomework@protonmail.com> skribis:

> I'm not sure you can afford to keep it simple.

It has limitations but it does the job—just like many other init systems
did the job before the advent of systemd.

> Consider: https://issues.guix.gnu.org/47253
>
> In that issue, the `networking` provision comes up potentially *before* the network is, in fact, up.  This means that other daemons that require `networking` could potentially be started before the network connection is up.

The ‘networking’ service is just here to say that some network interface
is up or will be up.  It’s admittedly vague and weak, but it’s enough
for most daemons; they just need to be able to bind and listen to some
port.

> One example of such a daemon is `transmission-daemon`.  This daemon will bind itself to port 9091 so you can control it.  Unfortunately, if it gets started while network is down, it will be unable to bind to 9091 (so you can't control it) but still keep running.  On my system that means that on reboot I have to manually `sudo herd restart trannsmission-daemon`.

Could you report a bug for this one?  I don’t see why it’d fail to bind.

> In another example, I have a custom daemon that I have set up to use
> the Tor proxy over 127.0.0.1:9050.  It requires both `networking` and
> `tor`.  When it starts after `networking` comes up but before the
> actual network does, it dies because it can't access the proxy at
> 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).

Loopback is handled by the ‘loopback’ shepherd service, which is
provided via ‘%base-services’.  Perhaps you just need to have your
service depend on it?

> Switching to a concurrent design for Shepherd --- *any* concurrent design --- is probably best done sooner rather than later, because it risks strongly affecting customized `configuration.scm`s like mine that have almost a half dozen custom Shepherd daemons.

I suspect the main issue here is undeclared dependencies of some of the
Shepherd services you mention.

I like the “sooner rather than later” bit, though: it sounds like you’re
about to send patches or announce some sponsorship program?…  :-)

Thanks,
Ludo’.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-22 13:42 ` raingloom
@ 2021-03-22 17:50   ` Martin
  0 siblings, 0 replies; 13+ messages in thread
From: Martin @ 2021-03-22 17:50 UTC (permalink / raw)
  To: raingloom, guix-devel@gnu.org


> On Fri, 19 Mar 2021 17:33:57 +0000
> raid5atemyhomework <raid5atemyhomework@protonmail.com> wrote:
>
>> The Shepherd language for describing actions on Shepherd daemons is a
>> Turing-complete Guile language.  Turing completeness runs afoul of
>> the Principle of Least Power.
> Erlang is turing complete and yet it is famously excellent at managing
> services in a robust way.
>
Ironically nowadays almost everything is Turing complete: 
https://www.gwern.net/Turing-complete

Lack of Turing-completeness by design can be a real feature, i.e. 
bitcoin script is a perfect practical example of it.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-22 17:02     ` Ludovic Courtès
@ 2021-03-24 14:29       ` raid5atemyhomework
  2021-03-24 14:48       ` raid5atemyhomework
  1 sibling, 0 replies; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-24 14:29 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel@gnu.org

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 23, 2021 1:02 AM, Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com skribis:
>
> > I'm not sure you can afford to keep it simple.
>
> It has limitations but it does the job—just like many other init systems
> did the job before the advent of systemd.
>
> > Consider: https://issues.guix.gnu.org/47253
> > In that issue, the `networking` provision comes up potentially before the network is, in fact, up. This means that other daemons that require `networking` could potentially be started before the network connection is up.
>
> The ‘networking’ service is just here to say that some network interface
> is up or will be up. It’s admittedly vague and weak, but it’s enough
> for most daemons; they just need to be able to bind and listen to some
> port.
>
> > One example of such a daemon is `transmission-daemon`. This daemon will bind itself to port 9091 so you can control it. Unfortunately, if it gets started while network is down, it will be unable to bind to 9091 (so you can't control it) but still keep running. On my system that means that on reboot I have to manually `sudo herd restart trannsmission-daemon`.
>
> Could you report a bug for this one? I don’t see why it’d fail to bind.

Let me see if I can still get to the old syslogs where `transmission-daemon` claims it cannot bind, but it still keeps going anyway.  I've pulled my `transmission-daemon` directly into my `configuration.scm` so I can edit its `requirement` to include a custom `networking-online` service that does `nm-online`.

> > In another example, I have a custom daemon that I have set up to use
> > the Tor proxy over 127.0.0.1:9050. It requires both `networking` and
> > `tor`. When it starts after `networking` comes up but before the
> > actual network does, it dies because it can't access the proxy at
> > 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).
>
> Loopback is handled by the ‘loopback’ shepherd service, which is
> provided via ‘%base-services’. Perhaps you just need to have your
> service depend on it?
>
> > Switching to a concurrent design for Shepherd --- any concurrent design --- is probably best done sooner rather than later, because it risks strongly affecting customized `configuration.scm`s like mine that have almost a half dozen custom Shepherd daemons.
>
> I suspect the main issue here is undeclared dependencies of some of the
> Shepherd services you mention.
>
> I like the “sooner rather than later” bit, though: it sounds like you’re
> about to send patches or announce some sponsorship program?… :-)

Not particularly, but I *have* looked over Shepherd and here are some notes.  Maybe I'll even send patches, but the reaction to the ZFS patches makes me just shrug; I'd need to devote more time than what I spent on ZFS, and the ZFS patches aren't getting into Guix, so why bother.  If I get annoyed enough I'll just patch my own system and call it a day.  My own system has `nm-online` and I don't expect to not have networking, so the `nm-online` delay is unlikely to be an issue, and I don't intend to mess with the `configuration.scm` anymore because it's just too brittle, I'll just host VMs instead and use a SystemD fully-free OS like Trisquel, I only need Guix for the ZFS (which Trisquel does not have, for some reason).

Anyway...

It seems to me that a good design to use would be that each `<service>` should have its own process.  Then the big loop in `modules/shepherd.scm` will then be "just" a completely event-based command dispatcher that forwards commands to the correct per-`<service>` process.

Now, one feature Shepherd has is that it can be set with `--socket-file=-`, which, if specified, causes GNU Shepherd to enable GNU readline and use the readline library to read `-herd`-like (?) commands.

Unfortunately the `readline` interface is inherently blocking instead of event-based.  The C interface of GNU readline has an alternative interface that is compatible with event-based (and I've used this in the past to create a toy chat program that would display messages from other users while you were typing your own) but it looks like this interface is not exposed.  I checked `readline-port` as well, but the code I could find online suggests that this just uses the blocking `readline` interface, and would (?) be incompatible with the Guile `select`. (side note: the `SIGCHLD` problem could probably be fixed if Guile had `pselect` exposed, but apparently it's not exposed and I'm not prepared to dedicate even more time fiddling with the lack of syscalls in Guile.  Maybe a signal-via-pipe technique would work as an alternative though since that supposedly works on every UNIX-like --- but presumably the Shepherd authors already knew that, so maybe there is good reason not to use it).

Since `readline` is blocking, one possibility would be to *fork off* the process that does the stdin communication.  Then it can continue to use the blocking `readline`.  It just needs to invoke `herd stop root`when it gets EOF (note: need to check how commands are sent, for now it looks to me (not verified, just impression) that commands are sent as plain text).

Since the goal is to make the mainloop into a very parallel dispatcher, we need some way to somehow send commands in stdin-mode.  We can take advantage of the little-known fact that UNIX domain sockets can pass file descriptors across processes, with the file descriptor number being remapped to the receiving process via magic kernel stuff.  So, we create a `socketpair` (NOTE: CHECK IF GUILE HAS `socketpair`!!!  Also review how the fd-passing gets done, maybe Guile doesn't expose the needed syscalls either, sigh), then each time the `readline`-process receives a command, it creates a new `socketpair`, sends over one end to the mainloop, sends the command via the other end, then  waits for a response and prints it.  This should make it very near to in experience as the blocking Shepherd.

If the above pattern is workable, ***we can use the same pattern for `--socket-file=/file/path`***.  We ***always*** fork off *some* process to handle `--socket-file`, whether `stdin`-mode or not.  In `--socket-file=/file/path` mode, the `socket-file` process binds the socket file, `listen`s on it on a loop, and then just passes the opened socket over to the mainloop.

We also need this pattern as a general feature of the mainloop.  An action on one `<service>` can trigger actions on another service (in theory; my cursory review of the Guix code suggests that current services only trigger actions on themselves (`ganeti`, `guix-daemon`; but this is not a full review and there may be other services in Guix that do other stuff)); note in particular that `start` causes every `requirement` to be started anyway.  So I think we need a general mechanism for the mainloop to receive commands from remote processes; we might as well use the same mechanism for both the Shepherd<->user interaction and the Shepherd<->service interaction.

So for clarity of exposition, let me then list down the processes created by Shepherd:

* The `mainloop` process which handles the main massively-parallel event loop.  This is PID 1.
* The `socket-file` process which either gets commands from `stdin`, or via the `socket-file`.
* Multiple per-`<service>` process.

Now, the mainloop has to parse the command in order to learn which per-`<service>` process the command should get forwarded to.  And as mentioned, each per-`<service>` process also needs a command-sending socket to go to the mainloop.  So for each per-`<service>` process:

* The mainloop maintains a mainloop->service socket to send commands over.
* The mainloop maintains a service->mainloop socket it receives command socketfds over.

The mainloop process also special-cases the `root` service --- it handles commands to those directly (BUT this probably needs a lot of fiddling with the data structures involved --- `herd status root` can now occur while a `herd start <whatever>` is still running, so we need status reporting for "being started up" and "being stopped" as well --- `/` for "starting up", `\` for "stopping"?).

Now, the `action` and other procedures need to be special-cased.  We need a global variable indicating:

* The current process is `root` ie the mainloop process.
* The current process is some `<service>`.

Every `action` is different depending on this variable (`%process-identity`?).

* IF the action is going to the same `<service>` (including `root`):
  * Just tail-call into the action code.
* If the current process is `root` and a non-`root` action is being performed:
  * Check if the per-`<service>` process has been started, and start if needed.
  * Schedule the command to be sent via the event loop.
  * Keep operating the mainloop until the command has completed.
    * Use an event-loop stepper function (i.e. just calls `select` and dispatches appropriately, then returns, so caller has to implement the loop).
    * Initially set a mutable variable to `#f.
    * Schedule the command with a callback that sets the above variable to `#t`.
    * Call the event-loop stepper function until the mutable variable is true.
    * This implements the current semantics where a `.conf` file running an action will block until the action finishes, while still allowing commands to be sent to the Shepherd daemon.
* If the current process is not `root` and the action to be performed is of a different process:
  * Create a socketpair to send the command over to the mainloop and send it (blocking).
  * Send the command to the mainloop (blocking).
  * Wait for completion (blocking).

Each per-`<service>` process has a simple blocking loop: It waits for commands from the mainloop process, executes those commands, then loops again.

In particular, it means that any `start` actions in the `.conf` file will block (which is the expected behavior of legacy `.conf` files, but even so, the Shepherd will be able to handle commands even while it is still loading `.conf`.

=== Concurrency is Like Regexps, Now You Have Two Problems ===

But take not that this means that it is possible to deadlock services in this way:

* There are two services `A` and `B`.
  * `A` has an action `AtoB` which invokes an action `BtoA` on service `B`.
  * The `B` `BtoA` action invokes an action `Anoop` on service `A`.
    * In the above structure, because the `A` per-`<service>` process is waiting on the `BtoA` action on service `B`, it cannot handle the `Anoop` action!
    * In the current single-threaded Shepherd, such a thing is actually possible and would not cause a deadlock if the `A` `Anoop` terminates normally.

HOWEVER, this is probably a very unusual setup!  It may be tolerable to simply require that a service that performs actions on another service should have an acyclic relationship (it is Turing-complete --- consider the case where the `B` `BtoA` action reinvokes the `A` `AtoB` with the same arguments causing it to invoke `B` `BtoA` action again ad infinitum --- whereas an acyclic relationship requirement would provably terminate; it may even be possible, by passing a "stack" (i.e. list of service names that caused a particular action of a particular service to be invoked) when passing commands across, with the `root` service always passing an empty stack, and each `<service>`-to-`<service>` action prepends the name of the calling service, so the mainloop can detect a dynamic cycle and just fail the command without forwarding).

IF this restriction is too onerous, then it may be possible to use an event loop in the per-`<service>` process as well, and use the same wait-for-events-while-blocking logic as on the mainloop --- the code might even be reusable.  BUT I worry about this bit, as it could potentially mean that an action is invoked in the dynamic context (including fluids, sigh) of another on-going completely-unrelated action, which is BAAAAAD.  This is fine for the `root` service since the `root` service is Shepherd itself (? I think) and we can ensure that the Shepherd code does not depend on dynamic context.

Another is that in the current single-threaded Shepherd, any service's action can (re-)register a new `<service>`.  This is problematic since a per-`<service>` process will obviously not affect the mainloop process post-fork.

Again this is probably a very unusual setup.  While such a thing would be cute, I think it would be tolerable to simply require that only the mainloop process (which is what loads the `.conf` file) is the only one that is allowed to (re-)register `<service>`s.

=== Taking Advantage of Concurrency ===

Now that each `<service>` gets its own process, we can add a new `force-destroy-service` action on `root`.

    herd force-destroy-service root <some-service>

This forces the per-`<service>` process, and that of every dependent `<service>`, as well as all processes in the process trees of tho affected per-`<service>` processes, to be force-killed (IMPORTANT: check how to get the process tree, also look into "process groups", maybe that would work better, but does it exist on Hurd?).

This new command helps mitigate the issue where Shepherd `start` actions are Turing-complete and potentially contain infinite-loop bugs.  If this happens, the sysad can `herd force-destroy-service root <some-service>` the misbehaving service, reconfigure the system to correct the issue, and restart.

Another issue is that the current Guix startup is like this (paraphrased):

    (for-each (lambda (service) (start service)) '(guix-daemon #;...))

Now consider this in the context of the `nm-online` example above.  The reason why Guix+Shepherd cannot just ***always*** use `nm-online` like SystemD does is because `nm-online` can delay boot for an arbitrary number of seconds.  And since `start` is a known-blocking legacy interface, it should remain a known-blocking interface (back compatibility is a bitch).

This means that for the Guix startup, we should expose a new Shepherd API:

    (start-multiple '(guix-daemon #;...))

This basically does this:

* Set a mutable variable to the length of the input list.
* For each service, if the per-`<service>` process isn't started yet, start it.
* For each service, schedule to `start` the service; have the callback decrement the above variable (maybe also set another variable to a list of services that fail to start).
* In a loop:
  * If the above mutable variable is zero, exit.
  * Otherwise, call the mainloop stepper function, then loop again.

This allows multiple `start` to be executed at once (which could trigger a `start` again of their requirements, but if a service is already started then its `start` action should do nothing) in parallel.  There may be a thundering herd effect on the mainloop though because of hammering `start` on the requirements.  Hmm.  Concurrency is hard.

Then Guix also has to be modified to use `start-multiple` instead.

This further step ensures as well that infloop problems in custom service definitions do not delay startup --- startup is fully parallel (modulo thundering herds of daemons).

=== Overall ===

It seems to me that with the above redesign, there would be very little code left of the original Shepherd, making this more of a reimplementation than a patch or even a fork.

Thanks
raid5atemyhomework

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: A Critique of Shepherd Design
  2021-03-22 17:02     ` Ludovic Courtès
  2021-03-24 14:29       ` raid5atemyhomework
@ 2021-03-24 14:48       ` raid5atemyhomework
  1 sibling, 0 replies; 13+ messages in thread
From: raid5atemyhomework @ 2021-03-24 14:48 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel@gnu.org

> Loopback is handled by the ‘loopback’ shepherd service, which is
> provided via ‘%base-services’. Perhaps you just need to have your
> service depend on it?

My service requires `tor`, which itself requires `loopback`, but it was still unable to access `127.0.0.1:9050` until I added a service that invokes `nm-online` and had my service require that.  It works that way without problems, which leads me to conclude that loopback is *still* handled by NetworkManager somehow, despite what the Shepherd-level services claim.  Unfortunately I cannot divulge what my service is; suffice it to say that naive code that just opens to `127.0.0.1:9050` failed when invoked before `nm-online` completes.

Thanks
raid5atemyhomework

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-24 14:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19 17:33 A Critique of Shepherd Design raid5atemyhomework
2021-03-19 21:49 ` Maxime Devos
2021-03-20  2:42   ` raid5atemyhomework
2021-03-20  4:02 ` Maxim Cournoyer
2021-03-20  5:07 ` Mark H Weaver
2021-03-20 11:10   ` raid5atemyhomework
2021-03-20 16:58 ` Ludovic Courtès
2021-03-21  0:22   ` raid5atemyhomework
2021-03-22 17:02     ` Ludovic Courtès
2021-03-24 14:29       ` raid5atemyhomework
2021-03-24 14:48       ` raid5atemyhomework
2021-03-22 13:42 ` raingloom
2021-03-22 17:50   ` Martin

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).