Date: Wed, 24 Mar 2021 14:29:47 +0000
From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: Ludovic Courtès
Cc: guix-devel@gnu.org
Subject: Re: A Critique of Shepherd Design
In-Reply-To: <87h7l3w28b.fsf@gnu.org>
References: <87a6qx7of5.fsf@gnu.org> <87h7l3w28b.fsf@gnu.org>
List-Id: "Development of GNU Guix and the GNU System distribution."
Sent with ProtonMail Secure Email.

------- Original Message -------
On Tuesday, March 23, 2021 1:02 AM, Ludovic Courtès wrote:

> Hi,
>
> raid5atemyhomework <raid5atemyhomework@protonmail.com> skribis:
>
> > I'm not sure you can afford to keep it simple.
>
> It has limitations but it does the job—just like many other init
> systems did the job before the advent of systemd.
>
> > Consider: https://issues.guix.gnu.org/47253
> > In that issue, the `networking` provision comes up potentially before
> > the network is, in fact, up. This means that other daemons that
> > require `networking` could potentially be started before the network
> > connection is up.
> The ‘networking’ service is just here to say that some network
> interface is up or will be up. It's admittedly vague and weak, but it's
> enough for most daemons; they just need to be able to bind and listen
> to some port.
>
> > One example of such a daemon is `transmission-daemon`. This daemon
> > will bind itself to port 9091 so you can control it. Unfortunately,
> > if it gets started while the network is down, it will be unable to
> > bind to 9091 (so you can't control it) but will still keep running.
> > On my system that means that on reboot I have to manually `sudo herd
> > restart transmission-daemon`.
>
> Could you report a bug for this one? I don't see why it'd fail to bind.

Let me see if I can still get to the old syslogs where
`transmission-daemon` claims it cannot bind, yet keeps going anyway.
I've pulled my `transmission-daemon` service definition directly into my
`configuration.scm` so I can edit its `requirement` to include a custom
`networking-online` service that does `nm-online`.

> > In another example, I have a custom daemon that I have set up to use
> > the Tor proxy over 127.0.0.1:9050. It requires both `networking` and
> > `tor`. When it starts after `networking` comes up but before the
> > actual network does, it dies because it can't access the proxy at
> > 127.0.0.1:9050 (apparently NetworkManager handles loopback as well).
>
> Loopback is handled by the ‘loopback’ shepherd service, which is
> provided via ‘%base-services’. Perhaps you just need to have your
> service depend on it?
>
> > Switching to a concurrent design for Shepherd --- any concurrent
> > design --- is probably best done sooner rather than later, because it
> > risks strongly affecting customized `configuration.scm`s like mine,
> > which has almost half a dozen custom Shepherd daemons.
>
> I suspect the main issue here is undeclared dependencies of some of the
> Shepherd services you mention.
> I like the “sooner rather than later” bit, though: it sounds like
> you're about to send patches or announce some sponsorship program?… :-)

Not particularly, but I *have* looked over Shepherd, and here are some
notes.  Maybe I'll even send patches, but the reaction to the ZFS patches
makes me just shrug; I'd need to devote more time than what I spent on
ZFS, and the ZFS patches aren't getting into Guix, so why bother.  If I
get annoyed enough I'll just patch my own system and call it a day.  My
own system has `nm-online` and I don't expect to be without networking,
so the `nm-online` delay is unlikely to be an issue.  I don't intend to
mess with the `configuration.scm` anymore because it's just too brittle;
I'll just host VMs instead and use a systemd-based fully-free OS like
Trisquel.  I only need Guix for ZFS (which Trisquel does not have, for
some reason).

Anyway...

It seems to me that a good design would be for each `<service>` to have
its own process.  The big loop in `modules/shepherd.scm` would then be
"just" a completely event-based command dispatcher that forwards commands
to the correct per-`<service>` process.

Now, one feature Shepherd has is that it can be started with
`--socket-file=-`, which, if specified, causes GNU Shepherd to enable GNU
readline and use the readline library to read `herd`-like (?) commands.

Unfortunately, the `readline` interface is inherently blocking rather
than event-based.  The C interface of GNU readline has an alternative
interface that is compatible with event-based operation (I've used it in
the past to create a toy chat program that would display messages from
other users while you were typing your own), but it looks like this
interface is not exposed.  I checked `readline-port` as well, but the
code I could find online suggests that it just uses the blocking
`readline` interface, and would (?) be incompatible with the Guile
`select`.
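For concreteness, here is what an event-based line reader looks like, as
opposed to blocking readline.  This is a minimal sketch in Python rather
than Guile (the whole question above is whether Guile exposes enough);
the name `make_line_reader` is illustrative, not any real API:

```python
# Sketch: read complete command lines from a descriptor inside a
# select-style event loop, buffering partial reads, instead of blocking
# in readline until a whole line arrives.
import os
import selectors

def make_line_reader(fd, on_line):
    """Return an event-loop callback that buffers partial reads and calls
    on_line(line) for every complete newline-terminated command."""
    buf = bytearray()
    def callback():
        chunk = os.read(fd, 4096)
        if not chunk:
            return False            # EOF: the caller should unregister fd
        buf.extend(chunk)
        while b"\n" in buf:
            line, _, rest = bytes(buf).partition(b"\n")
            buf[:] = rest
            on_line(line.decode())
        return True
    return callback

r, w = os.pipe()
got = []
sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ, make_line_reader(r, got.append))
os.write(w, b"status root\nstart tor\n")
for key, _events in sel.select(timeout=1):
    key.data()
print(got)    # ['status root', 'start tor']
```

The point is that the loop never blocks waiting for a line; between
partial reads it could service any other descriptor.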
(Side note: the `SIGCHLD` problem could probably be fixed if Guile had
`pselect` exposed, but apparently it's not, and I'm not prepared to
dedicate even more time fiddling with the lack of syscalls in Guile.
Maybe a signal-via-pipe technique would work as an alternative, though,
since that supposedly works on every UNIX-like --- but presumably the
Shepherd authors already knew that, so maybe there is a good reason not
to use it.)

Since `readline` is blocking, one possibility would be to *fork off* the
process that does the stdin communication.  Then it can continue to use
the blocking `readline`.  It just needs to invoke `herd stop root` when
it gets EOF (note: need to check how commands are sent; for now it looks
to me (not verified, just an impression) that commands are sent as plain
text).

Since the goal is to make the mainloop into a very parallel dispatcher,
we need some way to send commands in stdin-mode.  We can take advantage
of the little-known fact that UNIX domain sockets can pass file
descriptors across processes, with the file descriptor number being
remapped for the receiving process via magic kernel stuff.  So, we create
a `socketpair` (NOTE: check if Guile has `socketpair`!  Also review how
the fd-passing gets done; maybe Guile doesn't expose the needed syscalls
either, sigh).  Then, each time the `readline` process receives a
command, it creates a new `socketpair`, sends one end over to the
mainloop, sends the command via the other end, then waits for a response
and prints it.  This should make it very near in experience to the
blocking Shepherd.

If the above pattern is workable, ***we can use the same pattern for
`--socket-file=/file/path`***.  We ***always*** fork off *some* process
to handle `--socket-file`, whether in `stdin`-mode or not.
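The socketpair-plus-fd-passing scheme above is a standard POSIX mechanism
(`SCM_RIGHTS` ancillary data).  Here is an illustration in Python, which
does expose `socketpair` and fd passing (via `send_fds`/`recv_fds`,
Python 3.9+); whether Guile does is exactly the open question.  All
variable names are illustrative:

```python
# Sketch: hand one end of a fresh socketpair to another process over a
# long-lived UNIX-domain socket; the kernel remaps the fd number on the
# receiving side.  Shown within one process for simplicity.
import socket

# The long-lived channel between the readline process and the mainloop.
readline_end, mainloop_end = socket.socketpair()

# Per command: make a fresh socketpair, send one end across, keep the other.
cmd_mine, cmd_theirs = socket.socketpair()
socket.send_fds(readline_end, [b"new-command"], [cmd_theirs.fileno()])

# "Mainloop" side: receive the descriptor and wrap it in a socket object.
_msg, fds, _flags, _addr = socket.recv_fds(mainloop_end, 1024, maxfds=1)
received = socket.socket(fileno=fds[0])

# The command now travels over the dedicated pair, and the reply back.
cmd_mine.sendall(b"herd status root")
query = received.recv(1024).decode()
received.sendall(b"Status: started")
reply = cmd_mine.recv(1024).decode()
print(query)    # herd status root
print(reply)    # Status: started
```

Because each command gets its own socketpair, responses can never be
interleaved between concurrent commands.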
In `--socket-file=/file/path` mode, the `socket-file` process binds the
socket file, `listen`s on it in a loop, and just passes each opened
socket over to the mainloop.

We also need this pattern as a general feature of the mainloop.  An
action on one `<service>` can trigger actions on another service (in
theory; my cursory review of the Guix code suggests that current services
only trigger actions on themselves (`ganeti`, `guix-daemon`; but this is
not a full review and there may be other services in Guix that do other
stuff)); note in particular that `start` causes every `requirement` to be
started anyway.  So I think we need a general mechanism for the mainloop
to receive commands from remote processes; we might as well use the same
mechanism for both the Shepherd<->user interaction and the
Shepherd<->service interaction.

So, for clarity of exposition, let me list the processes created by
Shepherd:

* The `mainloop` process, which handles the main massively-parallel event
  loop.  This is PID 1.
* The `socket-file` process, which gets commands either from `stdin` or
  via the `socket-file`.
* Multiple per-`<service>` processes.

Now, the mainloop has to parse each command in order to learn which
per-`<service>` process the command should be forwarded to.  And, as
mentioned, each per-`<service>` process also needs a command-sending
socket going to the mainloop.  So for each per-`<service>` process:

* The mainloop maintains a mainloop->service socket to send commands
  over.
* The mainloop maintains a service->mainloop socket over which it
  receives command socketfds.

The mainloop process also special-cases the `root` service --- it handles
commands to it directly (BUT this probably needs a lot of fiddling with
the data structures involved --- `herd status root` can now occur while a
`herd start <service>` is still running, so we need status reporting for
"being started up" and "being stopped" as well --- `/` for "starting up",
`\` for "stopping"?).
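The dispatch step itself is small: the mainloop parses just enough of a
command to pick the per-`<service>` socket and forwards it verbatim.  A
sketch in Python, where the service names and the "action service" wire
format are my assumptions, not Shepherd's actual protocol:

```python
# Sketch: mainloop-side routing table from service name to the socket
# whose other end the per-<service> process would hold.
import socket

def make_mainloop(service_names):
    """Return (forward, peers): forward(cmd) routes a command to the
    right service socket; peers maps each name to the peer socket."""
    to_service = {}
    peers = {}
    for name in service_names:
        mainloop_side, service_side = socket.socketpair()
        to_service[name] = mainloop_side
        peers[name] = service_side

    def forward(command):
        # Assumed wire format: "<action> <service>", e.g. "start tor".
        _action, _, service = command.partition(" ")
        if service not in to_service:
            raise KeyError("unknown service: " + service)
        to_service[service].sendall(command.encode())

    return forward, peers

forward, peers = make_mainloop(["tor", "guix-daemon"])
forward("start tor")
forwarded = peers["tor"].recv(1024).decode()
print(forwarded)    # start tor
```

The mainloop only ever inspects the envelope; the action's semantics stay
entirely inside the per-`<service>` process.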
Now, the `action` and other procedures need to be special-cased.  We need
a global variable (`%process-identity`?) indicating either:

* The current process is `root`, i.e. the mainloop process; or
* The current process is some `<service>`.

Every `action` then behaves differently depending on this variable:

* If the action is going to the same `<service>` (including `root`):
  * Just tail-call into the action code.
* If the current process is `root` and a non-`root` action is being
  performed:
  * Check if the per-`<service>` process has been started, and start it
    if needed.
  * Schedule the command to be sent via the event loop.
  * Keep operating the mainloop until the command has completed:
    * Use an event-loop stepper function (i.e. one that just calls
      `select` and dispatches appropriately, then returns, so the caller
      has to implement the loop).
    * Initially set a mutable variable to `#f`.
    * Schedule the command with a callback that sets the above variable
      to `#t`.
    * Call the event-loop stepper function until the mutable variable is
      true.
    * This implements the current semantics, where a `.conf` file running
      an action blocks until the action finishes, while still allowing
      commands to be sent to the Shepherd daemon.
* If the current process is not `root` and the action to be performed
  belongs to a different process:
  * Create a socketpair over which to send the command to the mainloop.
  * Send the command to the mainloop (blocking).
  * Wait for completion (blocking).

Each per-`<service>` process has a simple blocking loop: it waits for
commands from the mainloop process, executes those commands, then loops
again.

In particular, this means that any `start` actions in the `.conf` file
will block (which is the expected behavior of legacy `.conf` files), but
even so, the Shepherd will be able to handle commands while it is still
loading the `.conf`.
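The per-`<service>` side needs no event machinery at all, which is the
appeal of the design.  A sketch in Python of the blocking worker loop in
a forked child; the `actions` table is a stand-in for real service code:

```python
# Sketch: per-<service> process as a plain blocking loop that receives a
# command from the mainloop, runs the matching action, and replies.
import os
import socket

def service_worker(sock, actions):
    """Blocking loop run in the per-service process."""
    while True:
        cmd = sock.recv(1024)
        if not cmd:                  # mainloop closed the channel: exit
            break
        name = cmd.decode()
        action = actions.get(name, lambda: "unknown action: " + name)
        sock.sendall(action().encode())

mainloop_side, service_side = socket.socketpair()
pid = os.fork()
if pid == 0:                         # child: stands in for per-<service>
    mainloop_side.close()
    service_worker(service_side, {"start": lambda: "started",
                                  "status": lambda: "running"})
    os._exit(0)

service_side.close()
mainloop_side.sendall(b"start")
r1 = mainloop_side.recv(1024).decode()
mainloop_side.sendall(b"status")
r2 = mainloop_side.recv(1024).decode()
mainloop_side.close()                # EOF makes the worker loop exit
os.waitpid(pid, 0)
print(r1, r2)    # started running
```

A `start` that infloops here hangs only this one child, never the
mainloop, which is what makes force-destroy (below) possible.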
=== Concurrency is Like Regexps, Now You Have Two Problems ===

But take note that this means it is possible to deadlock services this
way:

* There are two services, `A` and `B`.
* `A` has an action `AtoB` which invokes an action `BtoA` on service `B`.
* The `B` `BtoA` action invokes an action `Anoop` on service `A`.
  * In this structure, because the `A` per-`<service>` process is waiting
    on the `BtoA` action on service `B`, it cannot handle the `Anoop`
    action!
  * In the current single-threaded Shepherd, such a thing is actually
    possible and would not cause a deadlock, provided the `A` `Anoop`
    action terminates normally.

HOWEVER, this is probably a very unusual setup!  It may be tolerable to
simply require that services performing actions on other services have an
acyclic relationship.  (The general case is Turing-complete --- consider
the case where the `B` `BtoA` action reinvokes the `A` `AtoB` action with
the same arguments, causing it to invoke the `B` `BtoA` action again ad
infinitum --- whereas an acyclic-relationship requirement provably
terminates.  It may even be possible to detect cycles dynamically, by
passing a "stack" (i.e. a list of the service names that caused a
particular action of a particular service to be invoked) when passing
commands across, with the `root` service always passing an empty stack
and each `<service>`-to-`<service>` action prepending the name of the
calling service, so that the mainloop can detect a dynamic cycle and just
fail the command without forwarding it.)

IF this restriction is too onerous, then it may be possible to use an
event loop in the per-`<service>` process as well, and use the same
wait-for-events-while-blocking logic as in the mainloop --- the code
might even be reusable.  BUT I worry about this bit, as it could
potentially mean that an action is invoked in the dynamic context
(including fluids, sigh) of another on-going, completely unrelated
action, which is BAAAAAD.  This is fine for the `root` service, since the
`root` service is Shepherd itself (I think?).
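The dynamic-cycle check via a passed "stack" sketched above is tiny in
practice.  An illustration in Python (not Shepherd code; the function
name is hypothetical):

```python
# Sketch: every forwarded command carries the list of services whose
# actions led to it (most recent first; root passes []); the mainloop
# refuses to forward when the target is already on that stack.
def forward_action(target, stack):
    """Raise on a dynamic cycle, else return the stack the target
    should pass onward when it in turn invokes other services."""
    if target in stack:
        chain = " -> ".join(reversed(stack))
        raise RuntimeError("cycle: " + chain + " -> " + target)
    return [target] + stack          # callee prepends itself

s = forward_action("A", [])          # root invokes A: stack ["A"]
s = forward_action("B", s)           # A's action calls B: ["B", "A"]
try:
    forward_action("A", s)           # B's action calls back into A
    err = None
except RuntimeError as e:
    err = str(e)
print(err)    # cycle: A -> B -> A
```

This rejects the `AtoB`/`BtoA`/`Anoop` pattern outright, trading the
(currently working) re-entrant case for guaranteed termination.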
For `root`, at least, we can ensure that the Shepherd code does not
depend on dynamic context.

Another issue is that in the current single-threaded Shepherd, any
service's action can (re-)register a new `<service>`.  This is
problematic, since a registration done in a per-`<service>` process will
obviously not affect the mainloop process post-fork.

Again, this is probably a very unusual setup.  While such a thing would
be cute, I think it would be tolerable to simply require that the
mainloop process (which is what loads the `.conf` file) is the only one
allowed to (re-)register `<service>`s.

=== Taking Advantage of Concurrency ===

Now that each `<service>` gets its own process, we can add a new
`force-destroy-service` action on `root`:

    herd force-destroy-service root <service>

This forces the per-`<service>` process, that of every dependent
`<service>`, and all processes in the process trees of the affected
per-`<service>` processes, to be force-killed (IMPORTANT: check how to
get the process tree; also look into "process groups", maybe those would
work better, but do they exist on the Hurd?).

This new command helps mitigate the issue that Shepherd `start` actions
are Turing-complete and can potentially contain infinite-loop bugs.  If
that happens, the sysadmin can `herd force-destroy-service root
<service>` the misbehaving service, reconfigure the system to correct the
issue, and restart it.

Another issue is that the current Guix startup is like this
(paraphrased):

    (for-each (lambda (service) (start service))
              '(guix-daemon #;...))

Now consider this in the context of the `nm-online` example above.  The
reason Guix+Shepherd cannot just ***always*** use `nm-online` the way
systemd does is that `nm-online` can delay boot for an arbitrary number
of seconds.  And since `start` is a known-blocking legacy interface, it
should remain a known-blocking interface (backward compatibility is a
bitch).
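On the process-group question raised under force-destroy: killing a whole
process tree at once is exactly what POSIX process groups provide (their
availability on the Hurd would still need checking).  A sketch in Python,
with a shell one-liner standing in for a misbehaving service:

```python
# Sketch: start the service's tree in its own session (hence its own
# process group), then signal the entire group with one killpg call.
import os
import signal
import subprocess
import time

# Stand-in "service": a shell that spawns a sleeping child, so killing
# only the shell would leave an orphan behind.
proc = subprocess.Popen(["sh", "-c", "sleep 300 & wait"],
                        start_new_session=True)  # own session and group
time.sleep(0.2)                                  # let it spawn the child

os.killpg(proc.pid, signal.SIGKILL)              # kill the whole group
rc = proc.wait()
print(rc)    # -9: terminated by SIGKILL
```

With `start_new_session=True` the child's process-group ID equals its
PID, so no separate tree walk is needed.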
For the Guix startup, then, we should expose a new Shepherd API:

    (start-multiple '(guix-daemon #;...))

This basically does the following:

* Set a mutable variable to the length of the input list.
* For each service, if the per-`<service>` process isn't started yet,
  start it.
* For each service, schedule its `start`; have the callback decrement the
  above variable (and maybe also push services that fail to start onto a
  list).
* In a loop:
  * If the above mutable variable is zero, exit.
  * Otherwise, call the mainloop stepper function, then loop again.

This allows multiple `start`s to be executed in parallel (which could
trigger `start`s of their requirements again, but if a service is already
started then its `start` action should do nothing).  There may be a
thundering-herd effect on the mainloop, though, from hammering `start` on
the shared requirements.  Hmm.  Concurrency is hard.

Then Guix also has to be modified to use `start-multiple` instead.

This further step ensures that infinite-loop problems in custom service
definitions do not delay startup --- startup is fully parallel (modulo
thundering herds of daemons).

=== Overall ===

It seems to me that with the above redesign there would be very little
code left of the original Shepherd, making this more of a
reimplementation than a patch, or even a fork.

Thanks
raid5atemyhomework