Date: Sat, 20 Mar 2021 02:42:38 +0000
To: Maxime Devos
From: raid5atemyhomework
Cc: "guix-devel@gnu.org"
Subject: Re: A Critique of Shepherd Design
In-Reply-To: <6286d7101ae8219a539bc34437ab46bd48d38476.camel@telenet.be>

Hello Maxime,

> Multi-threading seems complicated (but can be worth it). What work would you put
> on which thread (please give us some concrete to reason about, ‘thread X does A,
> and is created on $EVENT ...’)? A complication is that "fork" is ‘unsafe’ in a
> process with multiple threads (I don't know the details). Maybe that could be
> worked around. (E.g. by using system* on a helper binary that closes unneeded
> file descriptors and sets the umask .... and eventually uses exec to run the
> service binary.)

Perhaps the point is not so much to be *multi-threaded* but to be *concurrent*, and whether to be multi-*thread* or multi-*process* or multi-*fiber* is a lower-level detail.

For example, we can consider that for 99% of shepherd services, the `start` action returns a numeric PID, and the remaining ones return `#t`. Rather than call the `start` action in the same thread/process/fiber that implements the Shepherd main loop:

* Create a pipe.
* `fork`.
* On the child:
  * Close the pipe output (read) end, make the input (write) end cloexec.
  * Call the `start` action.
  * If an error is thrown, print out `(error <>)` with the serialization of the error to the pipe.
  * Otherwise, print out `(return <>)` with the return value of the `start` action.
* On the parent:
  * Close the pipe input (write) end, and add the output (read) end and the pid of the forked child to one of the things we will poll in the mainloop.
  * Return to the mainloop.
* In the mainloop:
  * `poll` (or `select`, or whatever) the action pipes in addition to the `herd` command socket(s).
  * If an action pipe is ready, read it in:
    * If it EOFed then the sub-process died or execed without returning anything, treat as an error.
    * Similarly if unable to `read` from it.
    * Otherwise deserialize it and react appropriately as to whether it is an error or a return of some value.
  * If an action pipe has been around too long, kill the pid (and all child pids) and close the pipe and treat as an error.

I mean --- single-threaded/process/fibered, non-concurrent `init` systems have fallen by the wayside because of their brittleness. SystemD is getting wide support precisely because it's an excellent concurrent `init` mechanism that is fairly tolerant of mistakes; a problem in one service unit is generally isolated in that service unit and its dependents and will not cause problems with the other service units.

Presumably, `fork`ing within a `fork`ed process has no issues since that is what is normally done in Unix anyway, it's threads that are weird.

> That's a bit of an over-simplification.
> At least in a singly-threaded Shepherd,
> waitpid could not be used by an individual service, as it
> (1) returns not often enough (if a process of another service exits and
> this action was waiting on a single pid)
> (2) too often (likewise, and this action was ‘just waiting’)
> (3) potentially never (imagine every process has been started and an action
> was waiting for one to exit, and now the user tries to reboot. Then shepherd
> daemon would never accept the connection from the user!).

Yes, I agree it is an oversimplification, but in any case, there should be some smaller subset of things that can be (typically) done. Then any escape hatch like `unsafe-turing-complete` or `without-static-analysis` can retain the full power of the language for those times when you *really* do need it.

>
> And in a multi-threaded Shepherd, "fork" is unsafe. (Anyway, "fork" captures
> too much state, and no action should ever call "exec" without "fork".) Also,
> "exec" replaces too little state (e.g. open file descriptors, umask, root
> directory, current working directory ...).
>
> But perhaps that's just a call for a different set of primitives (which could
> be implemented on top of these system calls by shepherd, or directly on top
> of the relevant Hurd RPCs when running on GNU/Hurd.).

Yes.

> > Yet the language is a full Turing-complete language, including the major
> > weakness of Turing-completeness: the inability to solve the halting problem.
>
> IIUC, there are plans to perform the network configuration inside shepherd
> (search for guile-netlink). So arbitrary code still needs to be allowed.
> This is the Procedural (extensible) I'll be referring to later.
>
> > The fact that the halting problem is unsolved in the language means it is
> > possible to trivially write an infinite loop in the language.
> > In the context
> > of an `init` system, the possibility of an infinite loop is dangerous, as it
> > means the system may never complete bootup.
>
> I'm assuming the hypothetical infinite loop is in, say, the code for starting
> guix-daemon which isn't very essential (try "herd stop guix-daemon"!)
> (If there's an infinite loop in e.g. udev-service-type then there's indeed
> a problem, then that infinite loop might as well be in eudev itself, so I'm
> only speaking of non-essential services blocking the boot process.)
>
> I'm not convinced we need Turing-incompleteness here, at least if the
> start / do-a-dance action / stop code of each service definition is run in
> a separate thread (not currently the case).
>
> > I have experienced this firsthand since I wrote a Shepherd service to launch
> > a daemon, and I needed to wait for the daemon initialization to complete.
> > My implementation of this had a bug that caused an infinite loop to be entered,
> > but would only tickle this bug when a very particular (persistent on-disk) state
> > existed.
> > I wrote this code about a month or so ago, and never got to test it until last week,
> > when the bug was tickled. Unfortunately, by then I had removed older system versions
> > that predated the bug. When I looked at a backup copy of the `configuration.scm`, I
> > discovered the bug soon afterwards.
> > But the amount of code verbiage needed here had apparently overwhelmed me at the time I
> > wrote the code to do the waiting, and the bug got into the code and broke my system.
> > I had to completely reinstall Guix (fortunately the backup copy of `configuration.scm`
> > was helpful in recovering most of the system, also ZFS rocks).
>
> Ok, shepherd should be more robust (at least for non-essential services like guix-publish).
>
> > Yes, I made a mistake. I'm only human. It should be easy to recover from mistakes.
>
> I would need to know what the mistake and what ‘recovering’ would be exactly in this
> case, before agreeing or disagreeing whether this is something shepherd should help
> with.

It's what I described in the previous paragraph. I wrote code to wait for startup, and that code had a bug that caused it to enter an infinite loop under a certain edge case.

The edge case was not reached for a long time (almost a month), so I only found it a week ago when it triggered and broke my system. In order to recover I had to reinstall Guix System completely, because none of the saved system generations had a configuration that predated the introduction of the bug.

If you want specifics --- as a sketch, what I did was something similar to this:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?) ; if the daemon died this would return #f
         #t)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f)
        (else
          (sleep 1)
          (wait-for-startup (- count 1))))) ;;; <--- (fuck)

The daemon died (this was the persistent on-disk edge case, arguably a bug in `daemon-started?` as well) causing `(daemon-started?)` to return `#f` permanently, and then I started counting *in the wrong direction*.

> > The full Turing-completeness of the language invites mistakes,
>
> Agreed (though ‘invites’ seems a little too strong here).

Come on. Something as simple as "Wait for some condition to be true" needs code like the above to implement fully in the Shepherd action-description language (i.e. Scheme). And because I was debating on "just put `(count 30)` in the `let` and decrement" versus "Naaa just count up like a normal person" and got my wires crossed, I made the above completely dumb mistake.

Indeed, not only should `-` be changed to `+` in the above, but it's also better to do something like this as well:

    (let wait-for-startup ((count 0))
      (cond
        ((daemon-started?)
         ; if the daemon died this would return #f
         #t)
        ((not (zero? (car (waitpid daemon-pid WNOHANG))))
         (format #t "daemon died!~%")
         #f)
        ((> count 30)
         (format #t "daemon taking too long, continuing with boot~%")
         #f)
        (else
          (sleep 1)
          (wait-for-startup (+ count 1)))))

Because if the daemon died, waiting for 30 seconds would be fairly dumb as well.

That is a fair bit of verbiage, and much harder to read through for bugs.

As per information theory: the more possible things I can talk about, the more bits I have to push through a communication channel to convey it. Turing-complete languages can do more things than more restricted domain-specific languages, so on average it requires more code to implement an arbitrary thing in the Turing-complete language (thus requiring more careful review) than in the restricted domain-specific language.

> - Section 3: A nicely behaving API.
>
> > So what can we do?
> > For one, a Turing-complete language is a strict superset of non-Turing-complete
> > languages. So one thing we can do is to define a more dedicated language for Shepherd
> > actions, strongly encourage the use of that sub-language, and, at some point, require
> > that truly Turing-complete actions need to be wrapped in a `(unsafe-turing-complete ...)` form.
> > For example, in addition to the current existing `make-forkexec-constructor` and
> > `make-kill-destructor`, let us also add these syntax forms:
> >
> > `(wait-for-condition <timeout> <form>)` - Return a procedure that accepts a numeric `pid`, that does: Check if evaluating `<form>` in the lexical context results in `#f`. If so, wait one second and re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds have passed, or if the `pid` has died. If `<form>` evaluates to non-`#f` then return it immediately.
> > `(timed-action <timeout> <form> ...)` - Return a procedure that accepts a numeric `pid`, that does: Spawn a new thread (or maybe better a `fork`ed process group?). The new thread evaluates `<form> ...` in sequence. If the thread completes or throws in `<timeout>` seconds, return the result or throw the exception in the main thread. If the `<timeout>` is reached or the given `pid` has died, kill the thread and any processes it may have spawned, then return `#f`.
> > `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid` that does: Check if the `pid` has died, if so, return #t. Else sleep 1 second and try again, and if `<timeout>` is reached, return `#f`.
>
> Hard to say if these syntaxes are useful in advance.

In my particular case I could have just written `(wait-for-condition 30 (daemon-started?))`. I imagine a fair number of wait-for-daemon-startup things can use that or `(timed-action 30 (invoke #$waiter-program))`.

>
> In any case, I don't like numeric pids, as they are reused.
> Could we use something like pidfds (Linux) or whatever the Hurd has instead?
> (Wrapped in nice type in Scheme, maybe named <task> with a predicate task?,
> a predicate task-dead?, a wait-for-task-dead operation (if we use guile-fibers).)

Certainly, but that may require breaking changes to existing specific shepherd actions.

> Note: (perform-operation (choice-operation (wait-for-task-dead task) (sleep 5)))
> == wait until task is dead OR 5 seconds have passed. Operations in guile-fibers
> combine nicely!
>
> Also, sprinkling time-outs everywhere seems rather dubious to me (if that's
> what timed-action is for).
> If some process is taking long to start,
> then by killing it it will take forever to start.

Sprinkling timeouts makes the halting problem trivial: for any code that is wrapped in a timeout, that code will halt (either by itself, or by hitting the timeout).

Indeed, you can make a practical total functional language simply by requiring that recursions have a decrementing counter that will abort computation when it reaches 0. It could use mostly partial code patterns with the addition of that decrementing counter, and it becomes total. In many ways the programmer is just assuring the compiler "this operation will not take more than N recursions".

>
> > The above forms should also report as well what they are doing (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console and/or syslog.
>
> Something like that would be useful, yes. But printing itself seems a bit
> low-level to me.

Yes, but if it's something that the end-user wrote in their `configuration.scm`, it seems fine to me.

>
> > In addition, `make-forkexec-constructor` should accept a `#:post-fork`, which is
> > a procedure that accepts a `pid`, and `make-kill-destructor` should accept a
> > `#:pre-kill`, which is a procedure that accepts a `pid`. Possibly we need to
> > add combinators for the above actions as well. For example a `sub-actions`
> > procedural form that accepts any number of functions of the above `pid -> bool`
> > type and creates a single combined `pid -> bool`.
> > So for example, in my case, I would have written a `make-forkexec-constructor`
> > that accepted a `#:post-fork` that had a `wait-for-condition` on the code that
> > checked if the daemon initialization completed correctly.
> > I think it is a common enough pattern that:
> >
> > - We have to spawn a daemon.
> > - We have to wait for the daemon to be properly initialized (`#:post-fork`)
> > - When shutting down the daemon, it's best to at least try to politely ask it
> >   to finish using whatever means the daemon has (`#:pre-kill`).
> > - If the daemon doesn't exit soon after we politely asked it to, be less polite and start sending signals.
> >
> > So I think the above should cover a good number of necessities.
>
> Agreed. My own proposal, which I like better (-: :
> (this is the ‘procedural’ (extensible) interface)
>
> - Multi-threading using guile-fibers (we'll use condition variables & channels).
> - Instead of make-forkexec-constructor, we have
>   make-process-constructor, as fork + exec as-is is troublesome in
>   a multi-threaded application (see POSIX).
>
>   (It still accepts #:user, #:group & the like)
>
>   make-process-constructor returns a thunk as usual,
>   but this thunk THUNK works differently:
>
>   THUNK : () -> .
>
>   is a GOOPS class representing processes (parent: ),
>   that do not necessarily exist yet / anymore.
>   has some fields:
>
>   * stopped?-cond
>     (condition variable, only signalled after exit-value is set)
>   * exit-value
>     (an atomic box, value #f or an integer,
>     only read after stopped?-cond is signalled
>     or whenever the user is feeling impatient and asking shepherd
>     how it is going.)
>   * stop!-cond (condition variable)
>   * stop-forcefully!-cond (condition variable)
>   * started-cond (condition variable, only signalled when ‘pid’ has been set)
>   * pid (an atomic box)
>   * maybe some other condition variables?
>   * other atomic boxes?
>
>   THUNK also starts a fiber associated with the returned ,
>   which tries to start the service binary in the background (could also be used
>   for other things than the service binary). When the binary has been started,
>   the fiber signals started-cond.
>
>   The fiber will wait for the process to exit, and sets exit-value and
>   signals stopped?-cond when it does.
>   It will also wait for a polite request
>   to stop the service binary (stop!-cond) and impolite requests (for SIGKILL?)
>   (stop-forcefully!-cond).
>
>   (Does shepherd even have an equivalent to stop-forcefully! ?)
>
>   kill-destructor is adjusted to not work with pids directly, but rather
>   with . It signals stop!-cond or stop-forcefully!-cond.
>
>   (So process creation & destruction happens asynchronously, but there
>   still is an easy way to wait until they are actually started.)
>
> - #:pre-kill & #:post-fork seem rather ad-hoc to me. I advise
>   still wrapping make-process-constructor & kill-destructor.
>   However, the thunk still needs to return a (or subclass
>   of course).
>
>   If the thunk does anything that could block / loop (e.g. if it's a HTTP
>   server, then the wrapper might want to wait until the process is actually
>   listening at port 80), then this blocking / looping must be done in a fiber.
>
>   Likewise if the killer does anything that could block / loop (e.g.
>   first asking the process to exit nicely before killing it after a time-out).

Sure, this design could work better as well. But *some* restricted dialect should exist, and we should discourage people from using the full Guile language in Shepherd actions they write bespoke in their `configuration.scm`, not tout it as a feature.

> - Section 4: Static analysis
>
> > Then, we can define a strict subset of Guile, containing a set of forms we know are
> > total (i.e. we have a proof they will always halt for all inputs). Then any Shepherd
> > actions, being Lisp code and therefore homoiconic, can be analyzed.
> > Every list must
> > have a `car` that is a symbol naming a syntax or procedure that is known to be safe
> > --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`, `quote`
> > (which also prevents analysis of sub-lists), `and`, `or`, `begin`, thunk combinators,
> > as well as the domain-specific `make-forkexec-constructor`, `make-kill-destructor`,
> > `wait-for-condition`, `timed-action`, and probably some of the `(guix build utils)`
> > like `invoke`, `invoke/quiet`, [...]
>
> `invoke' has no place in your subset, as the invoked program can take arbitrarily long.
> (Shouldn't be a problem when multi-threading, but then there's less need for this kind
> of static analysis.)

Well, the thing is --- I can probably write a shell script and test it **outside of Shepherd**. In fact, in my mistake above, on a previous non-Guix system, I ***did*** use a shell script to wait for daemon startup. But when I ported it over to Guix, I decided to rewrite it as Scheme code to "really dive into the Guix system".

Before, when writing it as a shell script I was able to test it and indeed found problems and fixed them. But as Scheme code, well, it's a bit harder, since I have to go import a bunch of things (that I have to go search for, and I have to figure out Guile path settings for the bits that are provided by Guix, etc.), and then afterwards I have to copy back the tested code into my `configuration.scm`. This was enough of an annoyance to me that it discouraged me from testing it, which could have caught the above bug.

Since `invoke` calls an external program, which runs isolated in its own process (and can be `kill`ed in order to unstick Shepherd), and can more easily be independently tested outside the `configuration.scm`, it's substantially safer than writing the same logic directly in the Shepherd language, so it gets a pass.
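
As a rough sketch of why an external program is easy to contain, here is what a timeout-bounded `invoke` could look like in plain Guile. The name `invoke/timeout` is hypothetical (it is not an existing Shepherd or Guix procedure), and the one-second polling interval is only an assumption carried over from the waiting loop above:

```scheme
;; Hypothetical helper, not part of Shepherd or (guix build utils).
(define (invoke/timeout timeout program . args)
  "Run PROGRAM with ARGS in a child process.  Return #t if it exits with
status 0 within roughly TIMEOUT seconds; otherwise kill it and return #f."
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        ;; Child: replace ourselves with the external program.
        (catch #t
          (lambda () (apply execlp program program args))
          ;; exec failed: leave immediately without running exit hooks.
          (lambda _ (primitive-_exit 127)))
        ;; Parent: poll about once per second, counting *up* to TIMEOUT.
        (let loop ((count 0))
          (let ((result (waitpid pid WNOHANG)))
            (cond
             ((not (zero? (car result)))   ; the child has exited
              (eqv? 0 (status:exit-val (cdr result))))
             ((>= count timeout)           ; taking too long
              (kill pid SIGKILL)
              (waitpid pid)                ; reap the killed child
              #f)
             (else
              (sleep 1)
              (loop (+ count 1)))))))))
```

For example, `(invoke/timeout 30 "sh" "-c" "test -e /run/i-am-really-started")` would give up after about 30 seconds instead of hanging forever (the path here is only the placeholder used elsewhere in this thread). Whatever the waiter program does, the worst case is a killed child process, not a stuck `init`.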
Many other daemons that have some kind of "wait-for-daemon-startup" program will often include tests for the "wait" program as well in their test suites, so using `invoke` is substantially safer than writing bespoke code in Scheme that performs the same action.

> > `mkdir-p` etc.
>
> I have vague plans to replace 'mkdir-p', 'mkdir-p/perms' etc. with some procedure
> (prepare-directory
>   `("/var/stuff" #:bits #o700 #:user ... #:group
>      ("blabla" #:bits ...
>        #:atomic-generate-contents ,(lambda (port) (display 'stuff port))))
>     ...))
>
> ... automatically taking care of symlinks & detecting whether there are conflicts
> with different services.
>
> > Sub-forms (or the entire form for an action) can be wrapped in `(unsafe-turing-complete ...)`
> > to skip this analysis for the sub-form,
>
> An escape hatch seems good to me, though I would rather call it
> `(without-static-analysis (totality) ...)'. The forms are not necessarily ‘unsafe’,
> only potentially so. We could also define checkers for other issues this way.
>
> (I'm thinking of potential TOCTTOU (time of check to time of use) problems involving
> symbolic links.)
>
> > but otherwise, by default, the specific subset must be used, and users have to
> > explicitly put `(unsafe-turing-complete ...)` so they are aware that they can
> > seriously break their system if they make a mistake here. Ideally, as much as possible of
> > the code for any Shepherd action should be outside an `unsafe-turing-complete`, and only parts of the code that really need the full Guile language to implement should be wrapped in `unsafe-turing-complete`.
>
> I'm getting Rusty vibes here. Sounds sound to me.
>
> > (`lambda` and `let-syntax` and `let`, because they can rebind the meanings of symbols,
> > would need to be in `unsafe-turing-complete` --- otherwise the analysis routine would
> > require a full syntax expander as well)
>
> No.
> The analysis routine should not directly work on Scheme code, but rather on
> the 'tree-il' (or maybe on the CPS code, I dunno).
> Try (macroexpand '(let ((a 0)) a)) from a Guile REPL.
>
> > Turing-completeness is a weakness, not a strength, and restricting languages to be
> > the Least Powerful necessary is better. The `unsafe-turing-complete` form allows
> > an escape, but by default Shepherd code should be in the restricted non-Turing-complete
> > subset, to reduce the scope for catastrophic mistakes.
>
> I'm assuming make-fork+exec-constructor/container would be defined in Scheme
> and be whitelisted in raid5atemyhomework-scheme?

Yes.

>
> - Section 5 -- Declarative
>
> Something you don't seem to have considered: defining services in a declarative
> language! Hypothetical example à la SystemD, written in something vaguely
> resembling Wisp (Wisp is an alternative read syntax for Scheme with less
> parentheses):

I prefer SRFI 110 myself.

> (I'm a bit rusty with the details on defining shepherd services
> in Guix:
>
> shepherd-service
>   start `I-need-to-think-of-a-name-constructor #:binary #$(file-append package "/bin/stuff") #:arguments stuff #:umask stuff #:user stuff #:group stuff ,@(if linux `(#:allowed-syscalls ',(x y z ...)))
>     #:friendly-shutdown
>     thunk
>       invoke #$(file-append package "/bin/stop-stuff-please")
>     #:polling-test-started?
>     thunk
>       if . file-exists? "/run/i-am-really-started"
>         #true
>
>         ;; tell I-need-to-think-of.... to wait 4 secs
>         ;; before trying again
>         seconds 4
>     ;; how much time starting may take
>     ;; until we consider things failed
>     ;; & stop the process
>     #:startup-timeout?
>     thunk
>       if spinning-disk?
>         plenty
>         little
>     #:actions etcetera
>     ...
>
> Well, that's technically still procedural, but this starts to look
> like a SystemD configuration file due to the many keyword arguments!
>
> (Here (thunk X ...)
> == (lambda () X ...), and
> I-need-to-think-of-a-name-constructor
> is a procedure with very many options, that
> ‘should be enough for everyone’. If it doesn't support
> enough arguments,
> look in SystemD for inspiration.)
>
> That seems easier to analyse. Although it's a bit kitchen-sinky,
> no kitchen is complete without a sink, so maybe that's ok ...
> (I assume the kitchen sink I-need-to-think-of-a-name-constructor
> would be defined in the guix source code.)

Well, yes --- but then we can start arguing that using SystemD would be better, as it is more battle-tested and already exists and all the kitchen-sink features are already there and a lot more people can hack its language than can hack the specific dialect of Scheme (augmented with Guixy things like `make-forkexec-constructor`) used by GNU Shepherd.

Thanks
raid5atemyhomework