From: Maxim Cournoyer
To: raid5atemyhomework
Cc: "guix-devel@gnu.org"
Subject: Re: A Critique of Shepherd Design
Date: Sat, 20 Mar 2021 00:02:36 -0400
In-Reply-To: (raid5atemyhomework@protonmail.com's message of "Fri, 19 Mar 2021 17:33:57 +0000")
Message-ID: <871rcalbfn.fsf@gmail.com>

Hi,

raid5atemyhomework writes:

> GNU Shepherd is the `init` system used by GNU Guix. It features:
>
> * A rich full Scheme language to describe actions.
> * A simple core that is easy to maintain.
>
> However, in this critique, I contend that these features are bugs.
>
> The Shepherd language for describing actions on Shepherd daemons is a
> Turing-complete Guile language. Turing completeness runs afoul of the
> Principle of Least Power. In principle, all that actions have to do
> is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls. Yet the
> language is a full Turing-complete language, including the major
> weakness of Turing-completeness: the inability to solve the halting
> problem.
>
> The fact that the halting problem is unsolved in the language means it
> is possible to trivially write an infinite loop in the language. In
> the context of an `init` system, the possibility of an infinite loop
> is dangerous, as it means the system may never complete bootup.
>
> Now, let us combine this with the second feature (really a bug): GNU
> Shepherd is a simple, single-threaded Scheme program. That means that
> if the single thread enters an infinite loop (because of a Shepherd
> service description that entered an infinite loop), then Shepherd
> itself hangs. This means that the system is severely broken. You
> cannot `sudo reboot` because that communicates with the Shepherd
> daemon. You cannot even `kill 1` because signals are handled in the
> mainloop, which never gets entered if the `start` action of some
> Shepherd daemon managed to enter an infinite loop.
>
> I have experienced this firsthand since I wrote a Shepherd service to
> launch a daemon, and I needed to wait for the daemon initialization to
> complete.
> My implementation of this had a bug that caused an infinite
> loop to be entered, but would only tickle this bug when a very
> particular (persistent on-disk) state existed. I wrote this code
> about a month or so ago, and never got to test it until last week,
> when the bug was tickled. Unfortunately, by then I had removed older
> system versions that predated the bug. When I looked at a backup copy
> of the `configuration.scm`, I discovered the bug soon afterwards. But
> the amount of code verbiage needed here had apparently overwhelmed me
> at the time I wrote the code to do the waiting, and the bug got into
> the code and broke my system. I had to completely reinstall Guix
> (fortunately the backup copy of `configuration.scm` was helpful in
> recovering most of the system, also ZFS rocks).
>
> Yes, I made a mistake. I'm only human. It should be easy to recover
> from mistakes. The full Turing-completeness of the language invites
> mistakes, and the single-threadedness makes the infinite-loop mistakes
> that Turing-completeness invites, into catastrophic system breakages.
>
> So what can we do?
>
> For one, a Turing-complete language is a strict *superset* of
> non-Turing-complete languages. So one thing we can do is to define a
> more dedicated language for Shepherd actions, strongly encourage the
> use of that sub-language, and, at some point, require that truly
> Turing-complete actions need to be wrapped in a
> `(unsafe-turing-complete ...)` form.
>
> For example, in addition to the current existing
> `make-forkexec-constructor` and `make-kill-destructor`, let us also
> add these syntax forms:
>
> `(wait-for-condition <timeout> <form>)` - Return a procedure that
> accepts a numeric `pid`, that does: Check if evaluating `<form>` in
> the lexical context results in `#f`. If so, wait one second and
> re-evaluate, but exit anyway and return `#f` if `<timeout>` seconds
> have passed, or if the `pid` has died. If `<form>` evaluates to
> non-`#f` then return it immediately.
>
> `(timed-action <timeout> <form> ...)` - Return a procedure that
> accepts a numeric `pid`, that does: Spawn a new thread (or maybe
> better a `fork`ed process group?). The new thread evaluates `<form>
> ...` in sequence. If the thread completes or throws in `<timeout>`
> seconds, return the result or throw the exception in the main thread.
> If the `<timeout>` is reached or the given `pid` has died, kill the
> thread and any processes it may have spawned, then return `#f`.
>
> `(wait-to-die <timeout>)` - Return a procedure that accepts a `pid`
> that does: Check if the `pid` has died, if so, return #t. Else sleep
> 1 second and try again, and if `<timeout>` is reached, return `#f`.
>
> The above forms should also report as well what they are doing
> (e.g. `Have been waiting 10 of 30 seconds for <form>`) on the console
> and/or syslog.
>
> In addition, `make-forkexec-constructor` should accept a
> `#:post-fork`, which is a procedure that accepts a `pid`, and
> `make-kill-destructor` should accept a `#:pre-kill`, which is a
> procedure that accepts a `pid`. Possibly we need to add combinators
> for the above actions as well. For example a `sub-actions` procedural
> form that accepts any number of functions of the above `pid -> bool`
> type and creates a single combined `pid -> bool`.
>
> So for example, in my case, I would have written a
> `make-forkexec-constructor` that accepted a `#:post-fork` that had a
> `wait-for-condition` on the code that checked if the daemon
> initialization completed correctly.
>
> I think it is a common enough pattern that:
>
> * We have to spawn a daemon.
> * We have to wait for the daemon to be properly initialized (`#:post-fork`)
> * When shutting down the daemon, it's best to at least try to politely
>   ask it to finish using whatever means the daemon has (`#:pre-kill`).
> * If the daemon doesn't exit soon after we politely asked it to, be
>   less polite and start sending signals.
>
> So I think the above should cover a good number of necessities.
>
>
> Then, we can define a strict subset of Guile, containing a set of
> forms we know are total (i.e. we have a proof they will always halt
> for all inputs). Then any Shepherd actions, being Lisp code and
> therefore homoiconic, can be analyzed. Every list must have a `car`
> that is a symbol naming a syntax or procedure that is known to be safe
> --- `list`, `append`, `cons*`, `cons`, `string-append`, `string-join`,
> `quote` (which also prevents analysis of sub-lists), `and`, `or`,
> `begin`, thunk combinators, as well as the domain-specific
> `make-forkexec-constructor`, `make-kill-destructor`,
> `wait-for-condition`, `timed-action`, and probably some of the `(guix
> build utils)` like `invoke`, `invoke/quiet`, `mkdir-p` etc.
>
> Sub-forms (or the entire form for an action) can be wrapped in
> `(unsafe-turing-complete ...)` to skip this analysis for the sub-form,
> but otherwise, by default, the specific subset must be used, and users
> have to explicitly put `(unsafe-turing-complete ...)` so they are
> aware that they can seriously break their system if they make a
> mistake here. Ideally, as much as possible of the code for any
> Shepherd action should be *outside* an `unsafe-turing-complete`, and
> only parts of the code that really need the full Guile language to
> implement should be wrapped in `unsafe-turing-complete`.
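
The analysis described in the two quoted paragraphs above can be sketched in a few lines of Guile. This is only a minimal illustration under stated assumptions, not anything Shepherd provides: `%safe-heads` and `safe-form?` are hypothetical names, the whitelist simply copies the symbols listed in the quoted text, and `unsafe-turing-complete`, `wait-for-condition` and `timed-action` are the proposed (not existing) forms. The checker looks only at the `car` of each list and does no macro expansion.

```scheme
(use-modules (srfi srfi-1))             ; for `every`

;; Hypothetical whitelist, taken from the symbols named in the proposal.
(define %safe-heads
  '(list append cons* cons string-append string-join
    and or begin
    make-forkexec-constructor make-kill-destructor
    wait-for-condition timed-action
    invoke invoke/quiet mkdir-p))

(define (safe-form? form)
  "Return #t if every list in FORM starts with a whitelisted symbol."
  (cond
   ;; Atoms (symbols, strings, numbers, the empty list) are leaves.
   ((not (pair? form)) #t)
   ;; Reject improper (dotted) lists rather than guess at their meaning.
   ((not (list? form)) #f)
   ;; `quote` stops the analysis; `unsafe-turing-complete` is the
   ;; explicit opt-out proposed in the quoted message.
   ((memq (car form) '(quote unsafe-turing-complete)) #t)
   ;; Whitelisted head: every argument must itself be safe.
   ((memq (car form) %safe-heads)
    (every safe-form? (cdr form)))
   ;; Anything else (`lambda`, `let`, unknown procedures) is rejected.
   (else #f)))

;; The daemon path below is a made-up example.
(safe-form? '(make-forkexec-constructor
              (list "/bin/mydaemon" "--foreground")))          ; => #t
(safe-form? '(let loop () (loop)))                             ; => #f
(safe-form? '(begin (unsafe-turing-complete
                     (let loop () (loop)))))                   ; => #t
```

A check like this only establishes that the action is built from whitelisted heads. Whether each whitelisted procedure really terminates for all inputs is a separate question that the sketch does not address.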
>
> (`lambda` and `let-syntax` and `let`, because they can rebind the
> meanings of symbols, would need to be in `unsafe-turing-complete` ---
> otherwise the analysis routine would require a full syntax expander as
> well)
>
>
> Turing-completeness is a weakness, not a strength, and restricting
> languages to be the Least Powerful necessary is better. The
> `unsafe-turing-complete` form allows an escape, but by default
> Shepherd code should be in the restricted non-Turing-complete subset,
> to reduce the scope for catastrophic mistakes.

Thanks for the well-written critique.

Turing-completeness is a two-edged sword; with great power comes great
responsibility :-). The same could be said of our use of Guile for
writing package definitions, but in a context where mistakes are easier
to catch.

Your proposal is also interesting... I'm not sure if I like it yet but
I'll keep thinking about it. I'm somewhat reluctant for the same reason
I think using Guile to write package definitions is great: maximum
expressiveness and flexibility. If we were to try and start restricting
that core feature, it wouldn't really be Shepherd anymore.

Other init systems such as systemd keep their descriptive language
minimal but accommodate running pre or post actions (typically scripts)
very easily (but with restrictions about how long they can run in the
case of systemd IIRC).

Thanks for having shared your analysis!

Maxim