Date: Sat, 20 Mar 2021 11:10:53 +0000
From: raid5atemyhomework
To: Mark H Weaver
Cc: "guix-devel@gnu.org"
Subject: Re: A Critique of Shepherd Design
In-Reply-To: <87y2eil8eb.fsf@netris.org>
References: <87y2eil8eb.fsf@netris.org>

Good morning Mark,

> Hi,
>
> raid5atemyhomework raid5atemyhomework@protonmail.com writes:
>
> > GNU Shepherd is the `init` system used by GNU Guix. It features:
> >
> > - A rich full Scheme language to describe actions.
> > - A simple core that is easy to maintain.
> >
> > However, in this critique, I contend that these features are bugs.
> > The Shepherd language for describing actions on Shepherd daemons is a
> > Turing-complete Guile language. Turing completeness runs afoul of the
> > Principle of Least Power. In principle, all that actions have to do
> > is invoke `exec`, `fork`, `kill`, and `waitpid` syscalls.
>
> These 4 calls are already enough to run "sleep 100000000000" and wait
> for it to finish, or to rebuild your Guix system with an extra patch
> added to glibc.

I agree.
But this mechanism is intended to avoid stupid mistakes like the one I committed, not to protect against an attacker who is capable of invoking `guix system reconfigure` on arbitrary Scheme code (and who can easily wrap anything nefarious in any `unsafe-turing-complete` or `without-static-analysis` escape mechanism). Seatbelts, not steel walls.

> > Yet the language is a full Turing-complete language, including the
> > major weakness of Turing-completeness: the inability to solve the
> > halting problem.
> > The fact that the halting problem is unsolved in the language means it
> > is possible to trivially write an infinite loop in the language. In
> > the context of an `init` system, the possibility of an infinite loop
> > is dangerous, as it means the system may never complete bootup.
>
> Limiting ourselves to strictly total functions wouldn't help much here,
> because for all practical purposes, computing 10^100 digits of Pi is
> just as bad as an infinite loop.

Indeed. Again, seatbelts, not steel walls. It's fairly difficult to make a mistake that causes you to accidentally write a program that computes 10^100 digits of pi, but not so difficult to have a brain fart and use `(- count 1)` instead of `(+ count 1)` because you were wondering idly whether an increment or a decrement loop would be more Schemey, or whether both are just as Schemey as the other.

What I propose would protect against the latter (a much more likely mistake): in context, the recursive loop would be flagged, because the recursive call is a call to a function that is not a member of a whitelist. Hopefully getting recursive loops flagged would make the sysad writing `configuration.scm` look for the "proper" way to wait for an event to be true, and hopefully lead them to discover the (hopefully extant) documentation on whatever domain-specific language we have for waiting for the event to be true, instead of rolling their own.
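
To illustrate the kind of check described above, here is a minimal sketch in Guile. It is only a sketch under stated assumptions: `allowed-operators` and `flagged-calls` are hypothetical names made up for this example, not anything that exists in Shepherd or Guix, and the walker is deliberately naive (it does not understand binding forms such as `let` or `define`, which a real analysis would need to handle).

```scheme
;; Hypothetical whitelist of operators an action body may call.
;; The contents are illustrative only.
(define allowed-operators
  '(if = + - fork exec kill waitpid))

;; Return the list of operators that appear in call position in EXPR
;; but are not members of the whitelist.  EXPR is a quoted S-expression
;; made of proper lists and atoms.
(define (flagged-calls expr)
  (cond
   ((not (pair? expr)) '())              ; atoms contain no calls
   ((eq? (car expr) 'quote) '())         ; quoted data is not code
   (else
    (let ((head (car expr))
          (in-args (apply append (map flagged-calls (cdr expr)))))
      (cond
       ;; An operator that is itself an expression: look inside it.
       ((pair? head) (append (flagged-calls head) in-args))
       ;; A whitelisted operator: only report what the arguments contain.
       ((and (symbol? head) (memq head allowed-operators)) in-args)
       ;; Anything else, including a recursive call, gets flagged.
       (else (cons head in-args)))))))

;; The body of a hand-rolled loop with the wrong step direction:
;; the recursive call to `loop' is not whitelisted, so it is reported.
(display (flagged-calls '(if (= count 10) 'done (loop (- count 1)))))
(newline)                                ; prints (loop)
```

The point of the sketch is that the check never has to decide whether the loop terminates; it only notices that the body calls something outside the whitelist, and that is enough to make the author stop and look.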

> That said, I certainly agree that Shepherd could use improvement, and
> I'm glad that you've started this discussion.
>
> At a glance, your idea of having Shepherd do more within subprocesses
> looks promising to me, although this is not my area of expertise.

An issue here is that we sometimes pass data across Shepherd actions using environment variables, and a change made to the environment in one subprocess is not visible across process boundaries. See the `set-http-proxy` action of `guix-daemon`; the environment variable is used as a global namespace that is accessible from both the `set-http-proxy` and `start` actions.

On the other hand, arguably the environment variable table is a global resource shared amongst multiple Shepherd daemons. This technique in general may not scale well for large numbers of daemons; environment variable name conflicts may cause subtle problems later. I think it would be better if, in addition to the "value" (typically the PID), each Shepherd service also had a `settings` field that can be read and modified by each action. It could contain anything that satisfies `(lambda (x) (equal? x (read (print x))))`, so that it can be easily serialized across each subprocess launched by each action. Then the `set-http-proxy` action would update this `settings` field for the Shepherd service, then queue up a `restart` action. By convention, `settings` could be an association list.

This would also persist the `http_proxy` setting, BTW --- currently if you `herd set-http-proxy guix-daemon ` and then `herd restart guix-daemon` later, the HTTP proxy is lost (since the environment variable is cleared after `set-http-proxy` restarts the `guix-daemon`). In short, this `set-http-proxy` example looks like a fairly brittle hack anyway, and is maybe worth avoiding as a pattern.

Then there are actions that invoke other actions.
From a cursory glance at the Guix code it looks like only Ganeti and Guix-Daemon have actions that invoke actions, and they only invoke actions on their own Shepherd services. It seems to me safe for an action invoked from another action of the same service to *not* spawn a new process, but to execute in the same process. I am not sure how safe it would be to allow one Shepherd service to invoke an action on another Shepherd service --- but then the `start` action of any service may cause other services it requires to be started as well, so we still do need to figure out what subprocesses to launch or not launch.

Or maybe each Shepherd service has its own subprocess that runs its own mainloop, and the "main" Shepherd process mainloop "just" serves as a switching center that forwards commands to each service's mainloop-subprocess. It would also incidentally monitor per-service mainloop-subprocesses that are not responding fast enough (and possibly decide to kill those mainloops and all their children, then disable the affected service). This would make each service's environment variables a persistent but local store that is specific to each service, which makes their use in `guix-daemon` safe, and the `set-http-proxy` action would simply not clear the env vars, so that the setting persists. This allows Shepherd to remain responsive at all times even if some action of some Shepherd service enters an infloop or 10^100 pi digits condition; it could even have `herd status` report the number of pending unhandled commands for each service, to inform the sysad about possible problems with specific services.

Thanks
raid5atemyhomework