From: raid5atemyhomework <raid5atemyhomework@protonmail.com>
To: Ludovic Courtès
Cc: "guix-devel@gnu.org"
Date: Sun, 21 Mar 2021 00:22:09 +0000
Subject: Re: A Critique of Shepherd Design
In-Reply-To: <87a6qx7of5.fsf@gnu.org>

Hello Ludo',

> Hi,
>
> raid5atemyhomework <raid5atemyhomework@protonmail.com> skribis:
>
> > Now, let us combine this with the second feature (really a bug): GNU
> > shepherd is a simple, single-threaded Scheme program. That means that
> > if the single thread enters an infinite loop (because of a Shepherd
> > service description that entered an infinite loop), then Shepherd
> > itself hangs.
>
> You’re right that it’s an issue; in practice, it’s okay because we pay
> attention to the code we run there, but obviously, mistakes could lead
> to the situation you describe.
>
> It’s a known problem and there are plans to address it, discussed on
> this list a few times before.
> The Shepherd “recently” switched to
> ‘signalfd’ for signal handling in the main loop, with an eye on making
> the whole loop event-driven:
>
> https://issues.guix.gnu.org/41507
>
> This will address this issue and unlock things like “socket activation”.
>
> That said, let’s not lie to ourselves: the Shepherd’s design is
> simplistic. I think that’s okay though because there’s a way to address
> the main issues while keeping it simple.

I'm not sure you can afford to keep it simple. Consider: https://issues.guix.gnu.org/47253

In that issue, the `networking` provision potentially comes up *before* the network is, in fact, up. This means that other daemons that require `networking` could be started before the network connection is up.

One example of such a daemon is `transmission-daemon`. This daemon binds itself to port 9091 so you can control it. Unfortunately, if it gets started while the network is down, it will be unable to bind to 9091 (so you can't control it) but will still keep running. On my system that means that on reboot I have to manually `sudo herd restart transmission-daemon`.

In another example, I have a custom daemon that I have set up to use the Tor proxy over 127.0.0.1:9050. It requires both `networking` and `tor`. When it starts after `networking` comes up but before the actual network does, it dies because it can't access the proxy at 127.0.0.1:9050 (apparently NetworkManager handles loopback as well). Then shepherd respawns it, and it dies again (network still not up), enough times that it gets disabled. This means that on reboot I have to manually `sudo herd enable raid5atemyhomework-custom-daemon` and `sudo herd restart raid5atemyhomework-custom-daemon`.

On SystemD-based systems, there's a `NetworkManager-wait-online.service` which just calls `nm-online -s -q --timeout=30`.
This delays network-requiring daemons until after the network is in fact actually up.

However, in https://issues.guix.gnu.org/47253#1 Mark points out this is undesirable in the Guix case since it could potentially stall the (single-threaded) bootup process for up to 30 seconds if the network is physically disconnected, a bad UX for desktop and laptop users (who might still want to run `transmission-daemon`, BTW) because it potentially blocks the initialization of X and makes the computer unusable for such users for up to 30 seconds after boot. I note that I experienced such issues in some very old Ubuntu installations, as well.

SystemD can afford to *always* have `nm-online -s -q --timeout=30` because it's concurrent. The wait-for-network service will block, but other services like X that don't ***need*** the network will continue booting. So the user can still get to a usable system even if the boot isn't complete because the network isn't up yet due to factors beyond the control of the operating system.

Switching to a concurrent design for Shepherd --- *any* concurrent design --- is probably best done sooner rather than later, because it risks strongly affecting customized `configuration.scm`s like mine that have almost a half dozen custom Shepherd daemons.

Thanks
raid5atemyhomework
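
P.S. To make the concurrency point concrete, here is a minimal sketch in Python. It is not Shepherd or SystemD code: the service table, the simulated delays, and the `start_all` helper are all made up for illustration, and the `network-online` entry merely stands in for a blocking `nm-online`-style wait. Each service waits only on its own requirements, so a slow network wait holds back `transmission-daemon` but not `xorg`.

```python
import threading
import time

# Hypothetical service table: name -> (requirements, simulated start-up seconds).
# "network-online" stands in for a blocking `nm-online`-style wait.
SERVICES = {
    "network-online": ([], 0.3),
    "xorg": ([], 0.0),
    "transmission-daemon": (["network-online"], 0.0),
}


def start_all(services):
    """Start every service in its own thread.

    Returns the order in which services became ready.
    """
    ready = {name: threading.Event() for name in services}
    order = []
    lock = threading.Lock()

    def run(name):
        requirements, delay = services[name]
        for req in requirements:
            ready[req].wait()   # blocks only this service, not the others
        time.sleep(delay)       # simulated start-up work
        with lock:
            order.append(name)
        ready[name].set()       # unblock anything that requires this service

    threads = [threading.Thread(target=run, args=(name,)) for name in services]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(start_all(SERVICES))
```

In this toy model `xorg` becomes ready while `network-online` is still waiting, and `transmission-daemon` only starts once the wait has finished. A single-threaded loop that runs the same table in order would instead make `xorg` sit behind the network wait, which is the stall described above.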