From mboxrd@z Thu Jan 1 00:00:00 1970
From: Katherine Cox-Buday
Subject: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 10:59:43 -0500
To: Efraim Flashner
Cc: 41429@debbugs.gnu.org
Message-ID: <87k115b7o0.fsf@gmail.com>
In-Reply-To: <20200521140442.GF958@E5400> (Efraim Flashner's message of "Thu, 21 May 2020 17:04:42 +0300")
References: <87d06yc7t4.fsf@gmail.com> <20200521121443.GC958@E5400> <87sgftbgd1.fsf@gmail.com> <20200521140442.GF958@E5400>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: Bug reports for GNU Guix

Efraim Flashner writes:

>> Your comment is kind of scary though! Shepherd is the thing I want to
>> stay up no matter what since it's responsible for monitoring and
>> restarting things. The idea that a misbehaving or poorly written service
>> could bring down the entire Shepherd process is a problem! Is there no
>> isolation?
>
> I have a whole collection of attempts to integrate mcron with shepherd,
> to create loops and add jobs only when the service is active. Attempting
> to fork off and then collect the child process and then fail just enough
> to make the service restart. Lots of cringe-worthy code. The more common
> fail scenarios I see are shepherd fails to start because it doesn't like
> my start code of one of the services or actually starting the service
> somehow kills it. All of those were with straight lambdas to the start
> command though.

I'm not familiar with Shepherd's internals, so I don't know why
interacting with a cron is relevant.

> Do you have your services writing out any logs? Maybe there's a clue
> there.

Not yet, but I should be enabling this soon, and if they display
anything I'll report back. Still, this seems beside the point: the bug
is that Shepherd needs to stay up regardless of what the services it's
monitoring do.

-- 
Katherine