From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: bug#41429: Shepherd Sometimes Crashes
Date: Thu, 21 May 2020 17:04:42 +0300
From: Efraim Flashner
To: Katherine Cox-Buday
Cc: 41429@debbugs.gnu.org
Message-ID: <20200521140442.GF958@E5400>
In-Reply-To: <87sgftbgd1.fsf@gmail.com>
References: <87d06yc7t4.fsf@gmail.com> <20200521121443.GC958@E5400> <87sgftbgd1.fsf@gmail.com>
Resent-From: Efraim Flashner
Resent-CC: bug-guix@gnu.org
Resent-Date: Thu, 21 May 2020 14:06:01 +0000
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 41429
X-GNU-PR-Package: guix
List-Id: Bug reports for GNU Guix
X-PGP-Key-ID: 0x41AAE7DCCA3D8351
X-PGP-Key: https://flashner.co.il/~efraim/efraim_flashner.asc
X-PGP-Fingerprint: A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="vA66WO2vHvL/CRSR"
Content-Disposition: inline

--vA66WO2vHvL/CRSR
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Thu, May 21, 2020 at 07:51:54AM -0500, Katherine Cox-Buday wrote:
> Efraim Flashner writes:
>
> > On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> >> I am running shepherd as a userspace service manager on an alien distro.
> >> Occasionally (often enough as to cause concern), Shepherd is crashing.
> >> I am unable to narrow down a cause, but anecdotally, it seems to happen
> >> more often when a service it's managing fails repeatedly and is
> >> disabled.
> >>
> >> I'm running `strace` against the Shepherd process in an attempt to
> >> submit a better bug report, but this is all I have for now. Maybe others
> >> have also seen this behavior.
> >
> > I found it happens less often with shepherd-0.8. What version are you
> > running? Also possibly related, do you have mismatched versions of guile
> > between guix packages and your distro's native packages?
>
> Sorry, I forgot to include the version! I am running 0.8 from a store
> which I update ~1 week.
>
> > I've also sometimes found shepherd to crash when I add a service where
> > the start command is "wrong", as though the error were so bad that
> > shepherd says "Nope! That's it! I quit!"
>
> I'm doing very standard things with `make-forkexec-constructor`, so I
> wouldn't expect any problems there.
>
> Your comment is kind of scary though! Shepherd is the thing I want to
> stay up no matter what since it's responsible for monitoring and
> restarting things. The idea that a misbehaving or poorly written service
> could bring down the entire Shepherd process is a problem! Is there no
> isolation?

I have a whole collection of attempts to integrate mcron with shepherd:
creating loops and adding jobs only when the service is active, or
attempting to fork off, collect the child process, and then fail just
enough to make the service restart. Lots of cringe-worthy code.

The more common failure scenarios I see are that shepherd fails to start
because it doesn't like the start code of one of my services, or that
actually starting the service somehow kills it. All of those were with
straight lambdas to the start command, though. Do you have your services
writing out any logs? Maybe there's a clue there.

> > I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> > sparse. Still, it might have something useful.
>
> Yes, this is the first place I looked, but unfortunately there wasn't
> much usable information.
>
> --
> Katherine

-- 
Efraim Flashner אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

--vA66WO2vHvL/CRSR
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEoov0DD5VE3JmLRT3Qarn3Mo9g1EFAl7GinYACgkQQarn3Mo9
g1E7Zw/+L2CA+Yy0ewq6WgTq+CmcVRUju6X9PvR8Od1Q6QxWKl4p0xdcJjI21OKt
uSz0OmoM+cZRud7EZtXpbRds1k4ar6ZmM9pJv5WUBZaF11kISrxjJnncbEsHCy0U
NwIEp4OSiZRubBiBzST7Wb9fr3XZSK4rvuSqmr+9OPKkj6ekZcIa51PG7h2wODyn
2gjqYdfXoKXxCB7RECRPw7v92k6QGuqnSAXlwi9fNg1ZojFECelaL0b4liqb23wG
AbJ4HmatBagLo5TezO9g6KdhxS3VfEvqsuN1h6JwHXCYoAJsfN3HN9R8KovNf/Wi
mL+WIxq0FpRX5rexV7GkZaC86ABGspxmrbPnnqPktCqjwwMHPo4iFeHIIzx1w9VM
PPzAg3Da2TilkR5z0h4Td+nKNvCjSQ6C6WZhlxaG/uOSPYSSBApbYWgRg81x+xq6
m11UroqNSQ34PekPhl7u1Bowillyd1OvK1tIi8as7i6DEEFzsjRafP+cfZa0apkt
9LGPYXjL/me4y9ZWhXnF3gYA32lCKdyCIphLr2iJvppA44wubMsG7piE1HPkBzTm
GxkdHgS1qrVW8ucYgw9KVLiB3DxwY5a3RP4jBjPx/GBNZ+bq5DKG4c58CuZZmIF2
Z4hCU448pYKS6mFGgCUCdalPeFF6u219tTiB7XprcXKhmuhEKFA=
=GPjE
-----END PGP SIGNATURE-----

--vA66WO2vHvL/CRSR--