From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ludovic Courtès
To: Josselin Poiret
Cc: 57922@debbugs.gnu.org, Maxim Cournoyer
Subject: bug#57922: Shepherd doesn't seem to correctly handle waitpid itself
Date: Sat, 24 Sep 2022 18:30:07 +0200
Message-ID: <87zgeo68hc.fsf@gnu.org>
In-Reply-To: <87sfkh8a8z.fsf@jpoiret.xyz> (Josselin Poiret's message of "Sat, 24 Sep 2022 10:09:00 +0200")
References: <874jx4q953.fsf@gmail.com> <87o7va33iq.fsf@jpoiret.xyz> <87bkr6fvlz.fsf@gnu.org> <878rm98n17.fsf@gmail.com> <87sfkh8a8z.fsf@jpoiret.xyz>
List-Id: Bug reports for GNU Guix
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Hi,

Josselin Poiret wrote:

> Maxim Cournoyer writes:
>
>> This leads me to believe that Shepherd does not block until the process
>> is actually dead to mark the process as stopped (it just waitpid on the
>> group pid with WNOHANG), which means it won't block if the child process
>> hasn't exited yet, if I'm correct.

Correct: the service is marked as stopped as soon as ‘stop’ returns.

>> When we are in the stop slot, we know for sure that the process should
>> terminate completely, hence it'd make sense to call 'waitpid' *without*
>> WNOHANG there, to avoid 'herd restart' from starting the service while
>> its stopped process is not done terminating.
>>
>> jamid can take quite some time to terminate cleanly because of the
>> networking threads in the opendht library that needs to be finalized,
>> which is probably the reason this problem can be observed here.
>>
>> Thoughts?
>
> I agree with you, make-kill-destructor should waitpid the processes it's
> killing.  There shouldn't be any issues waitpid'ing before the
> shepherd's signal handler, since stop actions are run with asyncs
> disabled.  The signal handler will run once but won't get anything
> because all the processes were already waitpid'd and it uses WNOHANG.

I think we need an extra “stopping” state for services.  In general,
we’ll want to send SIGTERM, wait for some grace period or dead process
notification, then send SIGKILL, and finally change state to “stopped”.

This is not possible in 0.9 but is something I’d like to have in 0.10¹.

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00350.html
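
For illustration only, here is a rough sketch of the stop sequence described above: send SIGTERM, poll with a non-blocking waitpid (WNOHANG) during a grace period, fall back to SIGKILL, and only then report “stopped”.  It is written in Python rather than Guile and is not Shepherd code; the names stop_service, ignore_sigterm, grace_period and poll_interval are hypothetical, and the state list merely stands in for whatever state a service object would carry.

```python
import os
import signal
import subprocess
import time


def stop_service(pid, grace_period=1.0, poll_interval=0.05):
    """Stop child `pid`: SIGTERM, wait up to `grace_period`, then SIGKILL.

    Returns (states, status): the state transitions recorded along the way
    and the raw wait status obtained from waitpid.
    """
    states = ["stopping"]  # hypothetical intermediate state
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline:
        # With WNOHANG, waitpid returns (0, 0) at once if the child is alive.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            break
        time.sleep(poll_interval)
    else:
        # Grace period expired: force termination, then block until reaped.
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)
    states.append("stopped")  # recorded only once the process is really gone
    return states, status


def ignore_sigterm():
    # Runs in the child before exec; an ignored signal stays ignored after exec.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)


if __name__ == "__main__":
    # Popen is only used to launch the children; stop_service reaps them.
    polite = subprocess.Popen(["sleep", "30"])
    stubborn = subprocess.Popen(["sleep", "30"], preexec_fn=ignore_sigterm)
    for child in (polite, stubborn):
        states, status = stop_service(child.pid, grace_period=0.5)
        print(states, signal.Signals(os.WTERMSIG(status)).name)
```

The point of the sketch is the ordering: “stopped” is appended only after waitpid has actually returned the child's status, so a restart issued right afterwards cannot overlap with a process that is still shutting down.  A real implementation would presumably react to a dead-process notification instead of polling.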