From: Maxim Cournoyer
To: Ludovic Courtès
Cc: Josselin Poiret, 57922@debbugs.gnu.org
Subject: bug#57922: Shepherd doesn't seem to correctly handle waitpid itself
Date: Fri, 23 Sep 2022 23:32:52 -0400
Message-ID: <878rm98n17.fsf@gmail.com>
In-Reply-To: <87bkr6fvlz.fsf@gnu.org>

reopen 57922
tags 57922 -notabug
thanks

Hi again,

[...]

>>> Here's a small reproducer to apply on our code base:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> modified   gnu/services/telephony.scm
>>> @@ -685,13 +685,7 @@ (define (archive-name->username archive)
>>>
>>>                ;; Finally, return the PID of the daemon process.
>>>                daemon-pid))
>>> -        (stop
>>> -         #~(lambda (pid . args)
>>> -             (kill pid SIGKILL)
>>> -             ;; Wait for the process to exit; this prevents overlapping
>>> -             ;; processes when issuing 'herd restart'.
>>> -             (waitpid pid)
>>> -             #f))))))))
>>> +        (stop #~(make-kill-destructor))))))))
>
> I think the main difference between these two is that the first one uses
> SIGKILL while the second one uses SIGTERM.
>
> You could try #~(make-kill-destructor SIGKILL) to get the same effect.
> You are right, the important difference was SIGTERM vs SIGKILL.  I
> thought I had tried that.  The problem only shows itself in the
> 'jami-provisioning' system test, not the 'jami' one.
> Marking this one as notabug and closing.

I think I spoke too soon.  SIGKILL does fix the problem when *not* using
waitpid explicitly, but when using waitpid explicitly, SIGTERM can be
used just fine.

In other words, this works:

--8<---------------cut here---------------start------------->8---
@@ -687,7 +687,7 @@ (define (archive-name->username archive)
               daemon-pid))
         (stop
          #~(lambda (pid . args)
-             (kill pid SIGKILL)
+             (kill pid SIGTERM)
              ;; Wait for the process to exit; this prevents overlapping
              ;; processes when issuing 'herd restart'.
              (waitpid pid)
--8<---------------cut here---------------end--------------->8---

but this doesn't:

--8<---------------cut here---------------start------------->8---
@@ -685,13 +685,7 @@ (define (archive-name->username archive)

               ;; Finally, return the PID of the daemon process.
               daemon-pid))
-        (stop
-         #~(lambda (pid . args)
-             (kill pid SIGKILL)
-             ;; Wait for the process to exit; this prevents overlapping
-             ;; processes when issuing 'herd restart'.
-             (waitpid pid)
-             #f))))))))
+        (stop #~(make-kill-destructor))))))))

 (define jami-service-type
--8<---------------cut here---------------end--------------->8---

when exercised with 'make check-system TESTS=jami-provisioning':

--8<---------------cut here---------------start------------->8---
This is the GNU system.  Welcome.
jami login: Jami Daemon 13.4.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

23:29:05.375         os_core_unix.c !pjlib 2.12.1 for POSIX initialized
shepherd: Service jami has been stopped.
Caught signal Terminated, terminating...
Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.
Jami Daemon 13.4.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/2vcv1fyqfyym2zcyf5bvbj1pcgbcc515-shepherd-marionette.scm:1:1718: ERROR:
  1. &action-exception-error:
      service: jami
      action: start
      key: misc-error
      args: (#f "~A ~S ~S ~S" (dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")) #f)
--8<---------------cut here---------------end--------------->8---

or manually through the test VM:

--8<---------------cut here---------------start------------->8---
$(./pre-inst-env guix system vm --no-graphic --no-grafts --no-offload \
   -e '(@@ (gnu tests telephony) %jami-os-provisioning)') \
   -m 1G -smp $(nproc) "-nic" user,model=virtio-net-pci,hostfwd=tcp::10022-:22
--8<---------------cut here---------------end--------------->8---

This leads me to believe that Shepherd does not wait until the process
is actually dead before marking the service as stopped: it just calls
waitpid on the group PID with WNOHANG, which, if I'm correct, means it
won't block when the child process hasn't exited yet.

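To illustrate the difference outside of Shepherd, here is a minimal
Guile sketch.  It is not Shepherd's actual code: 'spawn-slow-child' is a
made-up helper, and the one-second sleep merely stands in for a daemon
that is slow to shut down.  It only shows that waitpid with WNOHANG
returns right away while the child is still alive, whereas a plain
waitpid blocks until the child is gone.

```scheme
;; Hypothetical helper: fork a child that takes SECONDS to exit,
;; standing in for a daemon that is slow to terminate.
(define (spawn-slow-child seconds)
  (let ((pid (primitive-fork)))
    (when (zero? pid)
      ;; Child: pretend to clean up, then exit without running the
      ;; parent's exit hooks.
      (sleep seconds)
      (primitive-_exit 0))
    ;; Parent: return the child's PID.
    pid))

(define child (spawn-slow-child 1))

;; Non-blocking check: the child is still sleeping, so nothing is
;; collected and the returned PID is 0.
(define early (waitpid child WNOHANG))

;; Blocking wait: returns only once the child has actually exited.
(define final (waitpid child))

(format #t "WNOHANG collected a child? ~a~%" (not (zero? (car early))))
(format #t "blocking waitpid exit value: ~a~%"
        (status:exit-val (cdr final)))
```

If a restart were triggered right after the WNOHANG call, the old
process would still be running, which matches the "Another daemon is
detected" failure above.
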
When we are in the stop slot, we know for sure that the process should
terminate completely, hence it'd make sense to call 'waitpid' *without*
WNOHANG there, to prevent 'herd restart' from starting the service while
its stopped process is not done terminating.

jamid can take quite some time to terminate cleanly because the
networking threads in the opendht library need to be finalized, which is
probably why this problem can be observed here.

Thoughts?

Maxim