From: Maxim Cournoyer
Date: Mon, 19 Sep 2022 00:29:44 -0400
Subject: bug#57922: Shepherd doesn't seem to correctly handle waitpid itself
To: 57922@debbugs.gnu.org
Message-ID: <874jx4q953.fsf@gmail.com>

Hi,

I've tried to determine why a workaround in the jami-service-type is
required in the 'stop' slot to avoid failures in 'herd restart jami'.
I haven't quite found the culprit, but it appears to me that waitpid is
only called in one place in Shepherd: the handle-SIGCHLD procedure in
(shepherd service). That procedure does not wait for a specific PID,
but rather does:

  (waitpid* WAIT_ANY WNOHANG)

where waitpid* is waitpid with some special handling in case a
system-error exception is thrown with an ECHILD or EINTR error number.

This doesn't strike me as a strong guarantee that waitpid occurs when
stop is called, because:

1. It has to be installed as the SIGCHLD signal handler for each
   process, with something like:

--8<---------------cut here---------------start------------->8---
(unless %sigchld-handler-installed?
  (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
  (set! %sigchld-handler-installed? #t))
--8<---------------cut here---------------end--------------->8---

   This is done for fork+exec-command and
   make-inetd-forkexec-constructor, but not for
   make-forkexec-constructor/container, AFAICT;

2. It has the WNOHANG flag, which means that stop simply does a kill,
   and the signal handler then only weakly (because of WNOHANG) waits
   on the process, so the start may begin before the process has
   actually completely terminated.

Here's a small reproducer to apply on our code base:

--8<---------------cut here---------------start------------->8---
modified   gnu/services/telephony.scm
@@ -685,13 +685,7 @@ (define (archive-name->username archive)
                       ;; Finally, return the PID of the daemon process.
                       daemon-pid))
-                (stop
-                 #~(lambda (pid . args)
-                     (kill pid SIGKILL)
-                     ;; Wait for the process to exit; this prevents overlapping
-                     ;; processes when issuing 'herd restart'.
-                     (waitpid pid)
-                     #f))))))))
+                (stop #~(make-kill-destructor))))))))
 
 (define jami-service-type
   (service-type
--8<---------------cut here---------------end--------------->8---

Then run 'make check-system TESTS=jami-provisioning' to see the new
failures, or, if you want to investigate the system manually:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix system vm --no-grafts --no-offload --no-graphic \
  -e '(@@ (gnu tests telephony) %jami-os-provisioning)'
$ /gnu/store/rxi7c14hga62qslb0sr6nac9qnkxr0nn-run-vm.sh -m 1G -smp 4 \
  -nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22
# Connect to the QEMU VM:
$ ssh root@localhost -p10022
root@jami ~# herd restart jami
Service jami has been stopped.
herd: exception caught while executing 'start' on service 'jami':
dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")
root@jami ~# herd status jami
Status of jami:
  It is stopped.
  It is enabled.
  Provides (jami).
  Requires (jami-dbus-session).
  Conflicts with ().
  Will be respawned.
root@jami ~# pgrep jami
--8<---------------cut here---------------end--------------->8---

Thanks,

Maxim
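
For illustration only, here is a minimal standalone Guile sketch of the
difference described in point 2: polling once with WNOHANG right after
a kill versus blocking on the specific PID. The names spawn-sleeper,
stop/poll and stop/wait are hypothetical helpers made up for this
sketch; they are not Shepherd procedures, and this is not how Shepherd
implements stop. Only core Guile calls are used (primitive-fork, kill,
waitpid, status:term-sig).

```scheme
;; Hypothetical helpers for illustration; none of these exist in Shepherd.

(define (spawn-sleeper)
  "Fork a child that just sleeps, and return its PID to the parent."
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (begin                          ; child: idle until killed
          (sleep 60)
          (primitive-_exit 0))
        pid)))                          ; parent: return the child's PID

(define (stop/poll pid)
  "Kill PID, then poll once with WNOHANG.  Return 0 if the child had not
been reaped at the time of the call, else PID.  Either result is
possible: the kernel may not have finished terminating the child yet."
  (kill pid SIGKILL)
  (car (waitpid pid WNOHANG)))

(define (stop/wait pid)
  "Kill PID, then block until that exact PID has been reaped.  Return
its exit status value, to be decoded with status:term-sig and friends."
  (kill pid SIGKILL)
  (cdr (waitpid pid)))

;; Usage: with stop/wait, the child is guaranteed to be gone once the
;; procedure returns, so a subsequent 'start' cannot overlap with it.
(let* ((pid    (spawn-sleeper))
       (status (stop/wait pid)))
  (format #t "child ~a terminated by signal ~a~%"
          pid (status:term-sig status)))
```

With stop/poll, a caller that gets 0 back still has to reap the child
later (for example from a SIGCHLD handler), which is the window in which
an overlapping restart could occur.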