From: Mathieu Othacehe
To: Ludovic Courtès
Cc: 41948@debbugs.gnu.org
Subject: bug#41948: Shepherd deadlocks
Date: Sun, 16 Aug 2020 11:56:37 +0200
Message-ID: <87k0xyhq22.fsf@gnu.org>
In-Reply-To: <87a70yc9kj.fsf@gnu.org> ("Ludovic Courtès"'s message of "Sat, 20 Jun 2020 12:31:40 +0200")
References: <87h7v75txx.fsf@gnu.org> <87a70yc9kj.fsf@gnu.org>
List-Id: Bug reports for GNU Guix

Hey Ludo,

> We should be able to reproduce it with much simpler tests then, right?
> Like maybe “while : ; do herd restart guix-daemon ; done” or similar?

Well, I tried that without success. Then I took a closer look at the
strace log. It turns out there are two concurrent "finalizer" threads:

--8<---------------cut here---------------start------------->8---
1 clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271
--8<---------------cut here---------------end--------------->8---

and this one,

--8<---------------cut here---------------start------------->8---
217 <... clone resumed>, parent_tid=[253], tls=0x7f1799309700, child_tidptr=0x7f17993099d0) = 253
--8<---------------cut here---------------end--------------->8---

The first one is spawned from Shepherd directly. The other one is
spawned from the forked process in "marionette-shepherd-service". Those
two finalizer threads share the same pipe.

When we try to stop the finalizer thread in Shepherd, right before
forking a new process, we send a '\1' byte to the finalizer pipe:

--8<---------------cut here---------------start------------->8---
1 write(6, "\1", 1
--8<---------------cut here---------------end--------------->8---

which is received by (line 183597):

--8<---------------cut here---------------start------------->8---
253 <... read resumed>"\1", 1) = 1
--8<---------------cut here---------------end--------------->8---

the marionette finalizer thread. Then we pthread_join the Shepherd
finalizer thread, which never stops, because the byte meant for it was
consumed by the other thread. Quite unfortunate.

A small reproducer is attached.

So unless I'm wrong, this is a Guile issue that can cause any program
that uses at least two primitive-fork calls to hang.

I'm quite convinced that those two bugs are directly related:

* https://issues.guix.info/31925
* https://issues.guix.gnu.org/42353

Now, regarding the fix for this issue, I guess that a process forked with
"primitive-fork" in Guile should close its parent's finalizer pipe and
open a new one.

WDYT?

Thanks,

Mathieu

Attachment: t.scm

--8<---------------cut here---------------start------------->8---
(use-modules (shepherd service)
             (ice-9 match))

(match (primitive-fork)
  (0
   (while #t
     (gc)
     (usleep 200000)))
  (pid
   (let loop ((count 0))
     (format #t "Forking ~a~%" count)
     (fork+exec-command '("/bin/sh" "-c" "sleep 1"))
     (usleep (random 200000))
     (loop (1+ count)))))
--8<---------------cut here---------------end--------------->8---
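
The pipe-sharing mechanism described in the message, and the proposed fix of giving the forked child a fresh pipe, can be modelled outside Guile. The following is only a minimal sketch in Python, not Guile's implementation: the function name and the `wake`, `go` and `done` pipes are hypothetical, and each process does a single non-blocking poll where Guile has a finalizer thread, which keeps the outcome deterministic. The `go` and `done` pipes exist purely to order the steps of the demonstration.

```python
import os
import select


def wake_byte_reaches_parent(child_gets_fresh_pipe: bool) -> bool:
    """Return True if a wake-up byte written by the parent is still
    readable by the parent after the child has polled for it."""
    wake_r, wake_w = os.pipe()  # stands in for the finalizer pipe
    go_r, go_w = os.pipe()      # parent -> child: "the byte has been written"
    done_r, done_w = os.pipe()  # child -> parent: "I have polled"

    pid = os.fork()
    if pid == 0:
        # Child process: models the forked process and its own reader.
        try:
            os.close(go_w)
            os.close(done_r)
            if child_gets_fresh_pipe:
                # Proposed fix: drop the inherited pipe, open a private one.
                os.close(wake_r)
                os.close(wake_w)
                wake_r, wake_w = os.pipe()
            os.read(go_r, 1)  # wait until the parent has written its byte
            ready, _, _ = select.select([wake_r], [], [], 0)
            if ready:
                os.read(wake_r, 1)  # consumes the byte meant for the parent
            os.write(done_w, b"x")
        finally:
            os._exit(0)  # never fall through into the parent's code

    # Parent process: models Shepherd asking its own reader to stop.
    os.close(go_r)
    os.close(done_w)
    os.write(wake_w, b"\x01")  # the wake-up byte, as in write(6, "\1", 1)
    os.write(go_w, b"x")
    os.read(done_r, 1)         # wait until the child has polled
    ready, _, _ = select.select([wake_r], [], [], 0)
    os.waitpid(pid, 0)
    for fd in (wake_r, wake_w, go_w, done_r):
        os.close(fd)
    return bool(ready)


if __name__ == "__main__":
    print("shared pipe:", wake_byte_reaches_parent(False))
    print("fresh pipe in child:", wake_byte_reaches_parent(True))
```

With the inherited pipe, the child consumes the byte and the call returns False: a parent-side reader waiting on that pipe would never be woken, which corresponds to the pthread_join that never returns. With a fresh pipe in the child, the byte stays in the parent's pipe and the call returns True.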