From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp0 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id OwETC2TYt17NZQAA0tVLHw (envelope-from ) for ; Sun, 10 May 2020 10:33:08 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp0 with LMTPS id Lx4IKHHYt17fXwAA1q6Kng (envelope-from ) for ; Sun, 10 May 2020 10:33:21 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 680CB940C62 for ; Sun, 10 May 2020 10:33:09 +0000 (UTC) Received: from localhost ([::1]:45460 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jXjGf-0007lv-5v for larch@yhetil.org; Sun, 10 May 2020 06:33:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:50848) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jXjGY-0007kW-1Y for bug-guix@gnu.org; Sun, 10 May 2020 06:33:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:37537) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jXjGX-00021N-Of for bug-guix@gnu.org; Sun, 10 May 2020 06:33:01 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jXjGX-0006Dx-MB for bug-guix@gnu.org; Sun, 10 May 2020 06:33:01 -0400 X-Loop: help-debbugs@gnu.org Subject: bug#40981: Graphical installer tests sometimes hang. Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Sun, 10 May 2020 10:33:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 40981 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Mathieu Othacehe Received: via spool by 40981-submit@debbugs.gnu.org id=B40981.158910673423868 (code B ref 40981); Sun, 10 May 2020 10:33:01 +0000 Received: (at 40981) by debbugs.gnu.org; 10 May 2020 10:32:14 +0000 Received: from localhost ([127.0.0.1]:49083 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jXjFm-0006Cu-Hh for submit@debbugs.gnu.org; Sun, 10 May 2020 06:32:14 -0400 Received: from eggs.gnu.org ([209.51.188.92]:47762) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jXjFk-0006Cg-RE for 40981@debbugs.gnu.org; Sun, 10 May 2020 06:32:13 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:40845) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jXjFf-00018U-I5; Sun, 10 May 2020 06:32:07 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=42888 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1jXjFd-0006fF-OK; Sun, 10 May 2020 06:32:05 -0400 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <87o8r9w5re.fsf@gmail.com> <87a72ox6ws.fsf@gmail.com> <87zhanx3rw.fsf@gmail.com> <87a72mn1l2.fsf@gnu.org> <875zdael5s.fsf@gmail.com> <87v9l9bcu0.fsf@gmail.com> <87mu6jelmx.fsf@gmail.com> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 22 =?UTF-8?Q?Flor=C3=A9al?= an 228 de la =?UTF-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Sun, 10 May 2020 12:32:00 +0200 In-Reply-To: <87mu6jelmx.fsf@gmail.com> (Mathieu Othacehe's message of "Thu, 07 May 2020 18:48:54 +0200") Message-ID: <87tv0o9j33.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -3.3 (---) X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 40981@debbugs.gnu.org Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: "bug-Guix" X-Scanner: scn0 X-Spam-Score: -1.01 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of bug-guix-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=bug-guix-bounces@gnu.org X-Scan-Result: default: False [-1.01 / 13.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; GENERIC_REPUTATION(0.00)[-0.53890152085558]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.51.188.0/24:c]; IP_REPUTATION_HAM(0.00)[asn: 22989(0.08), country: US(-0.00), ip: 209.51.188.17(-0.54)]; DWL_DNSWL_FAIL(0.00)[209.51.188.17:server fail]; MX_GOOD(-0.50)[cached: eggs.gnu.org]; RCPT_COUNT_TWO(0.00)[2]; MAILLIST(-0.20)[mailman]; FREEMAIL_TO(0.00)[gmail.com]; RCVD_IN_DNSWL_FAIL(0.00)[209.51.188.17:server fail]; RCVD_TLS_LAST(0.00)[]; R_DKIM_NA(0.00)[]; ASN(0.00)[asn:22989, ipnet:209.51.188.0/24, country:US]; MID_RHS_MATCH_FROM(0.00)[]; TAGGED_FROM(0.00)[larch=yhetil.org]; ARC_NA(0.00)[]; FORGED_RECIPIENTS_MAILLIST(0.00)[]; FROM_NEQ_ENVFROM(0.00)[ludo@gnu.org,bug-guix-bounces@gnu.org]; FROM_HAS_DN(0.00)[]; TAGGED_RCPT(0.00)[]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[gnu.org]; HAS_LIST_UNSUB(-0.01)[]; MIME_TRACE(0.00)[0:+]; RWL_MAILSPIKE_POSSIBLE(0.00)[209.51.188.17:from]; RCVD_COUNT_SEVEN(0.00)[9]; FORGED_SENDER_MAILLIST(0.00)[] X-TUID: QpcwdpOLOI70 Hi! Mathieu Othacehe skribis: > The previous patch was not working. The reason is that, when a process > is forked and before execv is called, it shares the parent signal > handler. > > So when sending a SIGTERM to the child process, we are stopping > root-service, if the signal is received before the child has forked. Woow, good catch! > The work-around here is to save the installed SIGTERM handler and reset > it. Then, after forking, the parent can restore the SIGTERM handler. The > child will use the default SIGTERM handler that terminates the process. OK, makes sense. (Another occasion to see how something like =E2=80=98posix_spawn=E2=80=99 would be more appropriate than fork + exec=E2= =80=A6) > From aa6f67068f1fdd622673ec0223f05fd8f8a96baa Mon Sep 17 00:00:00 2001 > From: Mathieu Othacehe > Date: Thu, 7 May 2020 18:39:41 +0200 > Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero. > > When a process is forked, and before its GID is changed in "exec-command", > it will share the parent GID, which is 0 for Shepherd. In that case, use > the PID instead of the PGID. > > Also make sure to reset the SIGTERM handler before forking a process. Fai= ling > to do so, will result in stopping Shepherd if a SIGTERM is received betwe= en > fork and execv calls. Restore the SIGTERM handler once the process has be= en > forked. > > * modules/shepherd/service.scm (fork+exec-command): Save the current SIGT= ERM > handler and reset it before forking. Then, restore it on the parent after > forking. > (make-kill-destructor): Handle the case when PGID is zero, between the pr= ocess > fork and exec. [...] > + ;; Kill the whole process group PID belongs to. Don't assume that P= ID is > + ;; a process group ID: that's not the case when using #:pid-file, wh= ere > + ;; the process group ID is the PID of the process that "daemonized".= If > + ;; this procedure is called, between the process fork and exec, the = PGID > + ;; will still be zero (the Shepherd PGID). In that case, use the PID. > + (let ((current-pgid (getpgid 0)) > + (pgid (getpgid pid))) > + (if (eq? pgid current-pgid) > + (begin > + (kill pid signal)) > + (begin > + (kill (- pgid) signal)))) Shouldn=E2=80=99t it be: (let ((pgid (getpgid pid))) (if (=3D (getpgid 0) pgid) (kill pid signal) ;don't kill ourself (kill (-p pgid) signal))) ? Note: Use the most specific comparison procedure, =E2=80=98=3D=E2=80=99 in = this case, because we know we=E2=80=99re dealing with numbers (it enables proper type checking, better compiler optimizations, etc.). More importantly, =E2=80= =98eq?=E2=80=99 performs pointer comparison, so it shouldn=E2=80=99t be used with numbers (= in practice it works with =E2=80=9Cfixnums=E2=80=9D but not with bignums). > +# Try to trigger eventual race conditions, when killing a process betwee= n fork > +# and execv calls. > +for i in {1..50} > +do > + $herd restart test3 > +done Would it reproduce the problem well enough? Use `seq 1 50` to avoid relying on a Bash-specific construct (which I think it is, right?). Could you send an updated patch? Thanks for the bug hunting and for the patch! Ludo=E2=80=99.