From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id 4OTxNGjjt14ZPwAA0tVLHw (envelope-from ) for ; Sun, 10 May 2020 11:20:08 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id ECscF3bjt141KgAAbx9fmQ (envelope-from ) for ; Sun, 10 May 2020 11:20:22 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id BFC8294042C for ; Sun, 10 May 2020 11:20:09 +0000 (UTC) Received: from localhost ([::1]:37000 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jXk09-0004NR-7S for larch@yhetil.org; Sun, 10 May 2020 07:20:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55358) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jXk03-0004NI-2x for bug-guix@gnu.org; Sun, 10 May 2020 07:20:03 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:37589) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jXk02-0005Zl-QA for bug-guix@gnu.org; Sun, 10 May 2020 07:20:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jXk02-00018G-Kr for bug-guix@gnu.org; Sun, 10 May 2020 07:20:02 -0400 X-Loop: help-debbugs@gnu.org Subject: bug#40981: Graphical installer tests sometimes hang. Resent-From: Mathieu Othacehe Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Sun, 10 May 2020 11:20:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 40981 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Received: via spool by 40981-submit@debbugs.gnu.org id=B40981.15891095684300 (code B ref 40981); Sun, 10 May 2020 11:20:02 +0000 Received: (at 40981) by debbugs.gnu.org; 10 May 2020 11:19:28 +0000 Received: from localhost ([127.0.0.1]:49135 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jXjzT-00017I-Kd for submit@debbugs.gnu.org; Sun, 10 May 2020 07:19:28 -0400 Received: from eggs.gnu.org ([209.51.188.92]:52370) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jXjzS-000176-Nv for 40981@debbugs.gnu.org; Sun, 10 May 2020 07:19:27 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:41142) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jXjzN-0005Xi-Hc; Sun, 10 May 2020 07:19:21 -0400 Received: from [2a01:cb18:832e:5f00:4441:1c5:ecde:ace3] (port=51388 helo=meru) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1jXjzM-00012m-Vj; Sun, 10 May 2020 07:19:21 -0400 From: Mathieu Othacehe References: <87o8r9w5re.fsf@gmail.com> <87a72ox6ws.fsf@gmail.com> <87zhanx3rw.fsf@gmail.com> <87a72mn1l2.fsf@gnu.org> <875zdael5s.fsf@gmail.com> <87v9l9bcu0.fsf@gmail.com> <87mu6jelmx.fsf@gmail.com> <87tv0o9j33.fsf@gnu.org> Date: Sun, 10 May 2020 13:19:18 +0200 In-Reply-To: <87tv0o9j33.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Sun, 10 May 2020 12:32:00 +0200") Message-ID: <87eersm409.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -3.3 (---) X-BeenThere: bug-guix@gnu.org List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 40981@debbugs.gnu.org Errors-To: bug-guix-bounces+larch=yhetil.org@gnu.org Sender: "bug-Guix" X-Scanner: scn0 X-Spam-Score: -1.01 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of bug-guix-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=bug-guix-bounces@gnu.org X-Scan-Result: default: False [-1.01 / 13.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; GENERIC_REPUTATION(0.00)[-0.53893233736665]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.51.188.0/24:c]; IP_REPUTATION_HAM(0.00)[asn: 22989(0.08), country: US(-0.00), ip: 209.51.188.17(-0.54)]; DWL_DNSWL_FAIL(0.00)[209.51.188.17:server fail]; MX_GOOD(-0.50)[cached: eggs.gnu.org]; RCPT_COUNT_TWO(0.00)[2]; MAILLIST(-0.20)[mailman]; FORGED_RECIPIENTS_MAILLIST(0.00)[]; RCVD_IN_DNSWL_FAIL(0.00)[209.51.188.17:server fail]; RCVD_TLS_LAST(0.00)[]; R_DKIM_NA(0.00)[]; ASN(0.00)[asn:22989, ipnet:209.51.188.0/24, country:US]; MID_RHS_MATCH_FROM(0.00)[]; TAGGED_FROM(0.00)[larch=yhetil.org]; ARC_NA(0.00)[]; FROM_NEQ_ENVFROM(0.00)[othacehe@gnu.org,bug-guix-bounces@gnu.org]; FROM_HAS_DN(0.00)[]; URIBL_BLOCKED(0.00)[zancanaro.id.au:email,elephly.net:email]; MIME_GOOD(-0.10)[multipart/mixed,text/plain,text/x-diff]; MIME_TRACE(0.00)[0:+,1:+,2:+]; DMARC_NA(0.00)[gnu.org]; HAS_LIST_UNSUB(-0.01)[]; RWL_MAILSPIKE_POSSIBLE(0.00)[209.51.188.17:from]; RCVD_COUNT_SEVEN(0.00)[9]; FORGED_SENDER_MAILLIST(0.00)[] X-TUID: UU43J5rrXYr7 --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hey Ludo, > Woow, good catch! Thanks :) >> The work-around here is to save the installed SIGTERM handler and reset >> it. Then, after forking, the parent can restore the SIGTERM handler. The >> child will use the default SIGTERM handler that terminates the process. > > OK, makes sense. (Another occasion to see how something like > =E2=80=98posix_spawn=E2=80=99 would be more appropriate than fork + exec= =E2=80=A6) Didn't know about that function, but it seems way easier to manipulate and less error prone indeed! > Shouldn=E2=80=99t it be: > > (let ((pgid (getpgid pid))) > (if (=3D (getpgid 0) pgid) > (kill pid signal) ;don't kill ourself > (kill (-p pgid) signal))) Yes, I changed it. > Note: Use the most specific comparison procedure, =E2=80=98=3D=E2=80=99 i= n this case, > because we know we=E2=80=99re dealing with numbers (it enables proper type > checking, better compiler optimizations, etc.). More importantly, =E2=80= =98eq?=E2=80=99 > performs pointer comparison, so it shouldn=E2=80=99t be used with numbers= (in > practice it works with =E2=80=9Cfixnums=E2=80=9D but not with bignums). Ok, noted! >> +# Try to trigger eventual race conditions, when killing a process betwe= en fork >> +# and execv calls. >> +for i in {1..50} >> +do >> + $herd restart test3 >> +done > > Would it reproduce the problem well enough? On a slow machine one time out of two, and on a faster machine, never. The 'reproducer' I used, was to add a 'sleep' between fork and exec, it works way better! Tell me if you think its better to drop it. > Could you send an updated patch? Here it is! > Thanks for the bug hunting and for the patch! Thanks for the fast review :) Mathieu --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-service-Fix-make-kill-destructor-when-PGID-is-zero.patch >From 79d3603bf15b8f815136178be8c8a236734a7596 Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Thu, 7 May 2020 18:39:41 +0200 Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero. When a process is forked, and before its GID is changed in "exec-command", it will share the parent GID, which is 0 for Shepherd. In that case, use the PID instead of the PGID. Also make sure to reset the SIGTERM handler before forking a process. Failing to do so, will result in stopping Shepherd if a SIGTERM is received between fork and execv calls. Restore the SIGTERM handler once the process has been forked. * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM handler and reset it before forking. Then, restore it on the parent after forking. (make-kill-destructor): Handle the case when PGID is zero, between the process fork and exec. --- modules/shepherd/service.scm | 33 ++++++++++++++++++++++++++------- tests/forking-service.sh | 18 ++++++++++++++++++ 2 files changed, 44 insertions(+), 7 deletions(-) diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm index 8604d2f..45fcf32 100644 --- a/modules/shepherd/service.scm +++ b/modules/shepherd/service.scm @@ -5,6 +5,7 @@ ;; Copyright (C) 2016 Alex Kost ;; Copyright (C) 2018 Carlo Zancanaro ;; Copyright (C) 2019 Ricardo Wurmus +;; Copyright (C) 2020 Mathieu Othacehe ;; ;; This file is part of the GNU Shepherd. ;; @@ -884,7 +885,18 @@ its PID." (unless %sigchld-handler-installed? (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP) (set! %sigchld-handler-installed? #t)) - (let ((pid (primitive-fork))) + + ;; When forking a process, the signal handlers are inherited, until it + ;; forks. If SIGTERM is received by the forked process, before it calls + ;; execv, the installed SIGTERM handler, stopping Shepherd will be called. + ;; To avoid this, save the SIGTERM handler, disable it, and restore it once, + ;; the process has been forked. This way, the forked process will use the + ;; default SIGTERM handler stopping the process. + (let ((term-handler (match (sigaction SIGTERM) + ((proc . _) + proc))) + (pid (and (sigaction SIGTERM SIG_DFL) + (primitive-fork)))) (if (zero? pid) (exec-command command #:user user @@ -893,7 +905,10 @@ its PID." #:directory directory #:file-creation-mask file-creation-mask #:environment-variables environment-variables) - pid))) + (begin + ;; Restore the initial SIGTERM handler. + (sigaction SIGTERM term-handler) + pid)))) (define* (make-forkexec-constructor command #:key @@ -957,11 +972,15 @@ start." "Return a procedure that sends SIGNAL to the process group of the PID given as argument, where SIGNAL defaults to `SIGTERM'." (lambda (pid . args) - ;; Kill the whole process group PID belongs to. Don't assume that PID - ;; is a process group ID: that's not the case when using #:pid-file, - ;; where the process group ID is the PID of the process that - ;; "daemonized". - (kill (- (getpgid pid)) signal) + ;; Kill the whole process group PID belongs to. Don't assume that PID is + ;; a process group ID: that's not the case when using #:pid-file, where + ;; the process group ID is the PID of the process that "daemonized". If + ;; this procedure is called, between the process fork and exec, the PGID + ;; will still be zero (the Shepherd PGID). In that case, use the PID. + (let ((pgid (getpgid pid))) + (if (= (getpgid 0) pgid) + (kill pid signal) ;don't kill ourself + (kill (- pgid) signal))) #f)) ;; Produce a constructor that executes a command. diff --git a/tests/forking-service.sh b/tests/forking-service.sh index 7126c3d..bd9aac9 100644 --- a/tests/forking-service.sh +++ b/tests/forking-service.sh @@ -71,6 +71,17 @@ cat > "$conf"< + ;; A service that forks into a different process. + #:provides '(test3) + #:start (make-forkexec-constructor %command3) + #:stop (make-kill-destructor) + #:respawn? #t)) EOF cat $conf @@ -113,3 +124,10 @@ sleep 1; $herd status test2 | grep started test "`cat $PWD/$service2_started`" = "started started" + +# Try to trigger eventual race conditions, when killing a process between fork +# and execv calls. +for i in `seq 1 50` +do + $herd restart test3 +done -- 2.26.2 --=-=-=--