From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 10:34:38 +0200
Message-ID: <87lgaqgjxd.fsf@gnu.org>
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org>
In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of "Wed, 04 Jul 2018 23:33:52 -0400")
To: Mark H Weaver
Cc: Andy Wingo, 31925@debbugs.gnu.org

Hello Mark,

Thanks for chiming in!

Mark H Weaver wrote:

> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.  My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

libgc launches mark threads as soon as it is initialized, I think.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.  The
> finalization thread grabs the GC allocation lock every time it calls
> 'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
> (including pipes) register finalizers and therefore spawn the
> finalization thread and make work for it to do.

In 2.2 there’s scm_i_finalizer_pre_fork, which takes care of shutting
down the finalization thread right before fork.  So the finalization
thread cannot be blamed, AIUI.

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby holding the alloc lock at the very moment fork is
called.

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

Here’s a patch:

diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
 	    "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
          "  further behavior unspecified.  See \"Processes\" in the\n"
          "  manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts.  Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released.  */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);

Thoughts?

Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today, so I can’t tell if this helps (I let it run
more than 5 minutes with the supposedly-buggy Guile and nothing
happened…).

Thanks,
Ludo’.
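
For readers who want to see the locking pattern of the patch in
isolation, here is a minimal standalone sketch.  It does not use libgc
at all: ‘alloc_lock’ is a plain pthread mutex standing in for the GC
allocation lock, and ‘call_with_alloc_lock’ and ‘fork_under_lock’ are
hypothetical names made up for this illustration.  It assumes a
POSIX system where a default mutex locked by the forking thread can be
unlocked by the child (this is the case with glibc).  No second thread
competes for the lock here, so the sketch only shows the mechanism, not
the race itself.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for libgc's allocation lock.  */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical analogue of 'GC_call_with_alloc_lock': call FN (DATA)
   with the lock held and release the lock afterwards.  */
static void *
call_with_alloc_lock (void *(*fn) (void *), void *data)
{
  void *result;
  int err = pthread_mutex_lock (&alloc_lock);

  assert (err == 0);
  result = fn (data);
  pthread_mutex_unlock (&alloc_lock);
  return result;
}

/* Same shape as 'do_fork' in the patch: store the result of fork in
   *PIDP.  After fork, this function returns in both processes.  */
static void *
do_fork (void *pidp)
{
  *(pid_t *) pidp = fork ();
  return NULL;
}

/* Fork while holding the lock.  The forking thread owns the lock in
   both parent and child, so each process releases its own copy when
   'call_with_alloc_lock' returns.  The child then checks that the lock
   is free.  Return the child's exit status (0 if the lock was free),
   or -1 on error.  */
int
fork_under_lock (void)
{
  pid_t pid;
  int status;

  call_with_alloc_lock (do_fork, &pid);

  if (pid == -1)
    return -1;

  if (pid == 0)
    {
      /* Child: the lock must be available now.  */
      int err = pthread_mutex_trylock (&alloc_lock);
      _exit (err == 0 ? 0 : 1);
    }

  /* Parent: wait for the child and report its exit status.  */
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling ‘fork_under_lock ()’ from a ‘main’ function should return 0.
With a bare ‘fork ()’ and another thread holding the lock at that
instant, the child would instead inherit a locked mutex whose owner
does not exist in the child, which is the hang hypothesized above.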