From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Wingo
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 10:00:52 +0200
Message-ID: <87lgaqjemj.fsf@igalia.com>
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org>
 <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org>
In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of
 "Wed, 04 Jul 2018 23:33:52 -0400")
To: Mark H Weaver
Cc: 31925@debbugs.gnu.org

Hi!

On Thu 05 Jul 2018 05:33, Mark H Weaver writes:

>> One problem I’ve noticed is that the child process that
>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>> allocation lock:
>>
>> So it seems quite clear that the thing has the alloc lock taken.  I
>> suppose this can happen if one of the libgc threads runs right when we
>> call fork and takes the alloc lock, right?
>
> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.
> My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

I think Mark is correct.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.

So of course we agree you're only supposed to "fork" when there are no
other threads running, I think.

As far as the finalizer thread goes, "primitive-fork" calls
"scm_i_finalizer_pre_fork" which should join the finalizer thread,
before the fork.  There could be a bug obviously but the intention is
for Guile to shut down its internal threads.  Here's the body of
primitive-fork fwiw:

    {
      int pid;
      scm_i_finalizer_pre_fork ();
      if (scm_ilength (scm_all_threads ()) != 1)
        /* Other threads may be holding on to resources that Guile needs --
           it is not safe to permit one thread to fork while others are
           running.

           In addition, POSIX clearly specifies that if a multi-threaded
           program forks, the child must only call functions that are
           async-signal-safe.  We can't guarantee that in general.  The best
           we can do is to allow forking only very early, before any call to
           sigaction spawns the signal-handling thread.  */
        scm_display
          (scm_from_latin1_string
           ("warning: call to primitive-fork while multiple threads are running;\n"
            "         further behavior unspecified.  See \"Processes\" in the\n"
            "         manual, for more information.\n"),
           scm_current_warning_port ());
      pid = fork ();
      if (pid == -1)
        SCM_SYSERROR;
      return scm_from_int (pid);
    }

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
> libgc.
> You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

The signal thread is a possibility, though in that case you'd get a
warning; the signal-handling thread appears in scm_all_threads.  Do you
see a warning?  If you do, that is a problem :)

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

I don't think this is necessary.  I think the problem is that other
threads are running.  If we solve that, then we solve this issue; if we
don't solve that, we don't know what else those threads are doing, so we
don't know what mutexes and other state they might hold.

Andy
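
To illustrate why the remaining threads matter more than any single
lock, here is a minimal C sketch.  It is not taken from Guile or libgc.
It assumes a POSIX system with pthreads; demo_lock, helper_main and
child_can_take_lock are hypothetical names, and demo_lock merely stands
in for an internal lock such as the GC allocation lock.  A helper thread
holds the lock; the function forks either while the helper is still
running or after joining it, and reports whether the child could take
the lock.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library-internal lock (e.g. an allocator lock). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int helper_holds_lock;
static atomic_int helper_may_exit;

/* Hypothetical helper thread: takes the lock and holds it until told to stop. */
static void *helper_main(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&demo_lock);
  atomic_store(&helper_holds_lock, 1);
  while (!atomic_load(&helper_may_exit))
    sched_yield();
  pthread_mutex_unlock(&demo_lock);
  return NULL;
}

/* Fork once and report whether the child could take demo_lock.
   Returns 1 if the child acquired it, 0 if it found it busy, -1 on error.
   If join_first is nonzero, the helper is stopped and joined before fork,
   so the process is single-threaded when it forks.  */
int child_can_take_lock(int join_first)
{
  pthread_t helper;
  int status;
  pid_t pid;

  atomic_store(&helper_holds_lock, 0);
  atomic_store(&helper_may_exit, 0);
  if (pthread_create(&helper, NULL, helper_main, NULL) != 0)
    return -1;

  /* Wait until the helper really owns the lock.  */
  while (!atomic_load(&helper_holds_lock))
    sched_yield();

  if (join_first) {
    atomic_store(&helper_may_exit, 1);
    pthread_join(helper, NULL);
  }

  pid = fork();
  if (pid == 0) {
    /* Child: only the forking thread exists here.  When join_first is 0
       this deliberately steps outside what POSIX guarantees, because
       pthread_mutex_trylock is not async-signal-safe; it is done only to
       observe the inherited lock state.  */
    _exit(pthread_mutex_trylock(&demo_lock) == 0 ? 0 : 1);
  }

  if (!join_first) {
    /* Parent: let the helper finish now that the fork has happened.  */
    atomic_store(&helper_may_exit, 1);
    pthread_join(helper, NULL);
  }

  if (pid == -1 || waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status) == 0;
}
```

child_can_take_lock(1) models the intended primitive-fork behaviour of
shutting internal threads down first; child_can_take_lock(0) models
forking while another thread is live.  On a typical glibc system the
former would be expected to return 1 and the latter 0: the child
inherits the mutex in its locked state, and the thread that would
unlock it does not exist there.  The same reasoning applies to any other
state a live thread was in the middle of updating, which is why wrapping
fork in one particular lock does not cover the general case.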