From: Andy Wingo
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 16:04:12 +0200
Message-ID: <874lhdkcdf.fsf@igalia.com>
In-Reply-To: <87601ukneg.fsf@netris.org> (Mark H. Weaver's message of "Thu, 05 Jul 2018 06:05:59 -0400")
To: Mark H Weaver
Cc: 31925@debbugs.gnu.org

Hi,

On Thu 05 Jul 2018 12:05, Mark H Weaver writes:

> However, it's also the case that libgc uses 'pthread_atfork' (where
> available) to arrange to grab the GC allocation as well as the "mark
> locks" in the case where parallel marking is enabled. See
> fork_prepare_proc, fork_parent_proc, and fork_child_proc in
> pthread_support.c.

I don't think this is enabled by default. You have to configure your
libgc this way.
When investigating similar bugs, I proposed enabling it by default a
while ago:

  http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-February/004958.html

I ended up realizing that pthread_atfork was just a bogus interface.
For one, it turns out that POSIX clearly says that if a multithreaded
program forks, the behavior of the child after the fork is undefined if
it calls any non-async-signal-safe function before calling exec():

  https://lists.gnu.org/archive/html/guile-devel/2012-02/msg00157.html

But practically, the only reasonable thing to do with atfork is to grab
all of the locks, then release them after forking, in both child and
parent. However you can't do this without deadlocking from a library,
as the total lock order is a property of the program and not something a
library can decide.

There are thus two solutions: either ensure that there are no other
threads when you fork, or only call async-signal-safe functions before
you exec(). open-process does the latter. fork will warn if the
former is not the case.

When last I looked into this, I concluded that pthread_atfork doesn't
buy us anything, though I could be wrong!

>> Here's the body of primitive-fork fwiw:
>>
>> {
>>   int pid;
>>   scm_i_finalizer_pre_fork ();
>>   if (scm_ilength (scm_all_threads ()) != 1)
>
> I think there's a race here. I think it's possible for the finalizer
> thread to be respawned after 'scm_i_finalizer_pre_fork' in two different
> ways:
>
> (1) 'scm_all_threads' performs allocation, which could trigger GC.
>
> (2) another thread could perform heap allocation and trigger GC after
>     'scm_i_finalizer_pre_fork' joins the thread. it might then shut
>     down before 'scm_all_thread' is called.
>
> However, these are highly unlikely scenarios, and most likely not the
> problem we're seeing here.
>
> Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the
> 'if', to avoid this race.

Good point! Probably we should use some non-allocating
scm_i_is_multi_threaded() or something.
We can't move the pre-fork thing though because one of the effects we
are looking for is to reduce the thread count!

Cheers,

Andy
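
To illustrate the second option above (calling only async-signal-safe
functions between fork and exec), here is a minimal sketch in C. It is
not Guile's open-process implementation: spawn_and_wait is a
hypothetical helper name, and returning 127 when exec fails is only the
usual shell convention, assumed here for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not part of Guile or libgc): run PATH with ARGV
   and return the child's exit status, or -1 on error.  */
int
spawn_and_wait (const char *path, char *const argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child.  If the parent had other threads, locks they held (for
         example inside malloc or stdio) may stay locked forever in this
         process, so call only async-signal-safe functions from here.  */
      execve (path, argv, environ);

      /* execve only returns on failure.  Use _exit rather than exit:
         exit runs atexit handlers and flushes stdio buffers, which is
         not async-signal-safe.  */
      _exit (127);
    }

  /* Parent: wait for the child, retrying if interrupted by a signal.  */
  int status;
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling spawn_and_wait ("/bin/sh", argv) with argv set to
{ "sh", "-c", "exit 7", NULL } should return 7, assuming /bin/sh
exists. The point of the design is that the child does nothing between
fork and execve that could need a lock owned by a thread that no longer
exists.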