From: Andy Wingo <wingo@igalia.com>
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 10:00:52 +0200
Message-ID: <87lgaqjemj.fsf@igalia.com>
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org>
	<87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org>
In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of "Wed, 04
	Jul 2018 23:33:52 -0400")
To: Mark H Weaver <mhw@netris.org>
Cc: 31925@debbugs.gnu.org

Hi!

On Thu 05 Jul 2018 05:33, Mark H Weaver <mhw@netris.org> writes:

>> One problem I’ve noticed is that the child process that
>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>> allocation lock:
>>
>> So it seems quite clear that the thing has the alloc lock taken.  I
>> suppose this can happen if one of the libgc threads runs right when we
>> call fork and takes the alloc lock, right?
>
> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.  My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

I think Mark is correct.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.

Of course, I think we agree that you're only supposed to call "fork" when
no other threads are running.

As for the finalizer thread, "primitive-fork" calls
"scm_i_finalizer_pre_fork", which should join the finalizer thread before
the fork.  There could of course be a bug, but the intention is for Guile
to shut down its internal threads first.  Here's the body of
primitive-fork, for what it's worth:

    {
      int pid;
      scm_i_finalizer_pre_fork ();
      if (scm_ilength (scm_all_threads ()) != 1)
        /* Other threads may be holding on to resources that Guile needs --
           it is not safe to permit one thread to fork while others are
           running.

           In addition, POSIX clearly specifies that if a multi-threaded
           program forks, the child must only call functions that are
           async-signal-safe.  We can't guarantee that in general.  The best
           we can do is to allow forking only very early, before any call to
           sigaction spawns the signal-handling thread.  */
        scm_display
          (scm_from_latin1_string
           ("warning: call to primitive-fork while multiple threads are running;\n"
            "         further behavior unspecified.  See \"Processes\" in the\n"
            "         manual, for more information.\n"),
           scm_current_warning_port ());
      pid = fork ();
      if (pid == -1)
        SCM_SYSERROR;
      return scm_from_int (pid);
    }

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

The signal thread is a possibility, though in that case you'd get a
warning, since the signal-handling thread appears in scm_all_threads.  Do
you see a warning?  If you do, that is a problem :)

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

I don't think this is necessary.  I think the problem is that other
threads are running.  If we solve that, we solve this issue; if we don't,
we don't know what else those threads are doing, so we can't know which
mutexes or other state they might be holding.

Andy