all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: 30365@debbugs.gnu.org
Subject: bug#30365: Offloading sometimes hangs
Date: Wed, 07 Feb 2018 14:42:22 +0100	[thread overview]
Message-ID: <87o9l0zzk1.fsf@gnu.org> (raw)
In-Reply-To: <877erq8med.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Tue, 06 Feb 2018 11:04:10 +0100")

ludo@gnu.org (Ludovic Courtès) skribis:

> On the build machine side, the guile process that forwards data between
> the sshd and guix-daemon¹ is stuck on:
>
>   read(0, …)

> ¹ https://git.savannah.gnu.org/cgit/guix.git/tree/guix/ssh.scm?id=0362e5820ab6a1eb8eaf33bc47e592857c25f765#n102

I’ve been able to strace this Guile process.  When “things work
normally”, its loop does things like this:

--8<---------------cut here---------------start------------->8---
20456 10:52:17 read(9, "gmlo\0\0\0\0\201\0\0\0\0\0\0\0make[2]: Entering directory '/tmp/guix-build-make-4.2.1.drv-0/make-4.2.1/doc'\nmake[2]: Nothing to be done for 'install-exec-am'.\n\0\0\0\0\0\0\0", 65536) = 152
20456 10:52:17 write(1, "gmlo\0\0\0\0\201\0\0\0\0\0\0\0make[2]: Entering directory '/tmp/guix-build-make-4.2.1.drv-0/make-4.2.1/doc'\nmake[2]: Nothing to be done for 'install-exec-am'.\n\0\0\0\0\0\0\0", 152) = 152
20456 10:52:17 rt_sigprocmask(SIG_BLOCK, NULL,  <unfinished ...>
20456 10:52:17 <... rt_sigprocmask resumed> [], 8) = 0
20456 10:52:17 pselect6(10, [0 3 9], [], [], NULL, NULL <unfinished ...>
20456 10:52:17 <... pselect6 resumed> ) = 1 (in [9])
20456 10:52:17 read(9, "gmlo\0\0\0\0\223\0\0\0\0\0\0\0 /gnu/store/m82nil6vykdywv9wmpn2k0rnq5zkw61z-bootstrap-binaries-0/bin/mkdir -p '/gnu/store/az7flk1jg6aaz1g9s7gpgpy0xc0igg7v-make-4.2.1/share/info'\n\0\0\0\0\0", 65536) = 168
20456 10:52:17 write(1, "gmlo\0\0\0\0\223\0\0\0\0\0\0\0 /gnu/store/m82nil6vykdywv9wmpn2k0rnq5zkw61z-bootstrap-binaries-0/bin/mkdir -p '/gnu/store/az7flk1jg6aaz1g9s7gpgpy0xc0igg7v-make-4.2.1/share/info'\n\0\0\0\0\0", 168) = 168
20456 10:52:17 rt_sigprocmask(SIG_BLOCK, NULL,  <unfinished ...>
20456 10:52:17 <... rt_sigprocmask resumed> [], 8) = 0
20456 10:52:17 pselect6(10, [0 3 9], [], [], NULL, NULL <unfinished ...>
20456 10:52:17 <... pselect6 resumed> ) = 1 (in [9])
20456 10:52:17 read(9, "gmlo\0\0\0\0\304\0\0\0\0\0\0\0 /gnu/store/m82nil6vykdywv9wmpn2k0rnq5zkw61z-bootstrap-binaries-0/bin/install -c -m 644 ./make.info ./make.info-1 ./make.info-2 '/gnu/store/az7flk1jg6aaz1g9s7gpgpy0xc0igg7v-make-4.2.1/share/info'\n\0\0\0\0", 65536) = 216
20456 10:52:17 write(1, "gmlo\0\0\0\0\304\0\0\0\0\0\0\0 /gnu/store/m82nil6vykdywv9wmpn2k0rnq5zkw61z-bootstrap-binaries-0/bin/install -c -m 644 ./make.info ./make.info-1 ./make.info-2 '/gnu/store/az7flk1jg6aaz1g9s7gpgpy0xc0igg7v-make-4.2.1/share/info'\n\0\0\0\0", 216) = 216
--8<---------------cut here---------------end--------------->8---

Here FD 9 is the guix-daemon socket and 1 is stdout, which is connected
to the remote ‘guix offload’ process.

We then see a slightly different sequence leading to the read(0, …) call
that hangs:

--8<---------------cut here---------------start------------->8---
20456 10:52:18 pselect6(10, [0 3 9], [], [], NULL, NULL <unfinished ...>
20456 10:52:18 <... pselect6 resumed> ) = 1 (in [9])
20456 10:52:18 read(9, "gmlo\0\0\0\0005\0\0\0\0\0\0\0phase `validate-runpath' succeeded after 0.0 seconds\n\0\0\0", 65536) = 72
20456 10:52:18 write(1, "gmlo\0\0\0\0005\0\0\0\0\0\0\0phase `validate-runpath' succeeded after 0.0 seconds\n\0\0\0", 72) = 72
20456 10:52:18 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=261868144}) = 0
20456 10:52:18 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
20456 10:52:18 madvise(0xffffb0025000, 4096, MADV_DONTNEED <unfinished ...>
20456 10:52:18 <... madvise resumed> )  = 0
20456 10:52:18 write(6, "\0", 1 <unfinished ...>
20456 10:52:18 <... write resumed> )    = 1
20456 10:52:18 clock_gettime(CLOCK_PROCESS_CPUTIME_ID,  <unfinished ...>
20456 10:52:18 <... clock_gettime resumed> {tv_sec=0, tv_nsec=268384968}) = 0
20456 10:52:18 read(0,  <unfinished ...>
--8<---------------cut here---------------end--------------->8---

Notice that there’s no pselect showing up here!

The two clock_gettime correspond to ‘get_internal_run_time’ in libguile,
and ‘madvise’ corresponds to ‘return_unused_stack_to_os’; IOW, this part
corresponds to garbage collection.

So what we have here is that the Scheme procedure ‘select’ returned
stdin as “ready for reading”.  How did that happen?  I believe this is
due to <https://bugs.gnu.org/30368>: ‘scm_i_prepare_to_wait_on_fd’
returns 1, so ‘select’ returns EINTR but it does so without clearing the
FD sets.

Ludo’.

  reply	other threads:[~2018-02-07 13:43 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-06 10:04 bug#30365: Offloading sometimes hangs Ludovic Courtès
2018-02-07 13:42 ` Ludovic Courtès [this message]
2018-02-07 20:54   ` Ludovic Courtès
2018-02-09 23:16   ` Ludovic Courtès
2018-02-10 10:17     ` Ricardo Wurmus
2018-02-10 11:07       ` Ludovic Courtès
2018-02-15  0:06     ` Danny Milosavljevic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o9l0zzk1.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=30365@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.