From: "Ludovic Courtès" <ludo@gnu.org>
To: Mathieu Othacehe <othacehe@gnu.org>
Cc: 43565@debbugs.gnu.org
Subject: bug#43565: cuirass: Fibers scheduling blocked.
Date: Fri, 23 Oct 2020 14:21:05 +0200 [thread overview]
Message-ID: <871rhpqgjy.fsf@gnu.org> (raw)
In-Reply-To: <874kmmzd92.fsf@gnu.org> (Mathieu Othacehe's message of "Thu, 22 Oct 2020 13:55:21 +0200")
[-- Attachment #1: Type: text/plain, Size: 5157 bytes --]
Good afternoon fearless hacker!
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> ‘process-build-log’ in Cuirass uses ‘read-line/non-blocking’ to read a
>> line from the log port of ‘build-derivations&’. If that really is
>> non-blocking (and I think it is), then we should be fine?
>>
>> We should attach GDB to Cuirass next time to see what’s blocking.
>
> Cuirass is currently hanging probably due to the same issue. I saved a
> GDB core dump in /home/mathieu/core.76483.
For those following along at home, we have 60 threads in there.
A couple of threads are blocked in ‘clock_nanosleep’, which I considered
fishy at first:
--8<---------------cut here---------------start------------->8---
(gdb) bt
#0 0x00007fe26752f7a1 in __GI___clock_nanosleep (clock_id=-612010, flags=0, req=0x7fdf6b40d140, rem=0x7fdf6b40d140)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
#1 0x00007fe267a0166d in ffi_call_unix64 ()
from /gnu/store/bw15z9kh9c65ycc2vbhl2izwfwfva7p1-libffi-3.3/lib/libffi.so.7
#2 0x00007fe2679ffac0 in ffi_call_int () from /gnu/store/bw15z9kh9c65ycc2vbhl2izwfwfva7p1-libffi-3.3/lib/libffi.so.7
#3 0x00007fe267af5f2e in scm_i_foreign_call (cif_scm=<optimized out>, pointer_scm=<optimized out>,
errno_ret=errno_ret@entry=0x7fe25a8e86cc, argv=0x7fe25b955df0) at foreign.c:1073
#4 0x00007fe267b64a84 in foreign_call (thread=0x7fe26741e480, cif=<optimized out>, pointer=<optimized out>)
at vm.c:1282
#5 0x00007fe2505253e0 in ?? ()
#6 0x00007fe26741e480 in ?? ()
#7 0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#8 0x00007fe26741e480 in ?? ()
#9 0x00007fe267b1043b in scm_jit_enter_mcode (thread=0x7fe26741e480, thread@entry=0x7fe2505253b0,
mcode=0x7fe25052627c "L\215\243\210") at jit.c:5852
#10 0x00007fe267b6bc24 in vm_regular_engine (thread=0x7fe2505253b0) at vm-engine.c:415
#11 0x00007fe267b6c5b5 in scm_call_n (proc=proc@entry=#<unmatched-tag 20045>, argv=argv@entry=0x0,
nargs=nargs@entry=0) at vm.c:1608
#12 0x00007fe267ae8ae9 in scm_call_0 (proc=proc@entry=#<unmatched-tag 20045>) at eval.c:490
#13 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=#<unmatched-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---
This can only come from (fibers posix-clocks) via
‘with-interrupts’—probably OK.
Then there’s a couple of threads block in ‘pthread_cond_wait’, but
that’s presumably also Fibers internals.
Then there’s a whole bunch of threads stuck in ‘read’:
--8<---------------cut here---------------start------------->8---
(gdb) bt
#0 0x00007fe267a180a4 in __libc_read (fd=80, buf=buf@entry=0x7fe22b0bb8f0, nbytes=nbytes@entry=8)
at ../sysdeps/unix/sysv/linux/read.c:26
#1 0x00007fe267af69c7 in fport_read (port=<optimized out>, dst=<optimized out>, start=<optimized out>, count=8)
at fports.c:597
#2 0x00007fe267b30542 in trampoline_to_c_read (port=#<port #<port-type file 7fe25fb4db40> 7fe22b7b9880>,
dst="#<vu8vector>" = {...}, start=0, count=8) at ports.c:266
#3 0x00007fe2580cb5fe in ?? ()
#4 0x00007fe267431d80 in ?? ()
#5 0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#6 0x00007fe267431d80 in ?? ()
#7 0x00007fe267b1043b in scm_jit_enter_mcode (thread=0x7fe267431d80, thread@entry=0x7fe2580cb5d0,
mcode=0x7fe229340690 "H\203\350(I\211\314I)\304I\203\374\060\017\205T\003") at jit.c:5852
#8 0x00007fe267b6b8e9 in vm_regular_engine (thread=0x7fe2580cb5d0) at vm-engine.c:360
#9 0x00007fe267b6c5b5 in scm_call_n (proc=proc@entry=#<unmatched-tag 20045>, argv=argv@entry=0x0,
nargs=nargs@entry=0) at vm.c:1608
#10 0x00007fe267ae8ae9 in scm_call_0 (proc=proc@entry=#<unmatched-tag 20045>) at eval.c:490
#11 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=#<unmatched-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---
‘trampoline_to_c_read’ is known as ‘port-read’ in Scheme, so I think the
call above comes from ‘read-bytes’ in (ice-9 suspendable-ports).
Normally, this file descriptor is O_NONBLOCK, and thus ‘fport_read’
immediately returns EAGAIN, so ‘trampoline_to_c_read’ returns #false.
But does Cuirass create file descriptors as O_NONBLOCK? This has to be
done explicitly, Fibers won’t do it for us. As it turns out, the answer
is no, in at least one important case: the connection to the daemon
(untested patch below).
While GC is running, Cuirass typically sends ‘build-derivations’ RPCs
and they block until the GC lock is released. That can lead to the
situation above: a bunch of threads blocked in ‘read’ from their daemon
socket, waiting for the RPC reply. OTOH, ‘build-derivations’ RPCs are
made from a fresh thread created by ‘build-derivations&’.
There are probably other situations where the daemon replies slowly.
For instance, ‘fetch-input’ can remain stuck until GC is over.
WDYT?
Thanks for investigating!
Ludo’.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 1931 bytes --]
diff --git a/src/cuirass/base.scm b/src/cuirass/base.scm
index 5a0c826..6db43c4 100644
--- a/src/cuirass/base.scm
+++ b/src/cuirass/base.scm
@@ -36,6 +36,9 @@
#:use-module ((guix config) #:select (%state-directory))
#:use-module (git)
#:use-module (ice-9 binary-ports)
+ #:use-module ((ice-9 suspendable-ports)
+ #:select (current-read-waiter
+ current-write-waiter))
#:use-module (ice-9 format)
#:use-module (ice-9 match)
#:use-module (ice-9 popen)
@@ -79,7 +82,12 @@
;; currently closes in a 'dynamic-wind' handler, which means it would close
;; the store at each context switch. Remove this when the real 'with-store'
;; has been fixed.
- (let ((store (open-connection)))
+ (let* ((store (open-connection))
+ (socket (store-connection-socket store)))
+ ;; Mark SOCKET as non-blocking so Fibers can schedule the way it wants.
+ (let ((flags (fcntl socket F_GETFL)))
+ (fcntl socket F_SETFL (logior O_NONBLOCK flags)))
+
(unwind-protect
;; Always set #:keep-going? so we don't stop on the first build failure.
;; Set #:print-build-trace explicitly to make sure 'process-build-log'
@@ -422,7 +430,12 @@ Essentially this procedure inverts the inversion-of-control that
(lambda ()
(guard (c ((store-error? c)
(atomic-box-set! result c)))
- (parameterize ((current-build-output-port output))
+ (parameterize ((current-build-output-port output)
+
+ ;; STORE's socket is O_NONBLOCK but since we're
+ ;; not in a fiber, disable Fiber's handlers.
+ (current-read-waiter #f)
+ (current-write-waiter #f))
(let ((x (build-derivations store lst)))
(atomic-box-set! result x))))
(close-port output))
next prev parent reply other threads:[~2020-10-23 12:22 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-22 16:58 bug#43565: cuirass: Fibers scheduling blocked Mathieu Othacehe
2020-10-05 12:13 ` Ludovic Courtès
2020-10-22 11:55 ` Mathieu Othacehe
2020-10-23 12:21 ` Ludovic Courtès [this message]
2020-10-26 14:22 ` Mathieu Othacehe
2020-10-26 16:20 ` Ludovic Courtès
2020-10-27 18:03 ` Mathieu Othacehe
2020-11-02 10:09 ` Mathieu Othacehe
2020-11-19 10:56 ` Mathieu Othacehe
2020-11-20 8:37 ` Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871rhpqgjy.fsf@gnu.org \
--to=ludo@gnu.org \
--cc=43565@debbugs.gnu.org \
--cc=othacehe@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).