unofficial mirror of bug-guix@gnu.org 
* bug#43565: cuirass: Fibers scheduling blocked.
@ 2020-09-22 16:58 Mathieu Othacehe
  2020-10-05 12:13 ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Othacehe @ 2020-09-22 16:58 UTC (permalink / raw)
  To: 43565


Hello,

Today between 04:04 and 10:36 no inputs were fetched. Fetching is
supposed to happen every 5 minutes. This seems to be correlated to the
duration of the garbage collection happening on berlin.

--8<---------------cut here---------------start------------->8---
2020-09-22T04:04:23 fetching input 'core-updates' of spec 'core-updates-core-updates'
2020-09-22T04:04:25 build succeeded: '/gnu/store/c7m6jxdkyjs7m5ynavagjwgp172a3xzv-partition.img.drv'
waiting for the big garbage collector lock...
...
2020-09-22T10:36:02 fetching input 'guix' of spec 'guix-master'
--8<---------------cut here---------------end--------------->8---
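
For scale, here is a rough back-of-the-envelope sketch of how many 5-minute
fetch cycles fit in that gap, using only the two timestamps from the log
above. The helper name `hms->seconds` is made up for illustration and both
timestamps are assumed to be on the same day.

```scheme
;; Convert an "HH:MM:SS" string into seconds since midnight.
(define (hms->seconds str)
  (let ((parts (map string->number (string-split str #\:))))
    (+ (* 3600 (car parts))
       (* 60 (cadr parts))
       (caddr parts))))

;; Last fetch before the gap and first fetch after it, from the log above.
(define gap
  (- (hms->seconds "10:36:02") (hms->seconds "04:04:23")))

;; Fetching is supposed to happen every 5 minutes (300 seconds).
(define missed (quotient gap (* 5 60)))

(format #t "gap: ~a s, missed fetch cycles: ~a~%" gap missed)
```

That is roughly six and a half hours without a single fetch.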

A potential cause is described here:
https://issues.guix.gnu.org/43552#1.

Thanks,

Mathieu

-- 
https://othacehe.org





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-09-22 16:58 bug#43565: cuirass: Fibers scheduling blocked Mathieu Othacehe
@ 2020-10-05 12:13 ` Ludovic Courtès
  2020-10-22 11:55   ` Mathieu Othacehe
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2020-10-05 12:13 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43565

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

> Today between 04:04 and 10:36 no inputs were fetched. Fetching is
> supposed to happen every 5 minutes. This seems to be correlated to the
> duration of the garbage collection happening on berlin.
>
> 2020-09-22T04:04:23 fetching input 'core-updates' of spec 'core-updates-core-updates'
> 2020-09-22T04:04:25 build succeeded: '/gnu/store/c7m6jxdkyjs7m5ynavagjwgp172a3xzv-partition.img.drv'
> waiting for the big garbage collector lock...
> ...
> 2020-09-22T10:36:02 fetching input 'guix' of spec 'guix-master'
>
> A potential cause is described here:
> https://issues.guix.gnu.org/43552#1.

‘process-build-log’ in Cuirass uses ‘read-line/non-blocking’ to read a
line from the log port of ‘build-derivations&’.  If that really is
non-blocking (and I think it is), then we should be fine?

We should attach GDB to Cuirass next time to see what’s blocking.

Ludo’.





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-05 12:13 ` Ludovic Courtès
@ 2020-10-22 11:55   ` Mathieu Othacehe
  2020-10-23 12:21     ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Othacehe @ 2020-10-22 11:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43565


Hey Ludo!

> ‘process-build-log’ in Cuirass uses ‘read-line/non-blocking’ to read a
> line from the log port of ‘build-derivations&’.  If that really is
> non-blocking (and I think it is), then we should be fine?
>
> We should attach GDB to Cuirass next time to see what’s blocking.

Cuirass is currently hanging probably due to the same issue. I saved a
GDB core dump in /home/mathieu/core.76483.

Could use your help finding the guilty thread :)

Thanks,

Mathieu





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-22 11:55   ` Mathieu Othacehe
@ 2020-10-23 12:21     ` Ludovic Courtès
  2020-10-26 14:22       ` Mathieu Othacehe
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2020-10-23 12:21 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43565

[-- Attachment #1: Type: text/plain, Size: 5157 bytes --]

Good afternoon fearless hacker!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> ‘process-build-log’ in Cuirass uses ‘read-line/non-blocking’ to read a
>> line from the log port of ‘build-derivations&’.  If that really is
>> non-blocking (and I think it is), then we should be fine?
>>
>> We should attach GDB to Cuirass next time to see what’s blocking.
>
> Cuirass is currently hanging probably due to the same issue. I saved a
> GDB core dump in /home/mathieu/core.76483.

For those following along at home, we have 60 threads in there.

A couple of threads are blocked in ‘clock_nanosleep’, which I considered
fishy at first:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007fe26752f7a1 in __GI___clock_nanosleep (clock_id=-612010, flags=0, req=0x7fdf6b40d140, rem=0x7fdf6b40d140)
    at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
#1  0x00007fe267a0166d in ffi_call_unix64 ()
   from /gnu/store/bw15z9kh9c65ycc2vbhl2izwfwfva7p1-libffi-3.3/lib/libffi.so.7
#2  0x00007fe2679ffac0 in ffi_call_int () from /gnu/store/bw15z9kh9c65ycc2vbhl2izwfwfva7p1-libffi-3.3/lib/libffi.so.7
#3  0x00007fe267af5f2e in scm_i_foreign_call (cif_scm=<optimized out>, pointer_scm=<optimized out>, 
    errno_ret=errno_ret@entry=0x7fe25a8e86cc, argv=0x7fe25b955df0) at foreign.c:1073
#4  0x00007fe267b64a84 in foreign_call (thread=0x7fe26741e480, cif=<optimized out>, pointer=<optimized out>)
    at vm.c:1282
#5  0x00007fe2505253e0 in ?? ()
#6  0x00007fe26741e480 in ?? ()
#7  0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#8  0x00007fe26741e480 in ?? ()
#9  0x00007fe267b1043b in scm_jit_enter_mcode (thread=0x7fe26741e480, thread@entry=0x7fe2505253b0, 
    mcode=0x7fe25052627c "L\215\243\210") at jit.c:5852
#10 0x00007fe267b6bc24 in vm_regular_engine (thread=0x7fe2505253b0) at vm-engine.c:415
#11 0x00007fe267b6c5b5 in scm_call_n (proc=proc@entry=#<unmatched-tag 20045>, argv=argv@entry=0x0, 
    nargs=nargs@entry=0) at vm.c:1608
#12 0x00007fe267ae8ae9 in scm_call_0 (proc=proc@entry=#<unmatched-tag 20045>) at eval.c:490
#13 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=#<unmatched-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---

This can only come from (fibers posix-clocks) via
‘with-interrupts’—probably OK.

Then there’s a couple of threads blocked in ‘pthread_cond_wait’, but
that’s presumably also Fibers internals.

Then there’s a whole bunch of threads stuck in ‘read’:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007fe267a180a4 in __libc_read (fd=80, buf=buf@entry=0x7fe22b0bb8f0, nbytes=nbytes@entry=8)
    at ../sysdeps/unix/sysv/linux/read.c:26
#1  0x00007fe267af69c7 in fport_read (port=<optimized out>, dst=<optimized out>, start=<optimized out>, count=8)
    at fports.c:597
#2  0x00007fe267b30542 in trampoline_to_c_read (port=#<port #<port-type file 7fe25fb4db40> 7fe22b7b9880>, 
    dst="#<vu8vector>" = {...}, start=0, count=8) at ports.c:266
#3  0x00007fe2580cb5fe in ?? ()
#4  0x00007fe267431d80 in ?? ()
#5  0x00007fe267bd7620 in ?? () from /gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/lib/libguile-3.0.so.1
#6  0x00007fe267431d80 in ?? ()
#7  0x00007fe267b1043b in scm_jit_enter_mcode (thread=0x7fe267431d80, thread@entry=0x7fe2580cb5d0, 
    mcode=0x7fe229340690 "H\203\350(I\211\314I)\304I\203\374\060\017\205T\003") at jit.c:5852
#8  0x00007fe267b6b8e9 in vm_regular_engine (thread=0x7fe2580cb5d0) at vm-engine.c:360
#9  0x00007fe267b6c5b5 in scm_call_n (proc=proc@entry=#<unmatched-tag 20045>, argv=argv@entry=0x0, 
    nargs=nargs@entry=0) at vm.c:1608
#10 0x00007fe267ae8ae9 in scm_call_0 (proc=proc@entry=#<unmatched-tag 20045>) at eval.c:490
#11 0x00007fe267adb138 in scm_call_with_unblocked_asyncs (proc=#<unmatched-tag 20045>) at async.c:406
--8<---------------cut here---------------end--------------->8---

‘trampoline_to_c_read’ is known as ‘port-read’ in Scheme, so I think the
call above comes from ‘read-bytes’ in (ice-9 suspendable-ports).

Normally, this file descriptor is O_NONBLOCK, and thus ‘fport_read’
immediately returns EAGAIN, so ‘trampoline_to_c_read’ returns #false.

But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
done explicitly, Fibers won’t do it for us.  As it turns out, the answer
is no, in at least one important case: the connection to the daemon
(untested patch below).
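
To make the O_NONBLOCK point concrete, here is a minimal standalone sketch
that uses a plain pipe instead of the daemon socket. The helper names
`make-non-blocking!` and `non-blocking?` are made up for illustration; only
‘fcntl’, ‘F_GETFL’, ‘F_SETFL’ and ‘O_NONBLOCK’ come from Guile itself.

```scheme
;; Set O_NONBLOCK on PORT's file descriptor, keeping its other flags.
(define (make-non-blocking! port)
  (let ((flags (fcntl port F_GETFL)))
    (fcntl port F_SETFL (logior O_NONBLOCK flags))
    port))

;; Return #t if PORT's file descriptor has O_NONBLOCK set.
(define (non-blocking? port)
  (not (zero? (logand O_NONBLOCK (fcntl port F_GETFL)))))

;; 'pipe' returns a pair of ports: (input . output).
(define demo-pipe (pipe))
(define input (car demo-pipe))

;; A freshly created pipe is blocking by default.
(define before (non-blocking? input))
(make-non-blocking! input)
(define after (non-blocking? input))

(format #t "before: ~a, after: ~a~%" before after)
```

A check like `non-blocking?` could be a cheap way to audit the ports that
Cuirass hands to code running in fibers.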

While GC is running, Cuirass typically sends ‘build-derivations’ RPCs
and they block until the GC lock is released.  That can lead to the
situation above: a bunch of threads blocked in ‘read’ from their daemon
socket, waiting for the RPC reply.  OTOH, ‘build-derivations’ RPCs are
made from a fresh thread created by ‘build-derivations&’.

There are probably other situations where the daemon replies slowly.
For instance, ‘fetch-input’ can remain stuck until GC is over.

WDYT?

Thanks for investigating!

Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 1931 bytes --]

diff --git a/src/cuirass/base.scm b/src/cuirass/base.scm
index 5a0c826..6db43c4 100644
--- a/src/cuirass/base.scm
+++ b/src/cuirass/base.scm
@@ -36,6 +36,9 @@
   #:use-module ((guix config) #:select (%state-directory))
   #:use-module (git)
   #:use-module (ice-9 binary-ports)
+  #:use-module ((ice-9 suspendable-ports)
+                #:select (current-read-waiter
+                          current-write-waiter))
   #:use-module (ice-9 format)
   #:use-module (ice-9 match)
   #:use-module (ice-9 popen)
@@ -79,7 +82,12 @@
   ;; currently closes in a 'dynamic-wind' handler, which means it would close
   ;; the store at each context switch.  Remove this when the real 'with-store'
   ;; has been fixed.
-  (let ((store (open-connection)))
+  (let* ((store  (open-connection))
+         (socket (store-connection-socket store)))
+    ;; Mark SOCKET as non-blocking so Fibers can schedule the way it wants.
+    (let ((flags (fcntl socket F_GETFL)))
+      (fcntl socket F_SETFL (logior O_NONBLOCK flags)))
+
     (unwind-protect
      ;; Always set #:keep-going? so we don't stop on the first build failure.
      ;; Set #:print-build-trace explicitly to make sure 'process-build-log'
@@ -422,7 +430,12 @@ Essentially this procedure inverts the inversion-of-control that
           (lambda ()
             (guard (c ((store-error? c)
                        (atomic-box-set! result c)))
-              (parameterize ((current-build-output-port output))
+              (parameterize ((current-build-output-port output)
+
+                             ;; STORE's socket is O_NONBLOCK but since we're
+                             ;; not in a fiber, disable Fiber's handlers.
+                             (current-read-waiter #f)
+                             (current-write-waiter #f))
                 (let ((x (build-derivations store lst)))
                   (atomic-box-set! result x))))
             (close-port output))


* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-23 12:21     ` Ludovic Courtès
@ 2020-10-26 14:22       ` Mathieu Othacehe
  2020-10-26 16:20         ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Othacehe @ 2020-10-26 14:22 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43565


Hey!

Many thanks for your help, you rock!

> But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
> done explicitly, Fibers won’t do it for us.  As it turns out, the answer
> is no, in at least one important case: the connection to the daemon
> (untested patch below).
>
> While GC is running, Cuirass typically sends ‘build-derivations’ RPCs
> and they block until the GC lock is released.  That can lead to the
> situation above: a bunch of threads blocked in ‘read’ from their daemon
> socket, waiting for the RPC reply.  OTOH, ‘build-derivations’ RPCs are
> made from a fresh thread created by ‘build-derivations&’.

While I agree not opening file descriptors with O_NONBLOCK is an issue,
build-derivations is called in a separate thread. Blocking this separate
thread should not block the fibers.

For instance, the following program:

--8<---------------cut here---------------start------------->8---
(use-modules (fibers)
             (ice-9 threads))

(run-fibers
 (lambda ()
   (spawn-fiber
    (lambda ()
      (call-with-new-thread
       (lambda ()
         (read (car (pipe)))))))
   (spawn-fiber
    (lambda ()
      (while #t
        (format #t "alive~%")
        (sleep 1)))))
 #:hz 10
 #:drain? #t)
--8<---------------cut here---------------end--------------->8---

keeps displaying "alive" even if the spawned thread is blocking. I guess
that's also what's happening in Cuirass because the log shows that some
fibers are scheduled while the GC is running.

Now the question is: why is there no fetching while the GC is running? The
answer is that "latest-repository-commit" called by "fetch-input" will
block the only fiber dedicated to fetching. Having multiple fibers
trying to fetch wouldn't solve anything because fetching requires some
building from the daemon.

Long story short, I think we can apply your patch, which can be useful to
prevent fibers that talk directly to the daemon from blocking, even though
it won't help for this particular hang, which will only be fixed once the
GC time is reduced to something more acceptable.

Thanks,

Mathieu





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-26 14:22       ` Mathieu Othacehe
@ 2020-10-26 16:20         ` Ludovic Courtès
  2020-10-27 18:03           ` Mathieu Othacehe
  2020-11-02 10:09           ` Mathieu Othacehe
  0 siblings, 2 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-10-26 16:20 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43565

Hello!

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> But does Cuirass create file descriptors as O_NONBLOCK?  This has to be
>> done explicitly, Fibers won’t do it for us.  As it turns out, the answer
>> is no, in at least one important case: the connection to the daemon
>> (untested patch below).
>>
>> While GC is running, Cuirass typically sends ‘build-derivations’ RPCs
>> and they block until the GC lock is released.  That can lead to the
>> situation above: a bunch of threads blocked in ‘read’ from their daemon
>> socket, waiting for the RPC reply.  OTOH, ‘build-derivations’ RPCs are
>> made from a fresh thread created by ‘build-derivations&’.
>
> While I agree not opening file descriptors with O_NONBLOCK is an issue,
> build-derivations is called in a separate thread. Blocking this separate
> thread should not block the fibers.

Agreed.

> Now the question is: why is there no fetching while the GC is running? The
> answer is that "latest-repository-commit" called by "fetch-input" will
> block the only fiber dedicated to fetching. Having multiple fibers
> trying to fetch wouldn't solve anything because fetching requires some
> building from the daemon.

Exactly: when the GC lock is taken, ‘latest-repository-commit’ makes an
‘add-to-store’ RPC, and that RPC blocks.  Thus the whole fetch fiber is
blocked.

The patch should address this case.  That said, nothing useful happens
anyway when the GC lock is held, so it wouldn’t have any practical
effect.

I believe there are other cases where RPCs can be slow, for example when
there’s contention on the sqlite database.  Perhaps that could help a
bit there although again, it’s a situation where nothing useful can
happen.

> Long story short, I think we can apply your patch, which can be useful to
> prevent fibers that talk directly to the daemon from blocking, even though
> it won't help for this particular hang, which will only be fixed once the
> GC time is reduced to something more acceptable.

Yeah please go ahead if you want, or let me know if you’d rather let me
apply it.

Thanks!

Ludo’.





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-26 16:20         ` Ludovic Courtès
@ 2020-10-27 18:03           ` Mathieu Othacehe
  2020-11-02 10:09           ` Mathieu Othacehe
  1 sibling, 0 replies; 10+ messages in thread
From: Mathieu Othacehe @ 2020-10-27 18:03 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43565-done


Hey,

> Yeah please go ahead if you want, or let me know if you’d rather let me
> apply it.

I applied your patch, thanks! I'm closing this one, because there's
nothing much that can be done right now.

Thanks,

Mathieu





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-10-26 16:20         ` Ludovic Courtès
  2020-10-27 18:03           ` Mathieu Othacehe
@ 2020-11-02 10:09           ` Mathieu Othacehe
  2020-11-19 10:56             ` Mathieu Othacehe
  1 sibling, 1 reply; 10+ messages in thread
From: Mathieu Othacehe @ 2020-11-02 10:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43565


Hey,

> Yeah please go ahead if you want, or let me know if you’d rather let me
> apply it.

I ended up reverting this patch, as it causes the following error:

--8<---------------cut here---------------start------------->8---
2020-11-02T11:05:08 fatal: uncaught exception 'wrong-type-arg' in 'build' fiber!
2020-11-02T11:05:08 exception arguments: ("struct-vtable" "Wrong type argument in position 1 (expecting struct): ~S" (#f) (#f))
In ice-9/boot-9.scm:
  1731:15 12 (with-exception-handler #<procedure 7fb1a93f9930 at ic…> …)
  1736:10 11 (with-exception-handler _ _ #:unwind? _ # _)
    718:2 10 (call-with-prompt ("break") #<procedure 7fb1ab76f440 a…> …)
    718:2  9 (call-with-prompt ("continue") #<procedure 7fb1ab77084…> …)
In ice-9/eval.scm:
    619:8  8 (_ #(#(#<directory (guile-user) 7fb1ac680f00> #<var…> …)))
In srfi/srfi-1.scm:
    634:9  7 (for-each #<procedure 7fb1a9525900 at cuirass/base.scm…> …)
In ice-9/boot-9.scm:
  1731:15  6 (with-exception-handler #<procedure 7fb1a95a94e0 at ic…> …)
  1669:16  5 (raise-exception _ #:continuable? _)
  1764:13  4 (_ #<&compound-exception components: (#<&assertion-fail…>)
In cuirass/utils.scm:
    319:8  3 (_ _ . _)
In ice-9/boot-9.scm:
  1731:15  2 (with-exception-handler #<procedure 7fb1ab2e3720 at ic…> …)
In cuirass/utils.scm:
   320:22  1 (_)
In unknown file:
           0 (make-stack #t)
ERROR: In procedure make-stack:
In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-11-02 10:09           ` Mathieu Othacehe
@ 2020-11-19 10:56             ` Mathieu Othacehe
  2020-11-20  8:37               ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Othacehe @ 2020-11-19 10:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43565


Hey,

> In cuirass/utils.scm:
>    320:22  1 (_)
> In unknown file:
>            0 (make-stack #t)
> ERROR: In procedure make-stack:
> In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f

I think this error is caused by setting:

--8<---------------cut here---------------start------------->8---
  ;; STORE's socket is O_NONBLOCK but since we're
  ;; not in a fiber, disable Fiber's handlers.
  (current-read-waiter #f)
  (current-write-waiter #f)
--8<---------------cut here---------------end--------------->8---

where it should be:

--8<---------------cut here---------------start------------->8---
  ;; STORE's socket is O_NONBLOCK but since we're
  ;; not in a fiber, disable Fiber's handlers.
  (current-read-waiter
   (lambda (port)
     (port-poll port "r")))
  (current-write-waiter
   (lambda (port)
     (port-poll port "w")))
--8<---------------cut here---------------end--------------->8---

This should then also be done in "fetch-inputs", which uses non-blocking
ports outside of Fibers.

However, I still have the following error:

--8<---------------cut here---------------start------------->8---
In ice-9/boot-9.scm:
  1731:15 17 (with-exception-handler #<procedure 7fac67194000 at ic…> …)
  1736:10 16 (with-exception-handler _ _ #:unwind? _ # _)
In ice-9/eval.scm:
    619:8 15 (_ #(#(#(#(#<directory (cuirass base) 7fac6b51c…>)) …) …))
In unknown file:
          14 (_ #<procedure 7fac69b10b20 at ice-9/eval.scm:330:13 ()> …)
          13 (partition #<procedure 7fac69b10880 at ice-9/eval.scm:…> …)
In guix/store.scm:
   1008:0 12 (valid-path? #<store-connection 256.99 7fac6b3fd6e0> "/…")
2020-11-19T11:47:23 Failed to compute metric average-eval-build-start-time (1).
   717:11 11 (process-stderr #<store-connection 256.99 7fac6b3fd6e0> _)
In guix/serialization.scm:
    76:12 10 (read-int #<input-output: socket 49>)
In ice-9/suspendable-ports.scm:
   307:17  9 (get-bytevector-n #<input-output: socket 49> 8)
2020-11-19T11:47:23 Failed to compute metric average-eval-build-complete-time (1).
2020-11-19T11:47:23 Failed to compute metric evaluation-completion-speed (1).
   284:18  8 (get-bytevector-n! #<input-output: socket 49> #vu8(0 …) …)
    67:33  7 (read-bytes #<input-output: socket 49> #vu8(0 0 0 0 0 …) …)
In fibers/internal.scm:
    402:6  6 (suspend-current-fiber _)
In ice-9/boot-9.scm:
  1669:16  5 (raise-exception _ #:continuable? _)
  1764:13  4 (_ #<&compound-exception components: (#<&error> #<&orig…>)
In cuirass/utils.scm:
    319:8  3 (_ _ . _)
In ice-9/boot-9.scm:
  1731:15  2 (with-exception-handler #<procedure 7fac683ea300 at ic…> …)
In cuirass/utils.scm:
   320:22  1 (_)
In unknown file:
           0 (make-stack #t)
ERROR: In procedure make-stack:
Attempt to suspend fiber within continuation barrier
--8<---------------cut here---------------end--------------->8---

that originates from "valid-path?" in "restart-builds", not sure how to
fix it yet.

Thanks,

Mathieu





* bug#43565: cuirass: Fibers scheduling blocked.
  2020-11-19 10:56             ` Mathieu Othacehe
@ 2020-11-20  8:37               ` Ludovic Courtès
  0 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2020-11-20  8:37 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43565

[-- Attachment #1: Type: text/plain, Size: 3081 bytes --]

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> In cuirass/utils.scm:
>>    320:22  1 (_)
>> In unknown file:
>>            0 (make-stack #t)
>> ERROR: In procedure make-stack:
>> In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f
>
> I think this error is caused by setting:
>
>   ;; STORE's socket is O_NONBLOCK but since we're
>   ;; not in a fiber, disable Fiber's handlers.
>   (current-read-waiter #f)
>   (current-write-waiter #f)
>
>
> where it should be:
>
>   ;; STORE's socket is O_NONBLOCK but since we're
>   ;; not in a fiber, disable Fiber's handlers.
>   (current-read-waiter
>    (lambda (port)
>      (port-poll port "r")))
>   (current-write-waiter
>    (lambda (port)
>      (port-poll port "w")))

Ooh, good catch.

> This should then also be done in "fetch-inputs", which uses non-blocking
> ports outside of Fibers.
>
> However, I still have the following error:
>
> In ice-9/boot-9.scm:
>   1731:15 17 (with-exception-handler #<procedure 7fac67194000 at ic…> …)
>   1736:10 16 (with-exception-handler _ _ #:unwind? _ # _)
> In ice-9/eval.scm:
>     619:8 15 (_ #(#(#(#(#<directory (cuirass base) 7fac6b51c…>)) …) …))
> In unknown file:
>           14 (_ #<procedure 7fac69b10b20 at ice-9/eval.scm:330:13 ()> …)
>           13 (partition #<procedure 7fac69b10880 at ice-9/eval.scm:…> …)
> In guix/store.scm:
>    1008:0 12 (valid-path? #<store-connection 256.99 7fac6b3fd6e0> "/…")
> 2020-11-19T11:47:23 Failed to compute metric average-eval-build-start-time (1).
>    717:11 11 (process-stderr #<store-connection 256.99 7fac6b3fd6e0> _)
> In guix/serialization.scm:
>     76:12 10 (read-int #<input-output: socket 49>)
> In ice-9/suspendable-ports.scm:
>    307:17  9 (get-bytevector-n #<input-output: socket 49> 8)
> 2020-11-19T11:47:23 Failed to compute metric average-eval-build-complete-time (1).
> 2020-11-19T11:47:23 Failed to compute metric evaluation-completion-speed (1).
>    284:18  8 (get-bytevector-n! #<input-output: socket 49> #vu8(0 …) …)
>     67:33  7 (read-bytes #<input-output: socket 49> #vu8(0 0 0 0 0 …) …)
> In fibers/internal.scm:
>     402:6  6 (suspend-current-fiber _)
> In ice-9/boot-9.scm:
>   1669:16  5 (raise-exception _ #:continuable? _)
>   1764:13  4 (_ #<&compound-exception components: (#<&error> #<&orig…>)
> In cuirass/utils.scm:
>     319:8  3 (_ _ . _)
> In ice-9/boot-9.scm:
>   1731:15  2 (with-exception-handler #<procedure 7fac683ea300 at ic…> …)
> In cuirass/utils.scm:
>    320:22  1 (_)
> In unknown file:
>            0 (make-stack #t)
> ERROR: In procedure make-stack:
> Attempt to suspend fiber within continuation barrier
>
> that originates from "valid-path?" in "restart-builds", not sure how to
> fix it yet.

I think that’s because of the ‘partition’ call: ‘partition’ is currently
implemented in C and the stack cannot be captured if it contains C calls
in the middle.

The simplest fix is probably to have a Scheme implementation:


[-- Attachment #2: Type: text/x-patch, Size: 945 bytes --]

diff --git a/src/cuirass/base.scm b/src/cuirass/base.scm
index 5a0c826..99a17fa 100644
--- a/src/cuirass/base.scm
+++ b/src/cuirass/base.scm
@@ -632,6 +632,21 @@ This procedure is meant to be called at startup."
      db "UPDATE Builds SET status = 4 WHERE status = -2 AND timestamp < "
      (- (time-second (current-time time-utc)) age) ";")))
 
+(define (partition pred lst)
+  ;; Scheme implementation of SRFI-1 'partition' so stack activations can be
+  ;; captured via 'abort-to-prompt'.
+  (let loop ((lst   lst)
+             (pass '())
+             (fail '()))
+    (match lst
+      (()
+       (values (reverse pass) (reverse fail)))
+      ((head . tail)
+       (let ((pass? (pred head)))
+         (loop tail
+               (if pass? (cons head pass) pass)
+               (if pass? fail (cons head fail))))))))
+
 (define (restart-builds)
   "Restart builds whose status in the database is \"pending\" (scheduled or
 started)."

[-- Attachment #3: Type: text/plain, Size: 229 bytes --]


It’s a bummer that one has to be aware of all these implementation
details when using Fibers.  The vision I think is that asymptotically
these issues would vanish as more things move from C to Scheme.
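
To illustrate why the Scheme version helps, here is a minimal sketch that
needs only plain Guile, not Fibers. A predicate "suspends" in the middle of
the partition loop by aborting to a prompt, and the handler then resumes the
captured continuation. The names `partition/scheme`, `suspending-even?` and
`run-with-resumption` are made up for this example, and the prompt handler is
only a stand-in for what a scheduler would do, not a description of Fibers
internals.

```scheme
(use-modules (ice-9 match))

;; Same algorithm as the patch above, under a different name so that it
;; does not shadow SRFI-1's 'partition'.
(define (partition/scheme pred lst)
  (let loop ((lst lst) (pass '()) (fail '()))
    (match lst
      (()
       (values (reverse pass) (reverse fail)))
      ((head . tail)
       (let ((pass? (pred head)))
         (loop tail
               (if pass? (cons head pass) pass)
               (if pass? fail (cons head fail))))))))

(define %tag (make-prompt-tag "suspend"))

;; A predicate that "suspends" by aborting to the prompt; it returns
;; whatever value its continuation is later resumed with.
(define (suspending-even? n)
  (abort-to-prompt %tag n))

;; Run THUNK; each time it suspends on N, resume it with (even? N).
(define (run-with-resumption thunk)
  (call-with-prompt %tag
    thunk
    (lambda (k n)
      (run-with-resumption (lambda () (k (even? n)))))))

;; Collect the two return values of the partition into a list.
(define result
  (call-with-values
      (lambda ()
        (run-with-resumption
         (lambda ()
           (partition/scheme suspending-even? '(1 2 3 4)))))
    list))

(format #t "pass: ~s  fail: ~s~%" (car result) (cadr result))
```

Every activation between the prompt and the ‘abort-to-prompt’ call is Scheme
code here, so the continuation can be captured and resumed once per element.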

Thanks,
Ludo’.

