unofficial mirror of bug-guix@gnu.org 
* bug#28779: tests/workers.scm failure
@ 2017-10-10 15:48 Eric Bavier
  2017-11-16  8:29 ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Bavier @ 2017-10-10 15:48 UTC (permalink / raw)
  To: 28779

Roughly 1 in 2 runs of tests/workers.scm fails on my system.  Output:

========================================================
   GNU Guix 0.13.0.3413-984e3-dirty: ./test-suite.log
========================================================

# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/workers
===================

test-name: enqueue
location: /home/users/bavier/src/guix/tests/workers.scm:26
source:
+ (test-equal
+   "enqueue"
+   4242
+   (let* ((pool (make-pool))
+          (result 0)
+          (#{1+!}# (let ((lock (make-mutex)))
+                     (lambda ()
+                       (with-mutex lock (set! result (+ result 1)))))))
+     (let loop ((i 4242))
+       (unless
+         (zero? i)
+         (pool-enqueue! pool #{1+!}#)
+         (loop (- i 1))))
+     (let poll ()
+       (unless
+         (pool-idle? pool)
+         (pk 'busy result)
+         (sleep 1)
+         (poll)))
+     result))
expected-value: 4242
actual-value: 4241
result: FAIL


The cause appears to be that 'pool-idle?' only reports whether the task queue is empty, not whether every task has finished executing.  A worker can already have dequeued the last 1+! task without having run it yet, so the poll loop exits before all 1+! updates are finished and the test fails.

Most failures show "actual-value: 4241", but I have also seen "actual-value: 4239" and "actual-value: 4240", which points to a race condition.

On this system '(current-processor-count) => 128'
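
The distinction between "queue is empty" and "all tasks have finished" can be shown deterministically with a minimal sketch.  It is not the Guix code: the single worker, the 'gate' condition variable and all the names below are made up for illustration, and only (ice-9 threads) and (ice-9 q) are assumed.

```scheme
(use-modules (ice-9 threads)   ;mutexes, condition variables, threads
             (ice-9 q))        ;make-q, enq!, q-pop!, q-empty?

;; Hypothetical stand-ins for the pool internals.
(define mutex   (make-mutex))
(define popped  (make-condition-variable)) ;signaled once the task is dequeued
(define gate    (make-condition-variable)) ;holds the worker back artificially
(define queue   (make-q))
(define popped? #f)
(define open?   #f)
(define result  0)

(define (queue-empty?)
  ;; What the old 'pool-idle?' checked: only the queue.
  (with-mutex mutex (q-empty? queue)))

(enq! queue (lambda () (set! result (+ result 1))))

(define worker
  (call-with-new-thread
   (lambda ()
     (let ((task (with-mutex mutex
                   (let ((task (q-pop! queue)))
                     (set! popped? #t)
                     (signal-condition-variable popped)
                     ;; Simulate a worker that is slow to run its task.
                     (let wait ()
                       (unless open?
                         (wait-condition-variable gate mutex)
                         (wait)))
                     task))))
       (task)))))

;; Wait until the worker has dequeued the task.
(with-mutex mutex
  (let wait ()
    (unless popped?
      (wait-condition-variable popped mutex)
      (wait))))

;; The queue is now empty, yet the task has not run.
(define observed-empty? (queue-empty?))
(define observed-result result)

;; Let the worker proceed and wait for it to finish.
(with-mutex mutex
  (set! open? #t)
  (signal-condition-variable gate))
(join-thread worker)

(format #t "empty? ~a, result while empty: ~a, final result: ~a~%"
        observed-empty? observed-result result)
```

At the point where the queue is observed empty, 'result' is still 0; it only becomes 1 after the worker is released.  A poll loop that stops as soon as the queue is empty reads 'result' in exactly that window.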

Eric Bavier, Scientific Libraries, Cray Inc.


* bug#28779: tests/workers.scm failure
  2017-10-10 15:48 bug#28779: tests/workers.scm failure Eric Bavier
@ 2017-11-16  8:29 ` Ludovic Courtès
  2017-11-17  2:58   ` Eric Bavier
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2017-11-16  8:29 UTC (permalink / raw)
  To: Eric Bavier; +Cc: 28779


Hi Eric,

Eric Bavier <bavier@cray.com> skribis:

> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> +   "enqueue"
> +   4242
> +   (let* ((pool (make-pool))
> +          (result 0)
> +          (#{1+!}# (let ((lock (make-mutex)))
> +                     (lambda ()
> +                       (with-mutex lock (set! result (+ result 1)))))))
> +     (let loop ((i 4242))
> +       (unless
> +         (zero? i)
> +         (pool-enqueue! pool #{1+!}#)
> +         (loop (- i 1))))
> +     (let poll ()
> +       (unless
> +         (pool-idle? pool)
> +         (pk 'busy result)
> +         (sleep 1)
> +         (poll)))
> +     result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates whether or not the task queue is empty, not whether all tasks have completed execution, so the poll loop exits before all 1+! updates are finished and the test fails.  

Indeed, good catch.

The attached patch is a bit crude but it should fix the problem.

Thoughts?

Thanks,
Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 3017 bytes --]

diff --git a/guix/workers.scm b/guix/workers.scm
index 846f5e50a..0f6f54bab 100644
--- a/guix/workers.scm
+++ b/guix/workers.scm
@@ -45,12 +45,13 @@
 ;;; Code:
 
 (define-record-type <pool>
-  (%make-pool queue mutex condvar workers)
+  (%make-pool queue mutex condvar workers busy)
   pool?
   (queue    pool-queue)
   (mutex    pool-mutex)
   (condvar  pool-condition-variable)
-  (workers  pool-workers))
+  (workers  pool-workers)
+  (busy     pool-busy))
 
 (define-syntax-rule (without-mutex mutex exp ...)
   (dynamic-wind
@@ -62,12 +63,14 @@
       (lock-mutex mutex))))
 
 (define* (worker-thunk mutex condvar pop-queue
-                       #:key (thread-name "guix worker"))
+                       #:key idle busy (thread-name "guix worker"))
   "Return the thunk executed by worker threads."
   (define (loop)
     (match (pop-queue)
       (#f                                         ;empty queue
-       (wait-condition-variable condvar mutex))
+       (idle)
+       (wait-condition-variable condvar mutex)
+       (busy))
       ((? procedure? proc)
        ;; Release MUTEX while executing PROC.
        (without-mutex mutex
@@ -97,19 +100,24 @@ threads as reported by the operating system."
   (let* ((mutex   (make-mutex))
          (condvar (make-condition-variable))
          (queue   (make-q))
+         (busy    count)
          (procs   (unfold (cut >= <> count)
                           (lambda (n)
                             (worker-thunk mutex condvar
                                           (lambda ()
                                             (and (not (q-empty? queue))
                                                  (q-pop! queue)))
+                                          #:busy (lambda ()
+                                                   (set! busy (+ 1 busy)))
+                                          #:idle (lambda ()
+                                                   (set! busy (- busy 1)))
                                           #:thread-name thread-name))
                           1+
                           0))
          (threads (map (lambda (proc)
                          (call-with-new-thread proc))
                        procs)))
-    (%make-pool queue mutex condvar threads)))
+    (%make-pool queue mutex condvar threads (lambda () busy))))
 
 (define (pool-enqueue! pool thunk)
   "Enqueue THUNK for future execution by POOL."
@@ -118,9 +126,11 @@ threads as reported by the operating system."
     (signal-condition-variable (pool-condition-variable pool))))
 
 (define (pool-idle? pool)
-  "Return true if POOL doesn't have any task in its queue."
+  "Return true if POOL doesn't have any task in its queue and all the workers
+are currently idle (i.e., waiting for a task)."
   (with-mutex (pool-mutex pool)
-    (q-empty? (pool-queue pool))))
+    (and (q-empty? (pool-queue pool))
+         (zero? ((pool-busy pool))))))
 
 (define-syntax-rule (eventually pool exp ...)
   "Run EXP eventually on one of the workers of POOL."
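
For readers who want to try the idea outside of Guix, here is a self-contained sketch of the same approach: a counter of busy workers, updated under the pool mutex, that the idle predicate checks in addition to the queue.  It is a toy, not the code in guix/workers.scm; 'make-toy-pool', 'run-enqueue-test' and the worker count of 4 are assumptions made for the example.

```scheme
(use-modules (ice-9 threads)   ;mutexes, condition variables, threads
             (ice-9 q))        ;make-q, enq!, q-pop!, q-empty?

(define (make-toy-pool count)
  "Return two values: an 'enqueue!' procedure and an 'idle?' predicate."
  (let ((mutex   (make-mutex))
        (condvar (make-condition-variable))
        (queue   (make-q))
        (busy    count))             ;workers count as busy until they wait
    (define (worker)
      (with-mutex mutex
        (let loop ()
          (if (q-empty? queue)
              (begin
                (set! busy (- busy 1))          ;about to wait: idle
                (wait-condition-variable condvar mutex)
                (set! busy (+ busy 1)))         ;woken up: busy again
              (let ((task (q-pop! queue)))
                ;; Release MUTEX while TASK runs; BUSY still counts us.
                (dynamic-wind
                  (lambda () (unlock-mutex mutex))
                  task
                  (lambda () (lock-mutex mutex)))))
          (loop))))
    (let spawn ((n 0))
      (when (< n count)
        (call-with-new-thread worker)
        (spawn (+ n 1))))
    (values (lambda (thunk)
              (with-mutex mutex
                (enq! queue thunk)
                (signal-condition-variable condvar)))
            (lambda ()
              ;; Idle means: nothing queued AND nobody running a task.
              (with-mutex mutex
                (and (q-empty? queue) (zero? busy)))))))

(define (run-enqueue-test tasks workers)
  "Enqueue TASKS increments on a toy pool of WORKERS threads; return the sum."
  (call-with-values (lambda () (make-toy-pool workers))
    (lambda (enqueue! idle?)
      (let* ((result     0)
             (lock       (make-mutex))
             (increment! (lambda ()
                           (with-mutex lock
                             (set! result (+ result 1))))))
        (let loop ((i tasks))
          (unless (zero? i)
            (enqueue! increment!)
            (loop (- i 1))))
        (let poll ()
          (unless (idle?)
            (usleep 1000)
            (poll)))
        result))))

(define final (run-enqueue-test 4242 4))
(format #t "result: ~a~%" final)     ;expected: 4242
```

Because a worker only decrements the counter right before it waits, and increments it again before it looks at the queue, a task that has been dequeued is always accounted for by a non-zero counter until it has finished.  The worker threads are never joined; they simply end when the process exits.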


* bug#28779: tests/workers.scm failure
  2017-11-16  8:29 ` Ludovic Courtès
@ 2017-11-17  2:58   ` Eric Bavier
  2017-11-17 10:10     ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Bavier @ 2017-11-17  2:58 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 28779@debbugs.gnu.org

Looks good to me.

Thanks,

Eric Bavier, Scientific Libraries, Cray Inc.



* bug#28779: tests/workers.scm failure
  2017-11-17  2:58   ` Eric Bavier
@ 2017-11-17 10:10     ` Ludovic Courtès
  0 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2017-11-17 10:10 UTC (permalink / raw)
  To: Eric Bavier; +Cc: 28779-done

Eric Bavier <bavier@cray.com> skribis:

> Looks good to me.

Pushed as 232b3d31016439b5600e47d845ffb7c9a4ee4723.

Thanks,
Ludo’.

