* bug#28779: tests/workers.scm failure
@ 2017-10-10 15:48 Eric Bavier
2017-11-16 8:29 ` Ludovic Courtès
0 siblings, 1 reply; 4+ messages in thread
From: Eric Bavier @ 2017-10-10 15:48 UTC
To: 28779
Roughly 1 in 2 runs of tests/workers.scm fails on my system. Output:
========================================================
GNU Guix 0.13.0.3413-984e3-dirty: ./test-suite.log
========================================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: tests/workers
===================
test-name: enqueue
location: /home/users/bavier/src/guix/tests/workers.scm:26
source:
+ (test-equal
+   "enqueue"
+   4242
+   (let* ((pool (make-pool))
+          (result 0)
+          (#{1+!}# (let ((lock (make-mutex)))
+                     (lambda ()
+                       (with-mutex lock (set! result (+ result 1)))))))
+     (let loop ((i 4242))
+       (unless
+         (zero? i)
+         (pool-enqueue! pool #{1+!}#)
+         (loop (- i 1))))
+     (let poll ()
+       (unless
+         (pool-idle? pool)
+         (pk 'busy result)
+         (sleep 1)
+         (poll)))
+     result))
expected-value: 4242
actual-value: 4241
result: FAIL
To me the reason seems to be that the 'pool-idle? procedure indicates only whether the task queue is empty, not whether all tasks have completed execution. The poll loop can therefore exit before all 1+! updates are finished, and the test fails.
Most failures show "actual-value: 4241", but I have also seen "actual-value: 4239" and "actual-value: 4240", which points to a race condition.
On this system '(current-processor-count) => 128'
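The distinction between an empty queue and finished tasks can be reproduced deterministically. The following is a minimal sketch, not the Guix code: it assumes Guile with the (ice-9 threads) and (ice-9 q) modules, and the names queue-empty?, gate, and result-when-queue-empty are made up for illustration. A single worker pops its task and then blocks on a mutex held by the main thread, so the queue is observed empty while the increment has not yet run.

```scheme
(use-modules (ice-9 threads) (ice-9 q))

;; Hypothetical miniature of the situation: one queue, one worker, one task.
(define queue (make-q))
(define queue-lock (make-mutex))
(define gate (make-mutex))       ;held by the main thread to stall the task
(define result 0)

(define (queue-empty?)
  ;; Same criterion as the original 'pool-idle?: only looks at the queue.
  (with-mutex queue-lock (q-empty? queue)))

;; Hold GATE so that the task cannot complete yet, then enqueue the task.
(lock-mutex gate)
(with-mutex queue-lock
  (enq! queue (lambda ()
                (with-mutex gate (set! result (+ result 1))))))

;; The worker pops the task (emptying the queue) and then blocks on GATE.
(define worker
  (call-with-new-thread
   (lambda ()
     (let ((task (with-mutex queue-lock (q-pop! queue))))
       (task)))))

;; Poll until the queue is empty, as the test does.
(let wait ()
  (unless (queue-empty?)
    (usleep 1000)
    (wait)))

;; The queue is empty, yet the task has not run to completion.
(define result-when-queue-empty result)

;; Let the task finish and wait for the worker.
(unlock-mutex gate)
(join-thread worker)
```

Here result-when-queue-empty is 0 although the queue is already empty, and result only becomes 1 once the worker is allowed to finish. This is the state in which the poll loop of the test can return early.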
Eric Bavier, Scientific Libraries, Cray Inc.
* bug#28779: tests/workers.scm failure
From: Ludovic Courtès @ 2017-11-16 8:29 UTC
To: Eric Bavier; +Cc: 28779
Hi Eric,
Eric Bavier <bavier@cray.com> skribis:
> test-name: enqueue
> location: /home/users/bavier/src/guix/tests/workers.scm:26
> source:
> + (test-equal
> +   "enqueue"
> +   4242
> +   (let* ((pool (make-pool))
> +          (result 0)
> +          (#{1+!}# (let ((lock (make-mutex)))
> +                     (lambda ()
> +                       (with-mutex lock (set! result (+ result 1)))))))
> +     (let loop ((i 4242))
> +       (unless
> +         (zero? i)
> +         (pool-enqueue! pool #{1+!}#)
> +         (loop (- i 1))))
> +     (let poll ()
> +       (unless
> +         (pool-idle? pool)
> +         (pk 'busy result)
> +         (sleep 1)
> +         (poll)))
> +     result))
> expected-value: 4242
> actual-value: 4241
> result: FAIL
>
>
> To me the reason seems to be that the 'pool-idle? procedure indicates only whether the task queue is empty, not whether all tasks have completed execution. The poll loop can therefore exit before all 1+! updates are finished, and the test fails.
Indeed, good catch.
The attached patch is a bit crude but it should fix the problem.
Thoughts?
Thanks,
Ludo’.
[-- Attachment #2: Type: text/x-patch, Size: 3017 bytes --]
diff --git a/guix/workers.scm b/guix/workers.scm
index 846f5e50a..0f6f54bab 100644
--- a/guix/workers.scm
+++ b/guix/workers.scm
@@ -45,12 +45,13 @@
 ;;; Code:

 (define-record-type <pool>
-  (%make-pool queue mutex condvar workers)
+  (%make-pool queue mutex condvar workers busy)
   pool?
   (queue pool-queue)
   (mutex pool-mutex)
   (condvar pool-condition-variable)
-  (workers pool-workers))
+  (workers pool-workers)
+  (busy pool-busy))

 (define-syntax-rule (without-mutex mutex exp ...)
   (dynamic-wind
@@ -62,12 +63,14 @@
       (lock-mutex mutex))))

 (define* (worker-thunk mutex condvar pop-queue
-                       #:key (thread-name "guix worker"))
+                       #:key idle busy (thread-name "guix worker"))
   "Return the thunk executed by worker threads."
   (define (loop)
     (match (pop-queue)
       (#f ;empty queue
-       (wait-condition-variable condvar mutex))
+       (idle)
+       (wait-condition-variable condvar mutex)
+       (busy))
       ((? procedure? proc)
        ;; Release MUTEX while executing PROC.
        (without-mutex mutex
@@ -97,19 +100,24 @@ threads as reported by the operating system."
   (let* ((mutex (make-mutex))
          (condvar (make-condition-variable))
          (queue (make-q))
+         (busy count)
          (procs (unfold (cut >= <> count)
                         (lambda (n)
                           (worker-thunk mutex condvar
                                         (lambda ()
                                           (and (not (q-empty? queue))
                                                (q-pop! queue)))
+                                        #:busy (lambda ()
+                                                 (set! busy (+ 1 busy)))
+                                        #:idle (lambda ()
+                                                 (set! busy (- busy 1)))
                                         #:thread-name thread-name))
                         1+
                         0))
          (threads (map (lambda (proc)
                          (call-with-new-thread proc))
                        procs)))
-    (%make-pool queue mutex condvar threads)))
+    (%make-pool queue mutex condvar threads (lambda () busy))))

 (define (pool-enqueue! pool thunk)
   "Enqueue THUNK for future execution by POOL."
@@ -118,9 +126,11 @@ threads as reported by the operating system."
     (signal-condition-variable (pool-condition-variable pool))))

 (define (pool-idle? pool)
-  "Return true if POOL doesn't have any task in its queue."
+  "Return true if POOL doesn't have any task in its queue and all the workers
+are currently idle (i.e., waiting for a task)."
   (with-mutex (pool-mutex pool)
-    (q-empty? (pool-queue pool))))
+    (and (q-empty? (pool-queue pool))
+         (zero? ((pool-busy pool))))))

 (define-syntax-rule (eventually pool exp ...)
   "Run EXP eventually on one of the workers of POOL."
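The idea behind the patch, counting workers as busy except while they wait on the condition variable, can be tried outside of Guix. What follows is a stand-alone sketch under assumptions, not the code of guix/workers.scm: it uses plain top-level variables instead of the <pool> record, and the names worker, enqueue!, idle?, worker-count, and increment! are hypothetical. It assumes Guile with (ice-9 threads) and (ice-9 q), and tasks that do not raise exceptions.

```scheme
(use-modules (ice-9 threads) (ice-9 q))

(define mutex (make-mutex))
(define condvar (make-condition-variable))
(define queue (make-q))
(define worker-count 4)

;; Every worker counts as busy until it first finds the queue empty.
(define busy worker-count)

(define (worker)
  ;; MUTEX is held at all times except while waiting or running a task.
  (lock-mutex mutex)
  (let loop ()
    (if (q-empty? queue)
        (begin
          (set! busy (- busy 1))                  ;about to wait: idle
          (wait-condition-variable condvar mutex)
          (set! busy (+ busy 1)))                 ;woken up: busy again
        (let ((task (q-pop! queue)))
          (unlock-mutex mutex)                    ;run TASK without MUTEX
          (task)
          (lock-mutex mutex)))
    (loop)))

(define (enqueue! thunk)
  (with-mutex mutex
    (enq! queue thunk)
    (signal-condition-variable condvar)))

(define (idle?)
  ;; True only if nothing is queued and no worker is running a task.
  (with-mutex mutex
    (and (q-empty? queue) (zero? busy))))

(define threads
  (map (lambda (n) (call-with-new-thread worker))
       (iota worker-count)))

;; Same shape as the "enqueue" test, with a smaller task count.
(define result 0)
(define result-lock (make-mutex))
(define (increment!)
  (with-mutex result-lock (set! result (+ result 1))))

(let loop ((i 1000))
  (unless (zero? i)
    (enqueue! increment!)
    (loop (- i 1))))

(let poll ()
  (unless (idle?)
    (usleep 1000)
    (poll)))
```

Because a worker that has popped a task stays counted in busy until it returns to an empty queue, idle? cannot report true while an increment is still in flight, so result equals 1000 once the poll loop exits. The worker threads are left waiting on the condition variable; a real pool would also need a way to shut them down.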
* bug#28779: tests/workers.scm failure
From: Eric Bavier @ 2017-11-17 2:58 UTC
To: Ludovic Courtès; +Cc: 28779@debbugs.gnu.org
Looks good to me.
Thanks,
Eric Bavier, Scientific Libraries, Cray Inc.
Thread overview: 4+ messages
2017-10-10 15:48 bug#28779: tests/workers.scm failure Eric Bavier
2017-11-16 8:29 ` Ludovic Courtès
2017-11-17 2:58 ` Eric Bavier
2017-11-17 10:10 ` Ludovic Courtès