* deadlock in scm_join_thread(_timed) @ 2008-05-25 5:33 Julian Graham 2008-05-25 13:16 ` Neil Jerram 0 siblings, 1 reply; 9+ messages in thread From: Julian Graham @ 2008-05-25 5:33 UTC (permalink / raw) To: guile-devel Hi everyone, While I was testing and debugging some of the SRFI-18 code that Neil and I were working on, I noticed a deadlock that happens in scm_join_thread_timed. I'm pretty sure it affects the 1.8 codebase as well, although it's probably more common when doing timed joins. Thread joining in Guile (1.9 or 1.8) works as follows: 1. If the target thread has exited, return. 2. Block on the target thread's join queue. 3. When woken (because of a pthread_cond_signal, a spurious pthreads wakeup, or, in 1.9, a timeout expiration), check the target thread's exit status -- if it has exited, return. 4. Otherwise, SCM_TICK. 5. Go to step 2. The deadlock can happen if the thread exits during the tick, because there's no check of the exit status before block_self is called again. I'm pretty sure that moving step 1 into the beginning of the loop would fix this -- I can submit a patch against 1.8, 1.9, or both. Let me know what you guys would like. Regards, Julian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2008-05-25 5:33 deadlock in scm_join_thread(_timed) Julian Graham @ 2008-05-25 13:16 ` Neil Jerram 2008-05-27 2:53 ` Julian Graham 0 siblings, 1 reply; 9+ messages in thread From: Neil Jerram @ 2008-05-25 13:16 UTC (permalink / raw) To: Julian Graham; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 1586 bytes --] 2008/5/25 Julian Graham <joolean@gmail.com>: > Hi everyone, > > While I was testing and debugging some of the SRFI-18 code that Neil > and I were working on, I noticed a deadlock that happens in > scm_join_thread_timed. I'm pretty sure it affects the 1.8 codebase as > well, although it's probably more common when doing timed joins. > > Thread joining in Guile (1.9 or 1.8) works as follows: > > 1. If the target thread has exited, return. > 2. Block on the target thread's join queue. > 3. When woken (because of a pthread_cond_signal, a spurious pthreads > wakeup, or, in 1.9, a timeout expiration), check the target thread's > exit status -- if it has exited, return. > 4. Otherwise, SCM_TICK. > 5. Go to step 2. > > The deadlock can happen if the thread exits during the tick, because > there's no check of the exit status before block_self is called again. > I'm pretty sure that moving step 1 into the beginning of the loop > would fix this -- I can submit a patch against 1.8, 1.9, or both. > Let me know what you guys would like. > Hi Julian, Based on the synopsis above, I agree that moving step 1 inside the loop should fix this. In addition, though, I think it would be very good if we could add a minimal test that currently reproduces the deadlock, and so will serve to guard against future regressions here. Do you have such a test? No need for a patch against both 1.8 and 1.9; just one will do, and git cherry-pick will handle the other for us (unless the fix is significantly different in the two branches). Regards, Neil > > Regards, > Julian > > [-- Attachment #2: Type: text/html, Size: 2166 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2008-05-25 13:16 ` Neil Jerram @ 2008-05-27 2:53 ` Julian Graham 2009-05-20 18:24 ` Neil Jerram 0 siblings, 1 reply; 9+ messages in thread From: Julian Graham @ 2008-05-27 2:53 UTC (permalink / raw) To: Neil Jerram; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 709 bytes --] Hi Neil, > Based on the synopsis above, I agree that moving step 1 inside the loop > should fix this. In addition, though, I think it would be very good if we > could add a minimal test that currently reproduces the deadlock, and so will > serve to guard against future regressions here. Do you have such a test? I don't -- it seems to be pretty dependent on timing. I noticed it while running my SRFI-18 test suite in a loop, and it took hours to trigger. Any suggestions? > No need for a patch against both 1.8 and 1.9; just one will do, and git > cherry-pick will handle the other for us (unless the fix is significantly > different in the two branches). Okay, find it attached. Regards, Julian [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-threads.c-join_thread_timed-Avoid-deadlock-by-a.patch --] [-- Type: text/x-diff; name=0001-threads.c-join_thread_timed-Avoid-deadlock-by-a.patch, Size: 1467 bytes --] From 1a90835bd4c646fb1776d6a83489820772fb5bbd Mon Sep 17 00:00:00 2001 From: Julian Graham <julian@smokebottle.(none)> Date: Mon, 26 May 2008 21:27:24 -0400 Subject: [PATCH] * threads.c (join_thread_timed): Avoid deadlock by always checking thread exit status. --- libguile/threads.c | 32 ++++++++++++-------------------- 1 files changed, 12 insertions(+), 20 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index bf4ab16..e875cdf 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -1094,29 +1094,21 @@ SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2, 0, timeout_ptr = &ctimeout; } - if (t->exited) - res = t->result; - else + while (1) { - while (1) + int err = 0; + if (t->exited) { - int err = block_self (t->join_queue, thread, &t->admin_mutex, - timeout_ptr); - if (err == 0) - { - if (t->exited) - { - res = t->result; - break; - } - } - else if (err == ETIMEDOUT) - break; - - scm_i_pthread_mutex_unlock (&t->admin_mutex); - SCM_TICK; - scm_i_scm_pthread_mutex_lock (&t->admin_mutex); + res = t->result; + break; } + err = block_self (t->join_queue, thread, &t->admin_mutex, timeout_ptr); + if (err == ETIMEDOUT) + break; + + scm_i_pthread_mutex_unlock (&t->admin_mutex); + SCM_TICK; + scm_i_scm_pthread_mutex_lock (&t->admin_mutex); } scm_i_pthread_mutex_unlock (&t->admin_mutex); -- 1.5.4.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2008-05-27 2:53 ` Julian Graham @ 2009-05-20 18:24 ` Neil Jerram 2009-05-20 20:58 ` Neil Jerram 0 siblings, 1 reply; 9+ messages in thread From: Neil Jerram @ 2009-05-20 18:24 UTC (permalink / raw) To: Julian Graham; +Cc: guile-devel "Julian Graham" <joolean@gmail.com> writes: > Hi Neil, > >> Based on the synopsis above, I agree that moving step 1 inside the loop >> should fix this. In addition, though, I think it would be very good if we >> could add a minimal test that currently reproduces the deadlock, and so will >> serve to guard against future regressions here. Do you have such a test? > > I don't -- it seems to be pretty dependent on timing. I noticed it > while running my SRFI-18 test suite in a loop, and it took hours to > trigger. Any suggestions? I'm looking into this now. Here's a nice little program that reproduces the deadlock. (use-modules (ice-9 threads)) (define other-thread (make-thread sleep 5)) (letrec ((delay-count 10) (aproc (lambda () (display delay-count) (newline) (set! delay-count (- delay-count 1)) (if (zero? delay-count) (begin (display "sleeping...\n") (sleep 10) (display "waking...\n")) (system-async-mark aproc))))) (sleep 2) (system-async-mark aproc) (join-thread other-thread)) Regards, Neil ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2009-05-20 18:24 ` Neil Jerram @ 2009-05-20 20:58 ` Neil Jerram 2009-05-20 22:25 ` Ludovic Courtès 0 siblings, 1 reply; 9+ messages in thread From: Neil Jerram @ 2009-05-20 20:58 UTC (permalink / raw) To: Julian Graham; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 659 bytes --] Neil Jerram <neil@ossau.uklinux.net> writes: > "Julian Graham" <joolean@gmail.com> writes: > >> Hi Neil, >> >>> Based on the synopsis above, I agree that moving step 1 inside the loop >>> should fix this. In addition, though, I think it would be very good if we >>> could add a minimal test that currently reproduces the deadlock, and so will >>> serve to guard against future regressions here. Do you have such a test? >> >> I don't -- it seems to be pretty dependent on timing. I noticed it >> while running my SRFI-18 test suite in a loop, and it took hours to >> trigger. Any suggestions? Here is a proposed patch for branch_release-1-8. Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Remove-possible-deadlock-in-scm_join_thread.patch --] [-- Type: text/x-diff, Size: 3117 bytes --] From 66f3b6c1b043b814663668b5f83210c6e8d1e12d Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 20 May 2009 21:55:35 +0100 Subject: [PATCH] Remove possible deadlock in scm_join_thread * libguile/threads.c (scm_join_thread): Always recheck t->exited before calling block_self again, in case thread t has now exited. * test-suite/tests/threads.test (joining): New test. --- libguile/threads.c | 17 +++++++---------- test-suite/tests/threads.test | 34 +++++++++++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 11 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index fc3e607..3d6df11 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -934,17 +934,14 @@ SCM_DEFINE (scm_join_thread, "join-thread", 1, 0, 0, scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); t = SCM_I_THREAD_DATA (thread); - if (!t->exited) + while (!t->exited) { - while (1) - { - block_self (t->join_queue, thread, &thread_admin_mutex, NULL); - if (t->exited) - break; - scm_i_pthread_mutex_unlock (&thread_admin_mutex); - SCM_TICK; - scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); - } + block_self (t->join_queue, thread, &thread_admin_mutex, NULL); + if (t->exited) + break; + scm_i_pthread_mutex_unlock (&thread_admin_mutex); + SCM_TICK; + scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); } res = t->result; diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index 0146016..34ee7ee 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -133,4 +133,36 @@ (lambda (n) (set! result (cons n result))) (lambda (n) (* 2 n)) '(0 1 2 3 4 5)) - (equal? result '(10 8 6 4 2 0))))))) + (equal? result '(10 8 6 4 2 0))))) + + ;; + ;; thread joining + ;; + + (with-test-prefix "joining" + + ;; scm_join_thread has a SCM_TICK in the middle of it, to + ;; allow asyncs to run (including signal delivery). We used + ;; to have a bug whereby if the joined thread terminated at + ;; the same time as the joining thread is in this SCM_TICK, + ;; scm_join_thread would not notice and would hang forever. + ;; So in this test we are setting up the following sequence of + ;; events. + ;; T=0 other thread is created and starts running + ;; T=2 main thread sets up an async that will sleep for 10 seconds + ;; T=2 main thread calls join-thread, which will... + ;; T=2 ...call the async, which starts sleeping + ;; T=5 other thread finishes its work and terminates + ;; T=7 async completes, main thread continues inside join-thread. + (pass-if "don't hang when joined thread terminates in SCM_TICK" + (let ((other-thread (make-thread sleep 5))) + (letrec ((delay-count 10) + (aproc (lambda () + (set! delay-count (- delay-count 1)) + (if (zero? delay-count) + (sleep 5) + (system-async-mark aproc))))) + (sleep 2) + (system-async-mark aproc) + (join-thread other-thread))) + #t)))) -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2009-05-20 20:58 ` Neil Jerram @ 2009-05-20 22:25 ` Ludovic Courtès 2009-05-23 22:57 ` Neil Jerram 0 siblings, 1 reply; 9+ messages in thread From: Ludovic Courtès @ 2009-05-20 22:25 UTC (permalink / raw) To: guile-devel Hello! Neil Jerram <neil@ossau.uklinux.net> writes: > Here is a proposed patch for branch_release-1-8. At first sight this looks good to me. Thanks! Ludo'. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2009-05-20 22:25 ` Ludovic Courtès @ 2009-05-23 22:57 ` Neil Jerram 2009-05-24 14:03 ` Ludovic Courtès 0 siblings, 1 reply; 9+ messages in thread From: Neil Jerram @ 2009-05-23 22:57 UTC (permalink / raw) To: Ludovic Courtès, Julian Graham; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 658 bytes --] ludo@gnu.org (Ludovic Courtès) writes: > Hello! > > Neil Jerram <neil@ossau.uklinux.net> writes: > >> Here is a proposed patch for branch_release-1-8. > > At first sight this looks good to me. Thanks! And here's the corresponding patch for master. It's slightly different, because scm_join_thread_timed in master allows for the join attempt timing out and should return a special timeout value in that case. Also I had to fix another problem, wait-condition-variable leaving asyncs blocked, before I could reproduce the scm_join_thread_timed issue in threads.test, so a patch for that problem is attached too. Regards, Neil [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Fix-wait-condition-variable-so-that-it-doesn-t-leave.patch --] [-- Type: text/x-diff, Size: 3465 bytes --] From a83a927bdbd6d5b971aa6f8172b78a2cdf34a5ef Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Sat, 23 May 2009 17:55:58 +0100 Subject: [PATCH] Fix wait-condition-variable so that it doesn't leave asyncs blocked * libguile/threads.c (fat_mutex_unlock): Unblock asyncs when breaking out of loop. * test-suite/tests/threads.test (asyncs-still-working?): New function, to test if asyncs are working (i.e. unblocked). Use this throughout threads.test, in particular before and after the "timed locking succeeds if mutex unlocked within timeout" test. --- libguile/threads.c | 1 + test-suite/tests/threads.test | 35 +++++++++++++++++++++++++++++++++-- 2 files changed, 34 insertions(+), 2 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index bb874e2..947e595 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -1491,6 +1491,7 @@ fat_mutex_unlock (SCM mutex, SCM cond, { if (relock) scm_lock_mutex_timed (mutex, SCM_UNDEFINED, owner); + t->block_asyncs--; break; } diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index caace7f..bd9f2f3 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -21,6 +21,12 @@ :use-module (ice-9 threads) :use-module (test-suite lib)) +(define (asyncs-still-working?) + (let ((a #f)) + (system-async-mark (lambda () + (set! a #t))) + (equal? '(a b c) '(a b c)) + a)) (if (provided? 'threads) (begin @@ -101,6 +107,9 @@ (with-test-prefix "n-for-each-par-map" + (pass-if "asyncs are still working 2" + (asyncs-still-working?)) + (pass-if "0 in limit 10" (n-for-each-par-map 10 noop noop '()) #t) @@ -143,12 +152,18 @@ (with-test-prefix "lock-mutex" + (pass-if "asyncs are still working 3" + (asyncs-still-working?)) + (pass-if "timed locking fails if timeout exceeded" (let ((m (make-mutex))) (lock-mutex m) (let ((t (begin-thread (lock-mutex m (+ (current-time) 1))))) (not (join-thread t))))) + (pass-if "asyncs are still working 6" + (asyncs-still-working?)) + (pass-if "timed locking succeeds if mutex unlocked within timeout" (let* ((m (make-mutex)) (c (make-condition-variable)) @@ -164,7 +179,12 @@ (unlock-mutex cm) (sleep 1) (unlock-mutex m) - (join-thread t))))) + (join-thread t)))) + + (pass-if "asyncs are still working 7" + (asyncs-still-working?)) + + ) ;; ;; timed mutex unlocking @@ -172,12 +192,18 @@ (with-test-prefix "unlock-mutex" + (pass-if "asyncs are still working 5" + (asyncs-still-working?)) + (pass-if "timed unlocking returns #f if timeout exceeded" (let ((m (make-mutex)) (c (make-condition-variable))) (lock-mutex m) (not (unlock-mutex m c (current-time))))) + (pass-if "asyncs are still working 4" + (asyncs-still-working?)) + (pass-if "timed unlocking returns #t if condition signaled" (let ((m1 (make-mutex)) (m2 (make-mutex)) @@ -226,7 +252,12 @@ (pass-if "timed joining succeeds if thread exits within timeout" (let ((t (begin-thread (begin (sleep 1) #t)))) - (join-thread t (+ (current-time) 2))))) + (join-thread t (+ (current-time) 2)))) + + (pass-if "asyncs are still working 1" + (asyncs-still-working?)) + + ) ;; ;; thread cancellation -- 1.5.6.5 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Remove-possible-deadlock-in-scm_join_thread_timed.patch --] [-- Type: text/x-diff, Size: 2789 bytes --] From 01404cdfacabf49a7b834837bd3c2acebaefc591 Mon Sep 17 00:00:00 2001 From: Neil Jerram <neil@ossau.uklinux.net> Date: Wed, 20 May 2009 21:55:35 +0100 Subject: [PATCH] Remove possible deadlock in scm_join_thread_timed * libguile/threads.c (scm_join_thread_timed): Recheck t->exited before looping round to call block_self again, in case thread t has now exited. * test-suite/tests/threads.test ("don't hang when joined thread terminates in SCM_TICK"): New test. --- libguile/threads.c | 10 ++++++++++ test-suite/tests/threads.test | 26 +++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index 947e595..d63c619 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -1161,6 +1161,16 @@ SCM_DEFINE (scm_join_thread_timed, "join-thread", 1, 2, 0, scm_i_pthread_mutex_unlock (&t->admin_mutex); SCM_TICK; scm_i_scm_pthread_mutex_lock (&t->admin_mutex); + + /* Check for exit again, since we just released and + reacquired the admin mutex, before the next block_self + call (which would block forever if t has already + exited). */ + if (t->exited) + { + res = t->result; + break; + } } } diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index bd9f2f3..6d877b1 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -257,7 +257,31 @@ (pass-if "asyncs are still working 1" (asyncs-still-working?)) - ) + ;; scm_join_thread_timed has a SCM_TICK in the middle of it, + ;; to allow asyncs to run (including signal delivery). We + ;; used to have a bug whereby if the joined thread terminated + ;; at the same time as the joining thread is in this SCM_TICK, + ;; scm_join_thread_timed would not notice and would hang + ;; forever. So in this test we are setting up the following + ;; sequence of events. + ;; T=0 other thread is created and starts running + ;; T=2 main thread sets up an async that will sleep for 10 seconds + ;; T=2 main thread calls join-thread, which will... + ;; T=2 ...call the async, which starts sleeping + ;; T=5 other thread finishes its work and terminates + ;; T=7 async completes, main thread continues inside join-thread. + (pass-if "don't hang when joined thread terminates in SCM_TICK" + (let ((other-thread (make-thread sleep 5))) + (letrec ((delay-count 10) + (aproc (lambda () + (set! delay-count (- delay-count 1)) + (if (zero? delay-count) + (sleep 5) + (system-async-mark aproc))))) + (sleep 2) + (system-async-mark aproc) + (join-thread other-thread))) + #t)) ;; ;; thread cancellation -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2009-05-23 22:57 ` Neil Jerram @ 2009-05-24 14:03 ` Ludovic Courtès 2009-05-24 21:26 ` Neil Jerram 0 siblings, 1 reply; 9+ messages in thread From: Ludovic Courtès @ 2009-05-24 14:03 UTC (permalink / raw) To: Neil Jerram; +Cc: guile-devel Hello! Neil Jerram <neil@ossau.uklinux.net> writes: > Thanks! And here's the corresponding patch for master. It's slightly > different, because scm_join_thread_timed in master allows for the join > attempt timing out and should return a special timeout value in that > case. Also I had to fix another problem, wait-condition-variable > leaving asyncs blocked, before I could reproduce the > scm_join_thread_timed issue in threads.test, so a patch for that > problem is attached too. Cool, thank you! > +(define (asyncs-still-working?) > + (let ((a #f)) > + (system-async-mark (lambda () > + (set! a #t))) > + (equal? '(a b c) '(a b c)) > + a)) I guess `equal?' is here to trigger an `SCM_TICK', right? Perhaps a comment could be added to make it explicit? Thanks, Ludo'. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: deadlock in scm_join_thread(_timed) 2009-05-24 14:03 ` Ludovic Courtès @ 2009-05-24 21:26 ` Neil Jerram 0 siblings, 0 replies; 9+ messages in thread From: Neil Jerram @ 2009-05-24 21:26 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel ludo@gnu.org (Ludovic Courtès) writes: >> +(define (asyncs-still-working?) >> + (let ((a #f)) >> + (system-async-mark (lambda () >> + (set! a #t))) >> + (equal? '(a b c) '(a b c)) >> + a)) > > I guess `equal?' is here to trigger an `SCM_TICK', right? Perhaps a > comment could be added to make it explicit? Good idea, I'll do that before pushing. Thanks, Neil ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-05-24 21:26 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-05-25 5:33 deadlock in scm_join_thread(_timed) Julian Graham 2008-05-25 13:16 ` Neil Jerram 2008-05-27 2:53 ` Julian Graham 2009-05-20 18:24 ` Neil Jerram 2009-05-20 20:58 ` Neil Jerram 2009-05-20 22:25 ` Ludovic Courtès 2009-05-23 22:57 ` Neil Jerram 2009-05-24 14:03 ` Ludovic Courtès 2009-05-24 21:26 ` Neil Jerram
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).