From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Neil Jerram Newsgroups: gmane.lisp.guile.devel Subject: Re: deadlock in scm_join_thread(_timed) Date: Wed, 20 May 2009 21:58:33 +0100 Message-ID: <87iqjvze2u.fsf@arudy.ossau.uklinux.net> References: <2bc5f8210805242233x3ac66a60r6d135abd1d8a80a5@mail.gmail.com> <49dd78620805250616n47ae2ecfjfe4015f066199d24@mail.gmail.com> <2bc5f8210805261953s18bef0b9i9e32ee7cb4486c0e@mail.gmail.com> <87my97zl6u.fsf@arudy.ossau.uklinux.net> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: ger.gmane.org 1242854671 32702 80.91.229.12 (20 May 2009 21:24:31 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 20 May 2009 21:24:31 +0000 (UTC) Cc: guile-devel To: "Julian Graham" Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Wed May 20 23:24:24 2009 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1M6tGv-0000sF-QW for guile-devel@m.gmane.org; Wed, 20 May 2009 23:24:22 +0200 Original-Received: from localhost ([127.0.0.1]:49445 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1M6tGu-0003Ol-Jb for guile-devel@m.gmane.org; Wed, 20 May 2009 17:24:20 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1M6ss9-0005HE-0Q for guile-devel@gnu.org; Wed, 20 May 2009 16:58:45 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1M6ss4-0005EW-2L for guile-devel@gnu.org; Wed, 20 May 2009 16:58:44 -0400 Original-Received: from [199.232.76.173] (port=43676 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1M6ss3-0005EQ-U1 for guile-devel@gnu.org; Wed, 20 May 2009 16:58:39 -0400 Original-Received: from mail3.uklinux.net ([80.84.72.33]:59026) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1M6ss2-0003Xu-UM for guile-devel@gnu.org; Wed, 20 May 2009 16:58:39 -0400 Original-Received: from arudy (host86-145-50-237.range86-145.btcentralplus.com [86.145.50.237]) by mail3.uklinux.net (Postfix) with ESMTP id 63F1D1F8E88; Wed, 20 May 2009 21:58:34 +0100 (BST) Original-Received: from arudy.ossau.uklinux.net (arudy [127.0.0.1]) by arudy (Postfix) with ESMTP id 504F238013; Wed, 20 May 2009 21:58:33 +0100 (BST) In-Reply-To: <87my97zl6u.fsf@arudy.ossau.uklinux.net> (Neil Jerram's message of "Wed\, 20 May 2009 19\:24\:57 +0100") User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux) X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.4-2.6 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:8495 Archived-At: --=-=-= Neil Jerram writes: > "Julian Graham" writes: > >> Hi Neil, >> >>> Based on the synopsis above, I agree that moving step 1 inside the loop >>> should fix this. In addition, though, I think it would be very good if we >>> could add a minimal test that currently reproduces the deadlock, and so will >>> serve to guard against future regressions here. Do you have such a test? >> >> I don't -- it seems to be pretty dependent on timing. I noticed it >> while running my SRFI-18 test suite in a loop, and it took hours to >> trigger. Any suggestions? Here is a proposed patch for branch_release-1-8. Neil --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-Remove-possible-deadlock-in-scm_join_thread.patch >From 66f3b6c1b043b814663668b5f83210c6e8d1e12d Mon Sep 17 00:00:00 2001 From: Neil Jerram Date: Wed, 20 May 2009 21:55:35 +0100 Subject: [PATCH] Remove possible deadlock in scm_join_thread * libguile/threads.c (scm_join_thread): Always recheck t->exited before calling block_self again, in case thread t has now exited. * test-suite/tests/threads.test (joining): New test. --- libguile/threads.c | 17 +++++++---------- test-suite/tests/threads.test | 34 +++++++++++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 11 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index fc3e607..3d6df11 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -934,17 +934,14 @@ SCM_DEFINE (scm_join_thread, "join-thread", 1, 0, 0, scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); t = SCM_I_THREAD_DATA (thread); - if (!t->exited) + while (!t->exited) { - while (1) - { - block_self (t->join_queue, thread, &thread_admin_mutex, NULL); - if (t->exited) - break; - scm_i_pthread_mutex_unlock (&thread_admin_mutex); - SCM_TICK; - scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); - } + block_self (t->join_queue, thread, &thread_admin_mutex, NULL); + if (t->exited) + break; + scm_i_pthread_mutex_unlock (&thread_admin_mutex); + SCM_TICK; + scm_i_scm_pthread_mutex_lock (&thread_admin_mutex); } res = t->result; diff --git a/test-suite/tests/threads.test b/test-suite/tests/threads.test index 0146016..34ee7ee 100644 --- a/test-suite/tests/threads.test +++ b/test-suite/tests/threads.test @@ -133,4 +133,36 @@ (lambda (n) (set! result (cons n result))) (lambda (n) (* 2 n)) '(0 1 2 3 4 5)) - (equal? result '(10 8 6 4 2 0))))))) + (equal? result '(10 8 6 4 2 0))))) + + ;; + ;; thread joining + ;; + + (with-test-prefix "joining" + + ;; scm_join_thread has a SCM_TICK in the middle of it, to + ;; allow asyncs to run (including signal delivery). We used + ;; to have a bug whereby if the joined thread terminated at + ;; the same time as the joining thread is in this SCM_TICK, + ;; scm_join_thread would not notice and would hang forever. + ;; So in this test we are setting up the following sequence of + ;; events. + ;; T=0 other thread is created and starts running + ;; T=2 main thread sets up an async that will sleep for 10 seconds + ;; T=2 main thread calls join-thread, which will... + ;; T=2 ...call the async, which starts sleeping + ;; T=5 other thread finishes its work and terminates + ;; T=7 async completes, main thread continues inside join-thread. + (pass-if "don't hang when joined thread terminates in SCM_TICK" + (let ((other-thread (make-thread sleep 5))) + (letrec ((delay-count 10) + (aproc (lambda () + (set! delay-count (- delay-count 1)) + (if (zero? delay-count) + (sleep 5) + (system-async-mark aproc))))) + (sleep 2) + (system-async-mark aproc) + (join-thread other-thread))) + #t)))) -- 1.5.6.5 --=-=-=--