From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Race condition in threading code?
Date: Sun, 31 Aug 2008 14:58:52 +0200
Message-ID: <877i9x9w8j.fsf@gnu.org>
To: guile-devel@gnu.org

Hi Julian,

"Julian Graham" writes:

> ...
>           scm_i_pthread_mutex_unlock (&m->lock);
>           SCM_TICK;
>           scm_i_scm_pthread_mutex_lock (&m->lock);
>         }
>       block_self (m->waiting, mutex, &m->lock, timeout);
>
> ...which means that if the loop is entered while the mutex is still
> locked but the owner unlocks it after the locking thread releases the
> administrative lock to run the tick, the locking thread will sleep
> forever because it doesn't re-check the state of the mutex. I've made
> a small change (blocking before doing the tick instead of after) that
> seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
> for a couple of hours). There's a patch attached.

I think I understand your description, assuming "the mutex" is M, "the
administrative lock" is `M->lock', and "the state" is the rest of the
`fat_mutex' structure.

Let me rephrase it: what can happen is that, during the tick, another
thread could actually take M, increase `M->level' and mark itself as the
owner.  After the tick, our primary thread takes `M->lock' back,
thinking it now owns M, and goes to sleep; but M is actually already
taken by that other thread, so our primary thread never wakes up.

(Not sure this description is any clearer...)

I guess it can be applied to 1.8 as well?

Another question: why is there this mixture of `scm_i_pthread' and
`scm_i_scm_pthread' calls?

Thanks for tracking it down!

Ludo'.
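For readers less familiar with this class of bug, here is a minimal sketch of the general pattern the fix relies on, written against plain POSIX threads. It is not the actual `fat_mutex' code from Guile nor the attached patch: `toy_mutex', `toy_lock', `toy_unlock', `toy_tick' and `run_demo' are hypothetical names, `toy_tick' (a sched_yield call) merely stands in for SCM_TICK, and a single `owned' flag replaces the owner/level bookkeeping. The point it illustrates is that the "is the mutex taken?" check and the decision to block must happen under the same administrative lock, with nothing releasing that lock in between. Blocking first and ticking afterwards achieves this, because the loop condition is re-checked after every tick.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for a "fat" mutex: an
   administrative lock protecting the mutex state, plus a condition
   variable for waiters.  NOT Guile's actual fat_mutex.  */
typedef struct {
  pthread_mutex_t lock;     /* administrative lock (cf. M->lock) */
  pthread_cond_t  waiting;  /* where blocked lockers sleep */
  bool            owned;    /* simplified owner/level state */
} toy_mutex;

/* Stand-in for SCM_TICK: runs with the administrative lock released,
   so other threads may lock or unlock the mutex meanwhile.  */
static void toy_tick (void)
{
  sched_yield ();
}

static void toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  while (m->owned)
    {
      /* Block first: m->owned was just checked under m->lock, and
         pthread_cond_wait releases it atomically, so an unlock cannot
         slip in between the check and the wait.  */
      pthread_cond_wait (&m->waiting, &m->lock);

      /* Then tick with the administrative lock released.  Whatever
         happens meanwhile, the loop re-checks m->owned before
         blocking again.  */
      pthread_mutex_unlock (&m->lock);
      toy_tick ();
      pthread_mutex_lock (&m->lock);
    }
  m->owned = true;
  pthread_mutex_unlock (&m->lock);
}

static void toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  assert (m->owned);                  /* unlocking a free mutex is a bug */
  m->owned = false;
  pthread_cond_signal (&m->waiting);  /* wake one sleeping locker, if any */
  pthread_mutex_unlock (&m->lock);
}

/* Hypothetical stress driver: several threads contend for the toy
   mutex and increment a shared counter.  */
enum { NTHREADS = 4, ITERS = 10000 };

static toy_mutex demo_mutex = {
  PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static long demo_counter;

static void *demo_worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERS; i++)
    {
      toy_lock (&demo_mutex);
      demo_counter++;                 /* protected by the toy mutex */
      toy_unlock (&demo_mutex);
    }
  return NULL;
}

/* Returns the final counter: NTHREADS * ITERS if no increment was
   lost.  A lost wakeup would instead hang one of the joins.  */
static long run_demo (void)
{
  pthread_t t[NTHREADS];
  demo_counter = 0;
  for (int i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, demo_worker, NULL);
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (t[i], NULL);
  return demo_counter;
}
```

Calling run_demo () from a main () compiled with -pthread should return NTHREADS * ITERS.  Moving the pthread_cond_wait call after the tick, without re-checking `owned' first, reintroduces the window described above: an unlock that happens during the tick signals nobody, and the locking thread then sleeps with no one left to wake it.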