Race condition in threading code?

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* Race condition in threading code?
@ 2008-08-16 18:21 Han-Wen Nienhuys
  2008-08-16 18:42 ` Julian Graham
  0 siblings, 1 reply; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-16 18:21 UTC (permalink / raw)
  To: guile-devel

Hi there, 

I have finished my GC cleanup.  It mostly seems to work, but I get transient errors in the srfi-18 test. 

I am suspecting a race condition in the eval module

WARNING: (srfi srfi-18): imported module (srfi srfi-34) overrides core binding `raise'
ERROR: srfi-18.test: thread-start!: thread activates only after start - arguments: ((syntax-error "memoization" "In file ~S, line ~S: ~A ~S in expression ~S." ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135 "Bad binding" ct (let (ct (current-thread)) #@let (or (hashq-ref thread-exception-handlers ct) (hashq-set! thread-exception-handlers ct (list initial-handler))))) #f))
ERROR: srfi-18.test: thread-terminate!: termination destroys non-started thread - arguments: ((syntax-error "memoization" "In file ~S, line ~S: ~A ~S in expression ~S." ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135 "Bad binding" ct (let (ct (current-thread)) #@let (or (hashq-ref thread-exception-handlers ct) (hashq-set! thread-exception-handlers ct (list initial-handler))))) #f))

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-16 18:21 Race condition in threading code? Han-Wen Nienhuys
@ 2008-08-16 18:42 ` Julian Graham
  2008-08-16 18:45   ` Han-Wen Nienhuys
  0 siblings, 1 reply; 24+ messages in thread
From: Julian Graham @ 2008-08-16 18:42 UTC (permalink / raw)
  To: hanwen; +Cc: guile-devel

Hmmm... I don't recall seeing those when I was writing that test
suite.  Just to be clear, were you getting those errors before making
your changes?


On Sat, Aug 16, 2008 at 2:21 PM, Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> Hi there,
>
> I have finished my GC cleanup.  It mostly seems to work, but I get transient errors in the srfi-18 test.
>
> I am suspecting a race condition in the eval module
>
> WARNING: (srfi srfi-18): imported module (srfi srfi-34) overrides core binding `raise'
> ERROR: srfi-18.test: thread-start!: thread activates only after start - arguments: ((syntax-error "memoization" "In file ~S, line ~S: ~A ~S in expression ~S." ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135 "Bad binding" ct (let (ct (current-thread)) #@let (or (hashq-ref thread-exception-handlers ct) (hashq-set! thread-exception-handlers ct (list initial-handler))))) #f))
> ERROR: srfi-18.test: thread-terminate!: termination destroys non-started thread - arguments: ((syntax-error "memoization" "In file ~S, line ~S: ~A ~S in expression ~S." ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135 "Bad binding" ct (let (ct (current-thread)) #@let (or (hashq-ref thread-exception-handlers ct) (hashq-set! thread-exception-handlers ct (list initial-handler))))) #f))




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-16 18:42 ` Julian Graham
@ 2008-08-16 18:45   ` Han-Wen Nienhuys
  2008-08-26 20:23     ` Andy Wingo
  0 siblings, 1 reply; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-16 18:45 UTC (permalink / raw)
  To: guile-devel

Julian Graham escreveu:
> Hmmm... I don't recall seeing those when I was writing that test
> suite.  Just to be clear, were you getting those errors before making
> your changes?

No, but some very unrelated changes made them go away again. 

I'm running GUILE through helgrind, but get inundated with locking errors.
(there are many locks in the code, and no fixed ordering.)


-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-16 18:45   ` Han-Wen Nienhuys
@ 2008-08-26 20:23     ` Andy Wingo
  2008-08-27  0:41       ` Han-Wen Nienhuys
                         ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Andy Wingo @ 2008-08-26 20:23 UTC (permalink / raw)
  To: hanwen; +Cc: guile-devel

Hi,

On Sat 16 Aug 2008 11:45, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:

> Julian Graham escreveu:
>> Hmmm... I don't recall seeing those when I was writing that test
>> suite.  Just to be clear, were you getting those errors before making
>> your changes?
>
> No, but some very unrelated changes made them go away again. 

I still get that error, having merged master into vm. Do you have other
fixes?

The original error that you had, reflowed:

ERROR: srfi-18.test: thread-start!:
  thread activates only after start
   - arguments: ((syntax-error "memoization"
                  "In file ~S, line ~S: ~A ~S in expression ~S."
                  ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
                   "Bad binding" ct
                    (let (ct (current-thread))
                    #@let (or (hashq-ref thread-exception-handlers ct)
                          (hashq-set! thread-exception-handlers ct
                    (list initial-handler))))) #f))

Bad binding with an #@let???? This would seem to indicate that a
memoized cell was not marked, and was then swept/re-used.

Andy
-- 
http://wingolog.org/




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-26 20:23     ` Andy Wingo
@ 2008-08-27  0:41       ` Han-Wen Nienhuys
  2008-08-27  2:36       ` Han-Wen Nienhuys
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-27  0:41 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

It could be an existing problem that is exposed by the new tighter allocation.
With a larger heap, there are larger intervals between recollection of the heap.

Let me run this through DEBUGINFO when I get the chance.


On Tue, Aug 26, 2008 at 5:23 PM, Andy Wingo <wingo@pobox.com> wrote:
> Hi,
>
> On Sat 16 Aug 2008 11:45, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>> Julian Graham escreveu:
>>> Hmmm... I don't recall seeing those when I was writing that test
>>> suite.  Just to be clear, were you getting those errors before making
>>> your changes?
>>
>> No, but some very unrelated changes made them go away again.
>
> I still get that error, having merged master into vm. Do you have other
> fixes?
>
> The original error that you had, reflowed:
>
> ERROR: srfi-18.test: thread-start!:
>  thread activates only after start
>   - arguments: ((syntax-error "memoization"
>                  "In file ~S, line ~S: ~A ~S in expression ~S."
>                  ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>                   "Bad binding" ct
>                    (let (ct (current-thread))
>                    #@let (or (hashq-ref thread-exception-handlers ct)
>                          (hashq-set! thread-exception-handlers ct
>                    (list initial-handler))))) #f))
>
> Bad binding with an #@let???? This would seem to indicate that a
> memoized cell was not marked, and was then swept/re-used.
>
> Andy
> --
> http://wingolog.org/
>



-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-26 20:23     ` Andy Wingo
  2008-08-27  0:41       ` Han-Wen Nienhuys
@ 2008-08-27  2:36       ` Han-Wen Nienhuys
  2008-08-27  7:46       ` Ludovic Courtès
  2008-08-31 20:08       ` Ludovic Courtès
  3 siblings, 0 replies; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-27  2:36 UTC (permalink / raw)
  To: guile-devel

Andy Wingo escreveu:
> Hi,
> 
> On Sat 16 Aug 2008 11:45, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> 
>> Julian Graham escreveu:
>>> Hmmm... I don't recall seeing those when I was writing that test
>>> suite.  Just to be clear, were you getting those errors before making
>>> your changes?
>> No, but some very unrelated changes made them go away again. 
> 
> I still get that error, having merged master into vm. Do you have other
> fixes?

Hi,

can we have some certainty that this 

> 
> The original error that you had, reflowed:
> 
> ERROR: srfi-18.test: thread-start!:
>   thread activates only after start
>    - arguments: ((syntax-error "memoization"
>                   "In file ~S, line ~S: ~A ~S in expression ~S."
>                   ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>                    "Bad binding" ct
>                     (let (ct (current-thread))
>                     #@let (or (hashq-ref thread-exception-handlers ct)
>                           (hashq-set! thread-exception-handlers ct
>                     (list initial-handler))))) #f))
> 
> Bad binding with an #@let???? This would seem to indicate that a
> memoized cell was not marked, and was then swept/re-used.


x.test:

(define-module (test-suite test-srfi-18)
  #:use-module (test-suite lib)
  #:use-module (srfi srfi-18))

(iota 1000000)
(for-each (lambda (x)
 (display x) (newline)
 (gc) (gc)
 (with-test-prefix "thread-start!"

  (pass-if "thread activates only after start" 
    (let* ((started #f)
	   (m (make-mutex 'thread-start-mutex))
	   (t (make-thread (lambda () (set! started #t)) 'thread-start-1)))
      (and (not started) (thread-start! t) (thread-join! t) started))))

) (iota 1000))


...
[Thread 0xb7322b90 (LWP 10865) exited]
333
[New Thread 0xb7322b90 (LWP 10866)]
[Thread 0xb7322b90 (LWP 10866) exited]
334
[New Thread 0xb7322b90 (LWP 10867)]
[Thread 0xb7322b90 (LWP 10867) exited]
335
[New Thread 0xb7322b90 (LWP 10868)]


<hang forever>


The large allocation and the 2 (gc) calls ensure that this test itself does not trigger gcs.

The code exercised here appears to deadlock.  I would be suspect of other threading problems 
(race conditions) in this code as well.


^C
Program received signal SIGINT, Interrupt.
0x00110416 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110416 in __kernel_vsyscall ()
#1  0x007f3ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x001c7453 in scm_pthread_cond_wait (cond=0x955c04c, mutex=0x95f2e90)
    at threads.c:1804
#3  0x001c79bd in block_self (queue=0xb7397690, 
    sleep_object=<value optimized out>, mutex=0x95f2e90, waittime=0x0)
    at threads.c:248
#4  0x001c8875 in fat_mutex_lock (mutex=0xb73976c0, timeout=0x0, 
    owner=0xb7f87a28, ret=0xbfee5348) at threads.c:1299
#5  0x001c8b6b in scm_lock_mutex_timed (m=0xb73976c0, timeout=0x204, 
    owner=0xb7f87a28) at threads.c:1331
#6  0x001c8d54 in fat_mutex_unlock (mutex=0xb73976c0, cond=0xb7397590, 
    waittime=0x0, relock=1) at threads.c:1449
#7  0x001c8e89 in scm_timed_wait_condition_variable (cv=0xb7397590, 
    mx=0xb73976c0, t=0x204) at threads.c:1626
#8  0x00171ab3 in scm_gsubr_apply (args=0x404) at gsubr.c:221
#9  0x001588f3 in scm_apply (proc=0xb7f87a60, arg1=0xb7f87fb0, args=0xb73970e8)
    at eval.i.c:1778
#10 0x0014a2b7 in ceval (x=0x404, env=0xb7397170) at eval.i.c:1360
#11 0x0014a4cd in ceval (x=0x95a3fc0, env=0xb7397170) at eval.i.c:358
#12 0x0014dfbb in ceval (x=0xb7366e38, env=0xb73977f0) at eval.i.c:609
#13 0x0014dfbb in ceval (x=0xb73613d0, env=0x95b7020) at eval.i.c:609
#14 0x0015a21d in scm_call_0 (proc=0x95b7988) at eval.c:3053


-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-26 20:23     ` Andy Wingo
  2008-08-27  0:41       ` Han-Wen Nienhuys
  2008-08-27  2:36       ` Han-Wen Nienhuys
@ 2008-08-27  7:46       ` Ludovic Courtès
  2008-08-27 13:14         ` Julian Graham
  2008-08-31 20:08       ` Ludovic Courtès
  3 siblings, 1 reply; 24+ messages in thread
From: Ludovic Courtès @ 2008-08-27  7:46 UTC (permalink / raw)
  To: guile-devel

Hi,

Andy Wingo <wingo@pobox.com> writes:

> On Sat 16 Aug 2008 11:45, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>> Julian Graham escreveu:
>>> Hmmm... I don't recall seeing those when I was writing that test
>>> suite.  Just to be clear, were you getting those errors before making
>>> your changes?
>>
>> No, but some very unrelated changes made them go away again. 
>
> I still get that error, having merged master into vm. Do you have other
> fixes?

I've seen `srfi-18.test' hang from time to time, but not often enough to
give me an incentive to nail it down.  :-(  I don't think it relates to
Han-Wen's GC changes.

Thanks,
Ludo'.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-27  7:46       ` Ludovic Courtès
@ 2008-08-27 13:14         ` Julian Graham
  2008-08-30 23:05           ` Julian Graham
  0 siblings, 1 reply; 24+ messages in thread
From: Julian Graham @ 2008-08-27 13:14 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

> I've seen `srfi-18.test' hang from time to time, but not often enough to
> give me an incentive to nail it down.  :-(  I don't think it relates to
> Han-Wen's GC changes.


Crap, I'm seeing some lockups now, too.  Sorry, guys.  I'm debugging,
but don't let that stop you from investigating as well.  ;)


Regards,
Julian




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-27 13:14         ` Julian Graham
@ 2008-08-30 23:05           ` Julian Graham
  2008-08-31  1:49             ` Han-Wen Nienhuys
  2008-08-31 12:58             ` Ludovic Courtès
  0 siblings, 2 replies; 24+ messages in thread
From: Julian Graham @ 2008-08-30 23:05 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1591 bytes --]

Okay, I think I know what the problem is: Part of the SRFI-18 thread
start / creation process involves contention for a mutex, and there's
a bug in fat_mutex_lock code that causes the locking thread to
sometimes miss an unlocking thread's notification that a mutex is
available.  So it's actually a mutex bug -- specifically, in the loop
code in fat_mutex_lock that ends with the following snippet:

      ...
          scm_i_pthread_mutex_unlock (&m->lock);
          SCM_TICK;
          scm_i_scm_pthread_mutex_lock (&m->lock);
        }
      block_self (m->waiting, mutex, &m->lock, timeout);

...which means that if the loop is entered while the mutex is still
locked but the owner unlocks it after the locking thread releases the
administrative lock to run the tick, the locking thread will sleep
forever because it doesn't re-check the state of the mutex.  I've made
a small change (blocking before doing the tick instead of after) that
seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
for a couple of hours).  There's a patch attached.

(Sorry, should have noticed this earlier; the problem existed before
the changes I introduced to support SRFI-18...)

Regards,
Julian

On Wed, Aug 27, 2008 at 9:14 AM, Julian Graham <joolean@gmail.com> wrote:
>> I've seen `srfi-18.test' hang from time to time, but not often enough to
>> give me an incentive to nail it down.  :-(  I don't think it relates to
>> Han-Wen's GC changes.
>
>
> Crap, I'm seeing some lockups now, too.  Sorry, guys.  I'm debugging,
> but don't let that stop you from investigating as well.  ;)

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Resolve-a-deadlock-caused-by-not-checking-mutex-stat.patch --]
[-- Type: text/x-diff; name=0001-Resolve-a-deadlock-caused-by-not-checking-mutex-stat.patch, Size: 1352 bytes --]

From 12a5c7ca5e0bec9386ee24b2e2320d1aa03e55d5 Mon Sep 17 00:00:00 2001
From: Julian Graham <julian@countyhell.(none)>
Date: Sat, 30 Aug 2008 19:03:21 -0400
Subject: [PATCH] Resolve a deadlock caused by not checking mutex state after calling
 `SCM_TICK'.

---
 libguile/ChangeLog |    5 +++++
 libguile/threads.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/libguile/ChangeLog b/libguile/ChangeLog
index e8d9362..40e2bb4 100644
--- a/libguile/ChangeLog
+++ b/libguile/ChangeLog
@@ -1,3 +1,8 @@
+2008-08-29  Julian Graham  <joolean@gmail.com>
+
+	* threads.c (fat_mutex_lock): Resolve a deadlock caused by not
+	checking mutex state after calling `SCM_TICK'.	
+
 2008-08-27  Ludovic Courtès  <ludo@gnu.org>

 	Fix builds `--without-threads'.  Reported by Han-Wen Nienhuys
diff --git a/libguile/threads.c b/libguile/threads.c
index 7e55f3b..8699fd0 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1292,11 +1292,11 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 		  break;
 		}
 	    }
+	  block_self (m->waiting, mutex, &m->lock, timeout);
 	  scm_i_pthread_mutex_unlock (&m->lock);
 	  SCM_TICK;
 	  scm_i_scm_pthread_mutex_lock (&m->lock);
 	}
-      block_self (m->waiting, mutex, &m->lock, timeout);
     }
   scm_i_pthread_mutex_unlock (&m->lock);
   return err;
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-30 23:05           ` Julian Graham
@ 2008-08-31  1:49             ` Han-Wen Nienhuys
  2008-08-31  2:54               ` Julian Graham
  2008-08-31 10:34               ` Ludovic Courtès
  2008-08-31 12:58             ` Ludovic Courtès
  1 sibling, 2 replies; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-31  1:49 UTC (permalink / raw)
  To: guile-devel

Julian Graham escreveu:
> Okay, I think I know what the problem is: Part of the SRFI-18 thread
> start / creation process involves contention for a mutex, and there's
> a bug in fat_mutex_lock code that causes the locking thread to
> sometimes miss an unlocking thread's notification that a mutex is
> available.  So it's actually a mutex bug -- specifically, in the loop
> code in fat_mutex_lock that ends with the following snippet:
> 
>       ...
>           scm_i_pthread_mutex_unlock (&m->lock);
>           SCM_TICK;
>           scm_i_scm_pthread_mutex_lock (&m->lock);
>         }
>       block_self (m->waiting, mutex, &m->lock, timeout);
> 
> ...which means that if the loop is entered while the mutex is still
> locked but the owner unlocks it after the locking thread releases the
> administrative lock to run the tick, the locking thread will sleep
> forever because it doesn't re-check the state of the mutex.  I've made
> a small change (blocking before doing the tick instead of after) that
> seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
> for a couple of hours).  There's a patch attached.
> 
> (Sorry, should have noticed this earlier; the problem existed before
> the changes I introduced to support SRFI-18...)

Would this also explain the 'corruption' in the evaluator we have been 
seeing ("bad bindings at .. ")?

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31  1:49             ` Han-Wen Nienhuys
@ 2008-08-31  2:54               ` Julian Graham
  2008-08-31 13:52                 ` Han-Wen Nienhuys
  2008-08-31 10:34               ` Ludovic Courtès
  1 sibling, 1 reply; 24+ messages in thread
From: Julian Graham @ 2008-08-31  2:54 UTC (permalink / raw)
  To: hanwen; +Cc: guile-devel

> Would this also explain the 'corruption' in the evaluator we have been
> seeing ("bad bindings at .. ")?


I don't think so; in fact, I just got one of those errors while
looping your test code.  I think it's something else, unrelated to the
patch I just sent in.  I'll keep looking at it.  You have any luck
narrowing things down with Helgrind?


Regards,
Julian




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31  1:49             ` Han-Wen Nienhuys
  2008-08-31  2:54               ` Julian Graham
@ 2008-08-31 10:34               ` Ludovic Courtès
  1 sibling, 0 replies; 24+ messages in thread
From: Ludovic Courtès @ 2008-08-31 10:34 UTC (permalink / raw)
  To: guile-devel

Hi,

Han-Wen Nienhuys <hanwen@xs4all.nl> writes:

> Would this also explain the 'corruption' in the evaluator we have been 
> seeing ("bad bindings at .. ")?

I've seen that after switching branches.  Somehow, that must be the
result of files that did not get recompiled, something like that.
Running "make clean all" fixes it.

Thanks,
Ludo'.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-30 23:05           ` Julian Graham
  2008-08-31  1:49             ` Han-Wen Nienhuys
@ 2008-08-31 12:58             ` Ludovic Courtès
  2008-08-31 15:05               ` Julian Graham
  1 sibling, 1 reply; 24+ messages in thread
From: Ludovic Courtès @ 2008-08-31 12:58 UTC (permalink / raw)
  To: guile-devel

Hi Julian,

"Julian Graham" <joolean@gmail.com> writes:

>       ...
>           scm_i_pthread_mutex_unlock (&m->lock);
>           SCM_TICK;
>           scm_i_scm_pthread_mutex_lock (&m->lock);
>         }
>       block_self (m->waiting, mutex, &m->lock, timeout);
>
> ...which means that if the loop is entered while the mutex is still
> locked but the owner unlocks it after the locking thread releases the
> administrative lock to run the tick, the locking thread will sleep
> forever because it doesn't re-check the state of the mutex.  I've made
> a small change (blocking before doing the tick instead of after) that
> seems to resolve the issue (so far no lock-ups using Han-Wen's x.test
> for a couple of hours).  There's a patch attached.

I think I understand your description, assuming "the mutex" is M, "the
administrative lock" is `M->lock', and "the state" is the rest of the
`fat_mutex' structure.

Let me rephrase it: what can happen is that, during the tick, another
thread could actually take M, increase `M->level' and mark itself as the
owner.  After the tick, our primary thread takes `M->lock' back,
thinking it now owns M, and goes to sleep; but M is actually already
taken by that other thread, so our primary thread never wakes up.  (Not
sure this description is any clearer...)

I guess it can be applied to 1.8 as well?

Another question: why is there this mixture of `scm_i_pthread' and
`scm_i_scm_pthread' calls?

Thanks for tracking it down!

Ludo'.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31  2:54               ` Julian Graham
@ 2008-08-31 13:52                 ` Han-Wen Nienhuys
  0 siblings, 0 replies; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-31 13:52 UTC (permalink / raw)
  To: guile-devel

Julian Graham escreveu:
>> Would this also explain the 'corruption' in the evaluator we have been
>> seeing ("bad bindings at .. ")?
> 
> 
> I don't think so; in fact, I just got one of those errors while
> looping your test code.  I think it's something else, unrelated to the
> patch I just sent in.  I'll keep looking at it.  You have any luck
> narrowing things down with Helgrind?

No, there is no consistent lock ordering in Guile at all, so Helgrind gives
1000s of errors.

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 12:58             ` Ludovic Courtès
@ 2008-08-31 15:05               ` Julian Graham
  2008-08-31 19:39                 ` Ludovic Courtès
  0 siblings, 1 reply; 24+ messages in thread
From: Julian Graham @ 2008-08-31 15:05 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic,

> Let me rephrase it: what can happen is that, during the tick, another
> thread could actually take M, increase `M->level' and mark itself as the
> owner.  After the tick, our primary thread takes `M->lock' back,
> thinking it now owns M, and goes to sleep; but M is actually already
> taken by that other thread, so our primary thread never wakes up.  (Not
> sure this description is any clearer...)

Almost, but not quite.  Let me try again:

Thread A wants to lock fat_mutex M.  It seizes the administrative lock
M->lock and examines the state of M.  M is held by thread B, so thread
A prepares to put itself onto the blocking queue for M by calling
`SCM_TICK'.  In order to call `SCM_TICK', thread A must temporarily
release M->lock.

When it does this, thread B, the owner of M, seizes M->lock and
releases M, which involves waking up the next waiting thread on the
blocking queue for M -- but thread A hasn't finished doing the tick
and so isn't on the blocking queue.  Thread B releases M->lock and
goes about its business.

Thread A finishes the tick and seizes M->lock again and adds itself to
the blocking queue for M without re-examining M's state.  The only way
thread A can ever wake up after this is if another thread locks and
releases M.

> I guess it can be applied to 1.8 as well?

I would say so, yes.  I'll make a patch against it if you tell me how
to do that with git.  :)

> Another question: why is there this mixture of `scm_i_pthread' and
> `scm_i_scm_pthread' calls?

The scm_i_pthread_* functions are actually preprocessor #defines that
map directly onto pthreads API functions.  The scm_i_scm_pthread_*
functions are wrappers around pthreads functions that could block --
the wrappers leave Guile mode before calling into pthreads.
pthread_mutex_lock can block, so from Guile mode (e.g., from
fat_mutex_lock), it needs to be called via
scm_i_scm_pthread_mutex_lock; but pthread_mutex_unlock can't block, so
it can be called directly via scm_i_pthread_mutex_unlock.

Is that what you were asking?

Regards,
Julian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 15:05               ` Julian Graham
@ 2008-08-31 19:39                 ` Ludovic Courtès
  2008-09-07  0:12                   ` Julian Graham
  0 siblings, 1 reply; 24+ messages in thread
From: Ludovic Courtès @ 2008-08-31 19:39 UTC (permalink / raw)
  To: guile-devel

Hey!

"Julian Graham" <joolean@gmail.com> writes:

> Thread A wants to lock fat_mutex M.  It seizes the administrative lock
> M->lock and examines the state of M.  M is held by thread B, so thread
> A prepares to put itself onto the blocking queue for M by calling
> `SCM_TICK'.  In order to call `SCM_TICK', thread A must temporarily
> release M->lock.
>
> When it does this, thread B, the owner of M, seizes M->lock and
> releases M, which involves waking up the next waiting thread on the
> blocking queue for M -- but thread A hasn't finished doing the tick
> and so isn't on the blocking queue.  Thread B releases M->lock and
> goes about its business.
>
> Thread A finishes the tick and seizes M->lock again and adds itself to
> the blocking queue for M without re-examining M's state.  The only way
> thread A can ever wake up after this is if another thread locks and
> releases M.

Thanks for the clarification, that's what I was trying to express.  ;-)

I just pushed it to `master'.

>> I guess it can be applied to 1.8 as well?
>
> I would say so, yes.  I'll make a patch against it if you tell me how
> to do that with git.  :)

The easiest way is to "cherry-pick" the change.  So, assuming the commit
is at the tip of `master', you'd do:

  $ git checkout branch_release-1-8
  $ git cherry-pick master

  ... resolve conflicts...

  $ git commit -a -c THE-ID-THAT-GIT-TOLD-YOU-BEFORE

And that's it!

>> Another question: why is there this mixture of `scm_i_pthread' and
>> `scm_i_scm_pthread' calls?
>
> The scm_i_pthread_* functions are actually preprocessor #defines that
> map directly onto pthreads API functions.  The scm_i_scm_pthread_*
> functions are wrappers around pthreads functions that could block --
> the wrappers leave Guile mode before calling into pthreads.
> pthread_mutex_lock can block, so from Guile mode (e.g., from
> fat_mutex_lock), it needs to be called via
> scm_i_scm_pthread_mutex_lock; but pthread_mutex_unlock can't block, so
> it can be called directly via scm_i_pthread_mutex_unlock.
>
> Is that what you were asking?

Yes, exactly.  I had overlooked this.

Thanks!

Ludo'.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-26 20:23     ` Andy Wingo
                         ` (2 preceding siblings ...)
  2008-08-27  7:46       ` Ludovic Courtès
@ 2008-08-31 20:08       ` Ludovic Courtès
  2008-08-31 23:59         ` Han-Wen Nienhuys
  2008-09-01  0:18         ` Han-Wen Nienhuys
  3 siblings, 2 replies; 24+ messages in thread
From: Ludovic Courtès @ 2008-08-31 20:08 UTC (permalink / raw)
  To: guile-devel

Hello,

Andy Wingo <wingo@pobox.com> writes:

> ERROR: srfi-18.test: thread-start!:
>   thread activates only after start
>    - arguments: ((syntax-error "memoization"
>                   "In file ~S, line ~S: ~A ~S in expression ~S."
>                   ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>                    "Bad binding" ct
>                     (let (ct (current-thread))
>                     #@let (or (hashq-ref thread-exception-handlers ct)
>                           (hashq-set! thread-exception-handlers ct
>                     (list initial-handler))))) #f))

I'm seeing this as well, but it's a `#@let*' here (single-binding `let's
are memoized as `#@let*'):

  ((syntax-error "memoization"
                 "In file ~S, line ~S: ~A ~S in expression ~S."
                 ("/home/ludo/src/guile/srfi/srfi-18.scm" 138
                  "Bad binding"
                  ct
                  (#@let* (ct (#<variable b7d28110 value: #<primitive-procedure current-thread>>))
                    (#@or (#<variable b7d2ad88 value: #<primitive-procedure hashq-ref>>
                                      #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0)
                          (#<variable b7d2adc0 value: #<primitive-procedure hashq-set!>> #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0 (#<variable b7d2c498 value: #<primitive-procedure list>> #<variable 839d130 value: #<procedure initial-handler (obj)>>))
                          )))
                 #f))

It can be reproduced, but very infrequently, with this program:

  (use-modules (ice-9 threads))

  (define (foo x y)
    (let ((z (+ x y)))
      (let ((a (+ z 1)))
        (let ((b (- a 2)))
          (let ((c (* b 3)))
            c)))))

  (define (entry)
    (foo 1 2))

  (for-each (lambda (i) (make-thread entry))
            (iota 123))

My explanation is that the `let*' memoizer, aka. `scm_m_letstar ()', is
not thread-safe; it's clearly not atomic, and it's of course not
protected by a mutex or so.

I can't think of any simple fix.  `scm_m_letstar ()' could be made
atomic by having it duplicate the input list instead of modifying it
directly; it could then atomically update the input.  However,
allocating cells during memoization wouldn't be a good idea
performance-wise.

Thanks,
Ludo'.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 20:08       ` Ludovic Courtès
@ 2008-08-31 23:59         ` Han-Wen Nienhuys
  2008-09-01  8:11           ` Ludovic Courtès
  2008-09-01  0:18         ` Han-Wen Nienhuys
  1 sibling, 1 reply; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-31 23:59 UTC (permalink / raw)
  To: guile-devel; +Cc: Marius Vollmer

Ludovic Courtès escreveu:
> Hello,
> 
> Andy Wingo <wingo@pobox.com> writes:
> 
>> ERROR: srfi-18.test: thread-start!:
>>   thread activates only after start
>>    - arguments: ((syntax-error "memoization"
>>                   "In file ~S, line ~S: ~A ~S in expression ~S."
>>                   ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>>                    "Bad binding" ct
>>                     (let (ct (current-thread))
>>                     #@let (or (hashq-ref thread-exception-handlers ct)
>>                           (hashq-set! thread-exception-handlers ct
>>                     (list initial-handler))))) #f))
> 
> I'm seeing this as well, but it's a `#@let*' here (single-binding `let's
> are memoized as `#@let*'):
> 
>   ((syntax-error "memoization"
>                  "In file ~S, line ~S: ~A ~S in expression ~S."
>                  ("/home/ludo/src/guile/srfi/srfi-18.scm" 138
>                   "Bad binding"
>                   ct
>                   (#@let* (ct (#<variable b7d28110 value: #<primitive-procedure current-thread>>))
>                     (#@or (#<variable b7d2ad88 value: #<primitive-procedure hashq-ref>>
>                                       #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0)
>                           (#<variable b7d2adc0 value: #<primitive-procedure hashq-set!>> #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0 (#<variable b7d2c498 value: #<primitive-procedure list>> #<variable 839d130 value: #<procedure initial-handler (obj)>>))
>                           )))
>                  #f))
> 
> It can be reproduced, but very infrequently, with this program:
> 
>   (use-modules (ice-9 threads))
> 
>   (define (foo x y)
>     (let ((z (+ x y)))
>       (let ((a (+ z 1)))
>         (let ((b (- a 2)))
>           (let ((c (* b 3)))
>             c)))))
> 
>   (define (entry)
>     (foo 1 2))
> 
>   (for-each (lambda (i) (make-thread entry))
>             (iota 123))
> 
> My explanation is that the `let*' memoizer, aka. `scm_m_letstar ()', is
> not thread-safe; it's clearly not atomic, and it's of course not
> protected by a mutex or so.
> 
> I can't think of any simple fix.  `scm_m_letstar ()' could be made
> atomic by having it duplicate the input list instead of modifying it
> directly; it could then atomically update the input.  However,
> allocating cells during memoization wouldn't be a good idea
> performance-wise.

I don't understand: memoization is only supposed to happen once for
each piece of code, right?  So, the cost of it is not that interesting?

I remember seeing a very scary looking explanation in eval.c about the
evaluator being unlocked but still thread-safe since the result of memoizing
was supposed to be confluent (ie. duplicate runs would yield independent
results.)

/* The Lookup Car Race
    - by Eva Luator

This was added by Marius Vollmer, but at the time, GUILE did not support
real posix threads, so any problem may not have manifested itself before.

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 20:08       ` Ludovic Courtès
  2008-08-31 23:59         ` Han-Wen Nienhuys
@ 2008-09-01  0:18         ` Han-Wen Nienhuys
  2008-09-03  4:56           ` Han-Wen Nienhuys
  1 sibling, 1 reply; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-09-01  0:18 UTC (permalink / raw)
  To: guile-devel

Ludovic Courtès escreveu:
> Hello,
> 
> Andy Wingo <wingo@pobox.com> writes:
> 
>> ERROR: srfi-18.test: thread-start!:
>>   thread activates only after start
>>    - arguments: ((syntax-error "memoization"
>>                   "In file ~S, line ~S: ~A ~S in expression ~S."
>>                   ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>>                    "Bad binding" ct
>>                     (let (ct (current-thread))
>>                     #@let (or (hashq-ref thread-exception-handlers ct)
>>                           (hashq-set! thread-exception-handlers ct
>>                     (list initial-handler))))) #f))
> 
> I'm seeing this as well, but it's a `#@let*' here (single-binding `let's
> are memoized as `#@let*'):
> 
>   ((syntax-error "memoization"
>                  "In file ~S, line ~S: ~A ~S in expression ~S."
>                  ("/home/ludo/src/guile/srfi/srfi-18.scm" 138
>                   "Bad binding"
>                   ct
>                   (#@let* (ct (#<variable b7d28110 value: #<primitive-procedure current-thread>>))
>                     (#@or (#<variable b7d2ad88 value: #<primitive-procedure hashq-ref>>
>                                       #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0)
>                           (#<variable b7d2adc0 value: #<primitive-procedure hashq-set!>> #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0 (#<variable b7d2c498 value: #<primitive-procedure list>> #<variable 839d130 value: #<procedure initial-handler (obj)>>))
>                           )))
>                  #f))
> 
> It can be reproduced, but very infrequently, with this program:
> 
>   (use-modules (ice-9 threads))
> 
>   (define (foo x y)
>     (let ((z (+ x y)))
>       (let ((a (+ z 1)))
>         (let ((b (- a 2)))
>           (let ((c (* b 3)))
>             c)))))
> 
>   (define (entry)
>     (foo 1 2))
> 
>   (for-each (lambda (i) (make-thread entry))
>             (iota 123))
> 
> My explanation is that the `let*' memoizer, aka. `scm_m_letstar ()', is
> not thread-safe; it's clearly not atomic, and it's of course not
> protected by a mutex or so.

Is that the only one?

  SCM
  scm_m_let (SCM expr, SCM env)
  ...
      /* plain let */
      SCM rvariables;
      SCM inits;
      transform_bindings (bindings, expr, &rvariables, &inits);

      {
        const SCM new_body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
        const SCM new_tail = scm_cons2 (rvariables, inits, new_body);
        SCM_SETCAR (expr, SCM_IM_LET); 
        // ****!!!
        SCM_SETCDR (expr, new_tail);

What happens if another thread tries to evaluate expr at the place marked ****!!! ?

At the very least, we should have an atomic SCM_SETCELL() which overwrites car and
cdr atomically.

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 23:59         ` Han-Wen Nienhuys
@ 2008-09-01  8:11           ` Ludovic Courtès
  0 siblings, 0 replies; 24+ messages in thread
From: Ludovic Courtès @ 2008-09-01  8:11 UTC (permalink / raw)
  To: guile-devel

Hi,

Han-Wen Nienhuys <hanwen@xs4all.nl> writes:

> I don't understand: memoization is only supposed to happen once for
> each piece of code, right?  So, the cost of it is not that interesting?

Yes, it's done only once, but if a piece of code hasn't yet been
memoized and is called simultaneously by several threads, then we have a
problem.

> I remember seeing a very scary looking explanation in eval.c about the
> evaluator being unlocked but still thread-safe since the result of memoizing
> was supposed to be confluent (ie. duplicate runs would yield independent
> results.)
>
> /* The Lookup Car Race
>     - by Eva Luator

That comment only relates to `scm_lookupcar ()'...

> Is that the only one?
>
>   SCM
>   scm_m_let (SCM expr, SCM env)
>   ...
>       /* plain let */
>       SCM rvariables;
>       SCM inits;
>       transform_bindings (bindings, expr, &rvariables, &inits);
>
>       {
>         const SCM new_body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
>         const SCM new_tail = scm_cons2 (rvariables, inits, new_body);
>         SCM_SETCAR (expr, SCM_IM_LET); 
>         // ****!!!
>         SCM_SETCDR (expr, new_tail);
>
> What happens if another thread tries to evaluate expr at the place marked ****!!! ?

You're right, troubles all around!

> At the very least, we should have an atomic SCM_SETCELL() which overwrites car and
> cdr atomically.

I'm not sure whether than can be done, since that's two machine words.
At any rate, that wouldn't be sufficient, e.g., to fix `scm_m_letstar ()'.

Thanks,
Ludo'.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-09-01  0:18         ` Han-Wen Nienhuys
@ 2008-09-03  4:56           ` Han-Wen Nienhuys
  2008-09-04 18:12             ` Andy Wingo
  0 siblings, 1 reply; 24+ messages in thread
From: Han-Wen Nienhuys @ 2008-09-03  4:56 UTC (permalink / raw)
  To: guile-devel; +Cc: Marius Vollmer

Han-Wen Nienhuys escreveu:

>>> ERROR: srfi-18.test: thread-start!:
>>>   thread activates only after start
>>>    - arguments: ((syntax-error "memoization"
>>>                   "In file ~S, line ~S: ~A ~S in expression ~S."
>>>                   ("/home/lilydev/vc/guile/srfi/srfi-18.scm" 135
>>>                    "Bad binding" ct
>>>                     (let (ct (current-thread))
>>>                     #@let (or (hashq-ref thread-exception-handlers ct)
>>>                           (hashq-set! thread-exception-handlers ct
>>>                     (list initial-handler))))) #f))
>> I'm seeing this as well, but it's a `#@let*' here (single-binding `let's
>> are memoized as `#@let*'):
>>
>>   ((syntax-error "memoization"
>>                  "In file ~S, line ~S: ~A ~S in expression ~S."
>>                  ("/home/ludo/src/guile/srfi/srfi-18.scm" 138
>>                   "Bad binding"
>>                   ct
>>                   (#@let* (ct (#<variable b7d28110 value: #<primitive-procedure current-thread>>))
>>                     (#@or (#<variable b7d2ad88 value: #<primitive-procedure hashq-ref>>
>>                                       #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0)
>>                           (#<variable b7d2adc0 value: #<primitive-procedure hashq-set!>> #<variable 839df08 value: #<weak-key-hash-table 1/31>> #@0-0 (#<variable b7d2c498 value: #<primitive-procedure list>> #<variable 839d130 value: #<procedure initial-handler (obj)>>))
>>                           )))
>>                  #f))
>>
>> It can be reproduced, but very infrequently, with this program:
>>
>>   (use-modules (ice-9 threads))
>>
>>   (define (foo x y)
>>     (let ((z (+ x y)))
>>       (let ((a (+ z 1)))
>>         (let ((b (- a 2)))
>>           (let ((c (* b 3)))
>>             c)))))
>>
>>   (define (entry)
>>     (foo 1 2))
>>
>>   (for-each (lambda (i) (make-thread entry))
>>             (iota 123))
>>
>> My explanation is that the `let*' memoizer, aka. `scm_m_letstar ()', is
>> not thread-safe; it's clearly not atomic, and it's of course not
>> protected by a mutex or so.
> 
> Is that the only one?
> 
>   SCM
>   scm_m_let (SCM expr, SCM env)
>   ...
>       /* plain let */
>       SCM rvariables;
>       SCM inits;
>       transform_bindings (bindings, expr, &rvariables, &inits);
> 
>       {
>         const SCM new_body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
>         const SCM new_tail = scm_cons2 (rvariables, inits, new_body);
>         SCM_SETCAR (expr, SCM_IM_LET); 
>         // ****!!!
>         SCM_SETCDR (expr, new_tail);
> 
> What happens if another thread tries to evaluate expr at the place marked ****!!! ?
> 
> At the very least, we should have an atomic SCM_SETCELL() which overwrites car and
> cdr atomically.

Anyone?  Does anyone still understand how the evaluator works?  (if not, let's move 
to the VM earlier than later.)

-- 
 Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-09-03  4:56           ` Han-Wen Nienhuys
@ 2008-09-04 18:12             ` Andy Wingo
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Wingo @ 2008-09-04 18:12 UTC (permalink / raw)
  To: hanwen; +Cc: Marius Vollmer, guile-devel

Hello!

On Tue 02 Sep 2008 21:56, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:

>>   SCM
>>   scm_m_let (SCM expr, SCM env)
>>   ...
>>       /* plain let */
>>       SCM rvariables;
>>       SCM inits;
>>       transform_bindings (bindings, expr, &rvariables, &inits);
>> 
>>       {
>>         const SCM new_body = m_body (SCM_IM_LET, SCM_CDR (cdr_expr));
>>         const SCM new_tail = scm_cons2 (rvariables, inits, new_body);
>>         SCM_SETCAR (expr, SCM_IM_LET); 
>>         // ****!!!
>>         SCM_SETCDR (expr, new_tail);
>> 
>> What happens if another thread tries to evaluate expr at the place marked ****!!! ?
>> 
>> At the very least, we should have an atomic SCM_SETCELL() which overwrites car and
>> cdr atomically.
>
> Anyone?  Does anyone still understand how the evaluator works?

It sounds to me like you're right; and it sounds very similar to the
lookup car race. Anything that mutates the code structure in place
(memoized or not) needs to be atomic; we can rely on all threads
reaching the same conclusion, as in the lookup car race, but the actual
mutation should happen atomically.

>  (if not, let's move 
> to the VM earlier than later.)

I think we have much more experience with the bugs of the interpreter
than the bugs of the vm ;-)

Seriously, it is good to have both -- for bug checking, sanity testing,
and avoiding circularity. Otherwise we have to start distributing .go
files. Guile's interpreter is fantastic, and there's no reason to throw
it out -- although we may want to make it simpler, and push much of the
optimization efforts into the vm.

Andy
-- 
http://wingolog.org/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-08-31 19:39                 ` Ludovic Courtès
@ 2008-09-07  0:12                   ` Julian Graham
  2008-09-08 12:37                     ` Ludovic Courtès
  0 siblings, 1 reply; 24+ messages in thread
From: Julian Graham @ 2008-09-07  0:12 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

Hi Ludovic,

> The easiest way is to "cherry-pick" the change.  So, assuming the commit
> is at the tip of `master', you'd do:
>
>  $ git checkout branch_release-1-8
>  $ git cherry-pick master
>
>  ... resolve conflicts...
>
>  $ git commit -a -c THE-ID-THAT-GIT-TOLD-YOU-BEFORE
>
> And that's it!

Hmm... that didn't quite seem to work.  After some googling, I
discovered the following, which -- I think -- did the trick:

$ git checkout --tracking -b branch_release-1-8 origin/branch_release-1-8

...And then I just reapplied the fix by hand because it was only one
line.  :)  A patch against the branch is attached.  Let me know if
there are any problems with it.


Regards,
Julian

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Resolve-a-deadlock-caused-by-not-checking-mutex-stat.patch --]
[-- Type: text/x-diff; name=0001-Resolve-a-deadlock-caused-by-not-checking-mutex-stat.patch, Size: 1275 bytes --]

From becf94cd88834d4dbb729346f34d52ca20b50d34 Mon Sep 17 00:00:00 2001
From: Julian Graham <julian@countyhell.(none)>
Date: Sat, 6 Sep 2008 20:06:17 -0400
Subject: [PATCH] Resolve a deadlock caused by not checking mutex state after calling
 `SCM_TICK'.

---
 libguile/ChangeLog |    5 +++++
 libguile/threads.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/libguile/ChangeLog b/libguile/ChangeLog
index 5550b7a..75cd283 100644
--- a/libguile/ChangeLog
+++ b/libguile/ChangeLog
@@ -1,3 +1,8 @@
+2008-09-06  Julian Graham  <joolean@gmail.com>
+
+	* threads.c (fat_mutex_lock): Resolve a deadlock caused by not
+	checking mutex state after calling `SCM_TICK'.
+
 2008-08-20  Ludovic Courtès  <ludo@gnu.org>
 
 	* eval.c, filesys.c, gc.c, numbers.c, posix.c, srfi-14.c,
diff --git a/libguile/threads.c b/libguile/threads.c
index 843dfca..dede85e 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1030,9 +1030,9 @@ fat_mutex_lock (SCM mutex)
     {
       while (1)
 	{
-	  block_self (m->waiting, mutex, &m->lock, NULL);
 	  if (scm_is_eq (m->owner, thread))
 	    break;
+	  block_self (m->waiting, mutex, &m->lock, NULL);
 	  scm_i_pthread_mutex_unlock (&m->lock);
 	  SCM_TICK;
 	  scm_i_scm_pthread_mutex_lock (&m->lock);
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Race condition in threading code?
  2008-09-07  0:12                   ` Julian Graham
@ 2008-09-08 12:37                     ` Ludovic Courtès
  0 siblings, 0 replies; 24+ messages in thread
From: Ludovic Courtès @ 2008-09-08 12:37 UTC (permalink / raw)
  To: guile-devel

Hi Julian,

"Julian Graham" <joolean@gmail.com> writes:

> ...And then I just reapplied the fix by hand because it was only one
> line.  :)  A patch against the branch is attached.  Let me know if
> there are any problems with it.

Perfect, thanks!  I added a `NEWS' entry as well.

Ludo'.





^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-09-08 12:37 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-16 18:21 Race condition in threading code? Han-Wen Nienhuys
2008-08-16 18:42 ` Julian Graham
2008-08-16 18:45   ` Han-Wen Nienhuys
2008-08-26 20:23     ` Andy Wingo
2008-08-27  0:41       ` Han-Wen Nienhuys
2008-08-27  2:36       ` Han-Wen Nienhuys
2008-08-27  7:46       ` Ludovic Courtès
2008-08-27 13:14         ` Julian Graham
2008-08-30 23:05           ` Julian Graham
2008-08-31  1:49             ` Han-Wen Nienhuys
2008-08-31  2:54               ` Julian Graham
2008-08-31 13:52                 ` Han-Wen Nienhuys
2008-08-31 10:34               ` Ludovic Courtès
2008-08-31 12:58             ` Ludovic Courtès
2008-08-31 15:05               ` Julian Graham
2008-08-31 19:39                 ` Ludovic Courtès
2008-09-07  0:12                   ` Julian Graham
2008-09-08 12:37                     ` Ludovic Courtès
2008-08-31 20:08       ` Ludovic Courtès
2008-08-31 23:59         ` Han-Wen Nienhuys
2008-09-01  8:11           ` Ludovic Courtès
2008-09-01  0:18         ` Han-Wen Nienhuys
2008-09-03  4:56           ` Han-Wen Nienhuys
2008-09-04 18:12             ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).