Okay, I think I know what the problem is: Part of the SRFI-18 thread
start / creation process involves contention for a mutex, and there's a
bug in the fat_mutex_lock code that causes the locking thread to
sometimes miss an unlocking thread's notification that a mutex is
available. So it's actually a mutex bug -- specifically, in the loop
code in fat_mutex_lock that ends with the following snippet:

    ...
          scm_i_pthread_mutex_unlock (&m->lock);
          SCM_TICK;
          scm_i_scm_pthread_mutex_lock (&m->lock);
        }
      block_self (m->waiting, mutex, &m->lock, timeout);
    ...

...which means that if the loop is entered while the mutex is still
locked, but the owner unlocks it after the locking thread releases the
administrative lock to run the tick, the locking thread will sleep
forever, because it doesn't re-check the state of the mutex after
re-acquiring the administrative lock.

I've made a small change (blocking before doing the tick instead of
after) that seems to resolve the issue (so far no lock-ups using
Han-Wen's x.test for a couple of hours). There's a patch attached.

(Sorry, should have noticed this earlier; the problem existed before the
changes I introduced to support SRFI-18...)


Regards,
Julian

On Wed, Aug 27, 2008 at 9:14 AM, Julian Graham wrote:
>> I've seen `srfi-18.test' hang from time to time, but not often enough to
>> give me an incentive to nail it down. :-( I don't think it relates to
>> Han-Wen's GC changes.
>
>
> Crap, I'm seeing some lockups now, too. Sorry, guys. I'm debugging,
> but don't let that stop you from investigating as well. ;)
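
For illustration, the missed-notification problem described above can be sketched with plain pthreads. This is not Guile's code and not the attached patch: `toy_mutex`, `toy_lock`, `toy_unlock` and `run_demo` are hypothetical names, and a condition variable stands in for the `block_self` wait queue. The only point it is meant to show is that the waiter re-checks the mutex state every time it holds the administrative lock again, so an unlock that happens while the administrative lock is released cannot be missed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a "fat" mutex; NOT Guile's actual structure. */
typedef struct {
  pthread_mutex_t lock;      /* administrative lock guarding the fields below */
  pthread_cond_t  released;  /* signalled whenever `owned` drops to 0 */
  int             owned;     /* 1 while some thread holds the toy mutex */
} toy_mutex;

static void toy_init (toy_mutex *m)
{
  pthread_mutex_init (&m->lock, NULL);
  pthread_cond_init (&m->released, NULL);
  m->owned = 0;
}

static void toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  /* The state is re-checked each time the administrative lock is held
     again, so the decision to sleep is never based on a stale view. */
  while (m->owned)
    pthread_cond_wait (&m->released, &m->lock);
  m->owned = 1;
  pthread_mutex_unlock (&m->lock);
}

static void toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  assert (m->owned);
  m->owned = 0;
  pthread_cond_signal (&m->released);
  pthread_mutex_unlock (&m->lock);
}

static toy_mutex shared;
static int counter;

static void *worker (void *arg)
{
  (void) arg;
  toy_lock (&shared);   /* contends with the thread running run_demo */
  counter++;
  toy_unlock (&shared);
  return NULL;
}

/* Holds the toy mutex while a second thread is started, then releases
   it and waits for that thread.  Returns the number of times the worker
   got the mutex, or -1 if the thread could not be created. */
int run_demo (void)
{
  pthread_t t;

  toy_init (&shared);
  counter = 0;
  toy_lock (&shared);
  if (pthread_create (&t, NULL, worker, NULL) != 0)
    return -1;
  toy_unlock (&shared);
  pthread_join (t, NULL);
  return counter;
}
```

Whether the worker reaches `toy_lock` before or after the release, it either sees `owned == 0` directly or is woken by the signal and sees it on the re-check, so `run_demo` does not depend on the interleaving.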