unofficial mirror of guile-devel@gnu.org 
* Terrific Dead Lock
@ 2008-03-13 22:29 Ludovic Courtès
From: Ludovic Courtès @ 2008-03-13 22:29 UTC
  To: guile-devel

Hello,

I'm experiencing a deadlock while running the test suite (in a NixOS
build), and I don't remember ever seeing it before.  Sorry for the long
copy/paste, but it helped me understand the problem as I was writing
this message.

Here we go:

(gdb) info threads 
* 3 Thread 0x40b70b90 (LWP 6675)  0xffffe410 in ?? ()
  2 Thread 0x416d3b90 (LWP 6853)  0xffffe410 in ?? ()
  1 Thread 0x402da8d0 (LWP 5049)  0xffffe410 in ?? ()

(gdb) thread 1
[Switching to thread 1 (Thread 0x402da8d0 (LWP 5049))]#0  0xffffe410 in ?? ()
(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfbc3e58 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000080 in ?? ()
#4  0x401912b9 in __lll_lock_wait () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5  0x4018c9d6 in _L_lock_95 () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#6  0x4018c3ba in pthread_mutex_lock () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#7  0x400bb6fb in scm_i_thread_put_to_sleep () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8  0x40069159 in scm_i_gc () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9  0x4006afbe in increase_mtrigger () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x4009d8be in scm_make_srcprops () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x400977d9 in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x40097622 in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#14 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#15 0x4009769e in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#16 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#17 0x4009769e in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#18 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#19 0x4007d8da in scm_primitive_load () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#20 0x40062ed3 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#21 0x4004dc2b in scm_start_stack () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#22 0x4004e3a1 in scm_m_start_stack () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#23 0x4005cb71 in scm_apply () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#24 0x40061a15 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#25 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#26 0x400664ad in apply_thunk () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#27 0x4006668e in scm_c_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#28 0x400666e5 in scm_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#29 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#30 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#31 0x40051e98 in scm_dynamic_wind () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#32 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#33 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#34 0x400664ad in apply_thunk () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#35 0x4006668e in scm_c_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#36 0x400666e5 in scm_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#37 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#38 0x40064bb6 in call_closure_1 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#39 0x4005d48e in scm_for_each () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#40 0x40062eba in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#41 0x40063156 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#42 0x40063a79 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#43 0x400648da in scm_primitive_eval_x () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#44 0x40064935 in scm_eval_x () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#45 0x4009a021 in scm_shell () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#46 0x4007a546 in invoke_main_func () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#47 0x4004c492 in c_body () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#48 0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#49 0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#50 0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#51 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#52 0x400bce6e in scm_with_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#53 0x4007a4df in scm_boot_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#54 0x08048a06 in main ()

(gdb) thread 2
[Switching to thread 2 (Thread 0x416d3b90 (LWP 6853))]#0  0xffffe410 in ?? ()
(gdb) bt
#0  0xffffe410 in ?? ()
#1  0x416d31a8 in ?? ()
#2  0x00000002 in ?? ()
#3  0x00000080 in ?? ()
#4  0x401912b9 in __lll_lock_wait () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5  0x4018c9e4 in _L_lock_236 () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#6  0x4018c43b in pthread_mutex_lock () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#7  0x400bdbed in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8  0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9  0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x400bce6e in scm_with_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x400bcec3 in on_thread_exit () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x40189dc0 in __nptl_deallocate_tsd () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#14 0x4018a189 in start_thread () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#15 0x40264dae in clone () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6

(gdb) thread 3
[Switching to thread 3 (Thread 0x40b70b90 (LWP 6675))]#0  0xffffe410 in ?? ()
(gdb) bt
#0  0xffffe410 in ?? ()
#1  0x40b6ff78 in ?? ()
#2  0x00000001 in ?? ()
#3  0x40b7005b in ?? ()
#4  0x401916cb in read () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#5  0x400988f3 in do_read_without_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#6  0x400bb7cc in scm_without_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#7  0x40098855 in signal_delivery_thread () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#8  0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#9  0x400bdde9 in scm_internal_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#10 0x400bca4d in really_spawn () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#11 0x4004c492 in c_body () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#12 0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#13 0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#14 0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#15 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#16 0x400bcddf in spawn_thread () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
#17 0x4018a17b in start_thread () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
#18 0x40264dae in clone () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6

Apparently, all this happens while reading `unif.test' (which comes right
after `time.test'):

$ sudo tail -n 3 /tmp/nix-5221-14/guile-1.8.4/check-guile.log 
PASS: time.test: strptime: in another thread after error
PASS: time.test: strptime: GNU %s format: gmtoff on GMT
PASS: time.test: strptime: GNU %s format: gmtoff on EST+5


To summarize:

  * Thread 2 is exiting.  It holds THREAD_ADMIN_MUTEX (it acquired it at
    the beginning of `do_thread_exit ()') and is waiting on
    SCM_I_CRITICAL_SECTION_MUTEX in `scm_c_catch ()'.

  * Thread 1 is reading, or more precisely GC'ing.  It holds
    SCM_I_CRITICAL_SECTION_MUTEX (taken in `scm_make_srcprops ()') and is
    trying to acquire THREAD_ADMIN_MUTEX in `scm_i_thread_put_to_sleep ()'.

Each thread is thus waiting for a mutex that the other one holds.
    
One might wonder: why the heck does `scm_make_srcprops ()' enter a
critical section?  Could it just use a private mutex to protect accesses
to `srcprops_freelist'?

Han-Wen's reimplementation of it in HEAD (2007-01-19) uses neither a
critical section nor a mutex, yet is thread-safe AFAIUI.

Two possibilities to fix it:

  1. Copy the `srcprop.[ch]' and `eval.c' bits from HEAD to 1.8.  After
     all, it's probably solid enough (I use HEAD almost exclusively).  See
     [0] for an overview of the initial patch.  It breaks neither the
     public API nor the ABI, but it moves or removes some things from
     `srcprop.h'.

  2. Remove the critical section from 1.8 and synchronize accesses to
     `srcprops_freelist' with a private mutex, assuming that's a correct
     fix.

I'd be in favor of the first approach.

Comments?

Thanks,
Ludovic.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/6439

Thread overview: 8+ messages
2008-03-13 22:29 Terrific Dead Lock Ludovic Courtès
2008-03-14 10:24 ` Ludovic Courtès
2008-03-17 23:20   ` Neil Jerram
2008-03-18 21:20     ` Ludovic Courtès
2008-04-14 14:29     ` Ludovic Courtès
2008-04-15 21:06       ` Neil Jerram
2008-04-16 10:03         ` Ludovic Courtès
2008-04-16 20:29           ` Neil Jerram