From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Terrific Dead Lock
Date: Thu, 13 Mar 2008 23:29:56 +0100
Message-ID: <8763vq6yvv.fsf@gnu.org>
To: guile-devel@gnu.org

Hello,

I'm experiencing a deadlock while running the test suite (in a NixOS
build), and I don't remember ever seeing it before.  Sorry for the long
copy/paste, but it helped me understand the problem as I was writing
this message.  Here we go:

  (gdb) info threads
  * 3 Thread 0x40b70b90 (LWP 6675)  0xffffe410 in ?? ()
    2 Thread 0x416d3b90 (LWP 6853)  0xffffe410 in ?? ()
    1 Thread 0x402da8d0 (LWP 5049)  0xffffe410 in ?? ()
  (gdb) thread 1
  [Switching to thread 1 (Thread 0x402da8d0 (LWP 5049))]#0  0xffffe410 in ?? ()
  (gdb) bt
  #0  0xffffe410 in ?? ()
  #1  0xbfbc3e58 in ?? ()
  #2  0x00000002 in ?? ()
  #3  0x00000080 in ?? ()
  #4  0x401912b9 in __lll_lock_wait () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #5  0x4018c9d6 in _L_lock_95 () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #6  0x4018c3ba in pthread_mutex_lock () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #7  0x400bb6fb in scm_i_thread_put_to_sleep () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #8  0x40069159 in scm_i_gc () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #9  0x4006afbe in increase_mtrigger () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #10 0x4009d8be in scm_make_srcprops () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #11 0x400977d9 in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #12 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #13 0x40097622 in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #14 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #15 0x4009769e in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #16 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #17 0x4009769e in scm_read_sexp () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #18 0x4009672f in scm_read_expression () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #19 0x4007d8da in scm_primitive_load () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #20 0x40062ed3 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #21 0x4004dc2b in scm_start_stack () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #22 0x4004e3a1 in scm_m_start_stack () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #23 0x4005cb71 in scm_apply () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #24 0x40061a15 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #25 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #26 0x400664ad in apply_thunk () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #27 0x4006668e in scm_c_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #28 0x400666e5 in scm_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #29 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #30 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #31 0x40051e98 in scm_dynamic_wind () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #32 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #33 0x400617bd in scm_call_0 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #34 0x400664ad in apply_thunk () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #35 0x4006668e in scm_c_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #36 0x400666e5 in scm_with_fluid () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #37 0x40062093 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #38 0x40064bb6 in call_closure_1 () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #39 0x4005d48e in scm_for_each () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #40 0x40062eba in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #41 0x40063156 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #42 0x40063a79 in ceval () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #43 0x400648da in scm_primitive_eval_x () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #44 0x40064935 in scm_eval_x () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #45 0x4009a021 in scm_shell () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #46 0x4007a546 in invoke_main_func () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #47 0x4004c492 in c_body () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #48 0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #49 0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #50 0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #51 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #52 0x400bce6e in scm_with_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #53 0x4007a4df in scm_boot_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #54 0x08048a06 in main ()
  (gdb) thread 2
  [Switching to thread 2 (Thread 0x416d3b90 (LWP 6853))]#0  0xffffe410 in ?? ()
  (gdb) bt
  #0  0xffffe410 in ?? ()
  #1  0x416d31a8 in ?? ()
  #2  0x00000002 in ?? ()
  #3  0x00000080 in ?? ()
  #4  0x401912b9 in __lll_lock_wait () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #5  0x4018c9e4 in _L_lock_236 () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #6  0x4018c43b in pthread_mutex_lock () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #7  0x400bdbed in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #8  0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #9  0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #10 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #11 0x400bce6e in scm_with_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #12 0x400bcec3 in on_thread_exit () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #13 0x40189dc0 in __nptl_deallocate_tsd () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #14 0x4018a189 in start_thread () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #15 0x40264dae in clone () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6
  (gdb) thread 3
  [Switching to thread 3 (Thread 0x40b70b90 (LWP 6675))]#0  0xffffe410 in ?? ()
  (gdb) bt
  #0  0xffffe410 in ?? ()
  #1  0x40b6ff78 in ?? ()
  #2  0x00000001 in ?? ()
  #3  0x40b7005b in ?? ()
  #4  0x401916cb in read () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #5  0x400988f3 in do_read_without_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #6  0x400bb7cc in scm_without_guile () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #7  0x40098855 in signal_delivery_thread () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #8  0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #9  0x400bdde9 in scm_internal_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #10 0x400bca4d in really_spawn () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #11 0x4004c492 in c_body () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #12 0x400bdbd9 in scm_c_catch () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #13 0x4004ca02 in scm_i_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #14 0x4004cae3 in scm_c_with_continuation_barrier () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #15 0x400bcd79 in scm_i_with_guile_and_parent () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #16 0x400bcddf in spawn_thread () from /tmp/nix-5221-14/guile-1.8.4/libguile/.libs/libguile.so.17
  #17 0x4018a17b in start_thread () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libpthread.so.0
  #18 0x40264dae in clone () from /nix/store/zahfcxzylmadvaj865j5xmm1dsvs03r7-glibc-2.7/lib/libc.so.6

All this apparently happens while reading `unif.test' (which comes right
after `time.test'):

  $ sudo tail -n 3 /tmp/nix-5221-14/guile-1.8.4/check-guile.log
  PASS: time.test: strptime: in another thread after error
  PASS: time.test: strptime: GNU %s format: gmtoff on GMT
  PASS: time.test: strptime: GNU %s format: gmtoff on EST+5

To summarize:

  * Thread 2 is exiting.
    It holds THREAD_ADMIN_MUTEX (acquired at the beginning of
    `do_thread_exit ()') and is waiting for
    SCM_I_CRITICAL_SECTION_MUTEX in `scm_c_catch ()'.

  * Thread 1 is reading, or rather GC'ing.  It holds
    SCM_I_CRITICAL_SECTION_MUTEX, taken in `scm_make_srcprops ()', and
    is trying to acquire THREAD_ADMIN_MUTEX in
    `scm_i_thread_put_to_sleep ()'.

Each thread is thus waiting for the mutex the other one holds, so
neither can ever proceed.

One might wonder: why the heck does `scm_make_srcprops ()' enter a
critical section?  Couldn't it just use a private mutex to protect
accesses to `srcprops_freelist'?  Han-Wen's reimplementation in HEAD
(2007-01-19) uses neither a critical section nor a mutex, yet is
thread-safe AFAIUI.

There are two possible fixes:

  1. Copy the `srcprop.[ch]' and `eval.c' bits from HEAD to 1.8.  After
     all, that code is probably solid enough (I use HEAD almost
     exclusively).  See [0] for an overview of the initial patch.  It
     breaks neither the public API nor the ABI, but it removes some
     things from `srcprop.h' or moves them elsewhere.

  2. Remove the critical section from 1.8 and synchronize accesses to
     `srcprops_freelist' with a private mutex, assuming that's a
     correct fix.

I'd be in favor of the first approach.  Comments?

Thanks,
Ludovic.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/6439