* Segfault on armv5tel-linux-gnueabi
@ 2011-06-21 22:36 Ludovic Courtès
  2011-06-22  8:31 ` Andy Wingo
  2011-06-23 21:43 ` Ludovic Courtès
  0 siblings, 2 replies; 5+ messages in thread
From: Ludovic Courtès @ 2011-06-21 22:36 UTC
  To: bug-guile

Hello!

A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
(v2.0.0-114-gf60a764) introduced a bug showing up on
armv5tel-linux-gnueabi.

The symptom is that ‘./check-guile threads.test’ segfaults.  The
backtrace I have so far isn’t very talkative:

--8<---------------cut here---------------start------------->8---
Program terminated with signal 11, Segmentation fault.
#0  0x4050f82c in siglongjmp () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
(gdb) thread apply all bt

Thread 4 (Thread 18394):
#0  0x405ad7b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40135fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 3 (Thread 18314):
#0  0x4013e0a8 in sem_wait@@GLIBC_2.4 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x004abacc in ?? ()
#2  0x004abacc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 18339):
#0  0x4013f314 in read () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x4013eba4 in __pthread_enable_asynccancel () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 1 (Thread 18393):
#0  0x4050f82c in siglongjmp () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40300fa4 in fport_flush (port=<value optimized out>) at ../../libguile/fports.c:816
#2  0x447bb9b8 in ?? ()
#3  0x447bb9b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

I’ll keep investigating and report back.

Thanks,
Ludo’.
* Re: Segfault on armv5tel-linux-gnueabi
  2011-06-21 22:36 Segfault on armv5tel-linux-gnueabi Ludovic Courtès
@ 2011-06-22  8:31 ` Andy Wingo
  2011-06-22 10:20   ` Ludovic Courtès
  2011-06-23 21:43 ` Ludovic Courtès
  1 sibling, 1 reply; 5+ messages in thread
From: Andy Wingo @ 2011-06-22  8:31 UTC
  To: Ludovic Courtès; +Cc: bug-guile

Hi :)

On Wed 22 Jun 2011 00:36, ludo@gnu.org (Ludovic Courtès) writes:

> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
> (v2.0.0-114-gf60a764) introduced a bug showing up on
> armv5tel-linux-gnueabi.

With what libgc?

Andy
-- 
http://wingolog.org/
* Re: Segfault on armv5tel-linux-gnueabi
  2011-06-22  8:31 ` Andy Wingo
@ 2011-06-22 10:20   ` Ludovic Courtès
  0 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2011-06-22 10:20 UTC
  To: Andy Wingo; +Cc: bug-guile

Hi,

Andy Wingo <wingo@pobox.com> writes:

> On Wed 22 Jun 2011 00:36, ludo@gnu.org (Ludovic Courtès) writes:
>
>> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
>> (v2.0.0-114-gf60a764) introduced a bug showing up on
>> armv5tel-linux-gnueabi.
>
> With what libgc?

A 20110122 checkout, i.e., post 7.2alpha4, with a fix for the
interception of ‘pthread_exit’, dated 2010-08-14, which fixes deadlocks
we had [0].

It’s the one in Nixpkgs, and thus used on Hydra.

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.bugs/5007
* Re: Segfault on armv5tel-linux-gnueabi
  2011-06-21 22:36 Segfault on armv5tel-linux-gnueabi Ludovic Courtès
  2011-06-22  8:31 ` Andy Wingo
@ 2011-06-23 21:43 ` Ludovic Courtès
  2011-06-29 23:30   ` Ludovic Courtès
  1 sibling, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2011-06-23 21:43 UTC
  To: bug-guile

Hi!

ludo@gnu.org (Ludovic Courtès) writes:

> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
> (v2.0.0-114-gf60a764) introduced a bug showing up on
> armv5tel-linux-gnueabi.
>
> The symptom is that ‘./check-guile threads.test’ segfaults.  The
> backtrace I have so far isn’t very talkative:

The initial problem is a VM stack overflow, which leads to a segfault
because our stack overflow handling is so fragile.  :-)

Running ‘meta/guile test-suite/tests/threads.test’ with a breakpoint at
‘scm_error’, we see:

--8<---------------cut here---------------start------------->8---
(gdb) thread apply all bt
[New Thread 0x44eee470 (LWP 19537)]

Thread 59 (Thread 0x44eee470 (LWP 19537)):
#0  0x405987b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40144fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 58 (Thread 0x44630470 (LWP 19536)):
#0  scm_error (key=0xc7060, subr=0x0, message=0x402a253c "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
#1  0x40255be0 in scm_report_stack_overflow () at ../../libguile/stackchk.c:58
#2  0x4027a62c in scm_c_vm_run (vm=0x14a9a8, program=0x708e8, argv=0x4462fcc8, nargs=4) at ../../libguile/vm.c:564
#3  0x401ec344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x301c40) at ../../libguile/eval.c:506
#4  0x40262b2c in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x301c60, handler=0x301c50, pre_unwind_handler=0x301c40) at ../../libguile/throw.c:86
#5  0x401e3380 in scm_i_with_continuation_barrier (body=0x401e2bdc <c_body>, body_data=0x4462fd4c, handler=0x401e2eb0 <c_handler>, handler_data=0x4462fd4c, pre_unwind_handler=0x401e2d10 <pre_unwind_handler>, pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#6  0x401e3440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#7  0x401142d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#8  0x402608f8 in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:917
#9  scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:923
#10 0x401142d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#11 0x4026061c in on_thread_exit (v=0x389a80) at ../../libguile/threads.c:714
#12 0x40144348 in __nptl_deallocate_tsd () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#13 0x40151ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#14 0x40151ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0x43dff470 (LWP 19482)):
#0  0x4014e314 in read () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x4014dba4 in __pthread_enable_asynccancel () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 1 (Thread 0x4016e000 (LWP 19479)):
#0  0x405987b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40144fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x44eee6a4 in ?? ()
#3  0x44eee6a4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

Commenting out the 5 tests from threads.test that invoke ‘cancel-thread’
solves the problem.

Looks like a déjà vu.

To be continued...

Ludo’.
* Re: Segfault on armv5tel-linux-gnueabi
  2011-06-23 21:43 ` Ludovic Courtès
@ 2011-06-29 23:30   ` Ludovic Courtès
  0 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2011-06-29 23:30 UTC
  To: bug-guile

Hello,

Sooo, the test case can be reduced to this:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 threads))

(define (test)
  (pk 'test)
  (let* ((m (make-mutex))
         (c (make-condition-variable))
         (t (begin-thread
             (begin
               (pk 'kid (current-thread))
               (lock-mutex m)
               (wait-condition-variable c m)
               (pk 'kid-done (current-thread)))))
         (r (join-thread t (current-time))))
    (pk 'parent (current-thread))
    (cancel-thread t)
    (not r)))

(test)
(test) ;; <- VM stack overflow, then segfault
(test)
--8<---------------cut here---------------end--------------->8---

With breakpoints at ‘pthread_cancel’ and ‘scm_error’, we get a nicer
backtrace:

--8<---------------cut here---------------start------------->8---
(gdb) thread apply all bt

Thread 2 (Thread 0x41257470 (LWP 23878)):
#0  scm_error (key=0xc7060, subr=0x0, message=0x403ba554 "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
#1  0x4036dbe0 in scm_report_stack_overflow () at ../../libguile/stackchk.c:58
#2  0x40392640 in scm_c_vm_run (vm=0x1f57e8, program=0x708e8, argv=0x41256cc8, nargs=4) at ../../libguile/vm.c:564
#3  0x40304344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x1c1880) at ../../libguile/eval.c:506
#4  0x4037ab40 in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x1c18a0, handler=0x1c1890, pre_unwind_handler=0x1c1880) at ../../libguile/throw.c:86
#5  0x402fb380 in scm_i_with_continuation_barrier (body=0x402fabdc <c_body>, body_data=0x41256d4c, handler=0x402faeb0 <c_handler>, handler_data=0x41256d4c, pre_unwind_handler=0x402fad10 <pre_unwind_handler>, pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#6  0x402fb440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#7  0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#8  0x403788fc in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:919
#9  scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:925
#10 0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#11 0x4037861c in on_thread_exit (v=0x1a52a0) at ../../libguile/threads.c:716
#12 0x4015a348 in __nptl_deallocate_tsd () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#13 0x40167ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#14 0x40167ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0x400a5000 (LWP 23877)):
#0  scm_cancel_thread (thread=<value optimized out>) at ../../libguile/threads.c:1142
#1  0x40390524 in vm_regular_engine (vm=0xda3a8, program=0x0, argv=0x107160, nargs=404768) at ../../libguile/vm-i-system.c:892
#2  0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0x1c1820, argv=0x0, nargs=0) at ../../libguile/vm.c:565
#3  0x40390524 in vm_regular_engine (vm=0xda3a8, program=0x107118, argv=0x10710c, nargs=404768) at ../../libguile/vm-i-system.c:892
#4  0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0xe3670, argv=0xbed6b1ec, nargs=1) at ../../libguile/vm.c:565
#5  0x40304618 in scm_primitive_eval (exp=0x1b5820) at ../../libguile/eval.c:639
#6  0x40304698 in scm_eval (exp=0x1b5820, module_or_state=0x161828) at ../../libguile/eval.c:673
#7  0x403566c4 in scm_shell (argc=<value optimized out>, argv=0xbed6b884) at ../../libguile/script.c:402
#8  0x40321408 in invoke_main_func (body_data=0xbed6b718) at ../../libguile/init.c:336
#9  0x402fabf0 in c_body (d=0xbed6b6c4) at ../../libguile/continuations.c:512
#10 0x4037a6f8 in apply_catch_closure (clo=<value optimized out>, args=0x304) at ../../libguile/throw.c:146
#11 0x4039031c in vm_regular_engine (vm=0xda3a8, program=0x107054, argv=0x107054, nargs=1747296) at ../../libguile/vm-i-system.c:960
#12 0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0x708e8, argv=0xbed6b640, nargs=4) at ../../libguile/vm.c:565
#13 0x40304344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x1aa940) at ../../libguile/eval.c:506
#14 0x4037ab40 in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x1aa960, handler=0x1aa950, pre_unwind_handler=0x1aa940) at ../../libguile/throw.c:86
#15 0x402fb380 in scm_i_with_continuation_barrier (body=0x402fabdc <c_body>, body_data=0xbed6b6c4, handler=0x402faeb0 <c_handler>, handler_data=0xbed6b6c4, pre_unwind_handler=0x402fad10 <pre_unwind_handler>, pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#16 0x402fb440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#17 0x4037871c in with_guile_and_parent (base=0xbed6b6f0, data=<value optimized out>) at ../../libguile/threads.c:876
#18 0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#19 0x403788fc in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:919
#20 scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:925
#21 0x403214d0 in scm_boot_guile (argc=<value optimized out>, argv=<value optimized out>, main_func=<value optimized out>, closure=<value optimized out>) at ../../libguile/init.c:319
#22 0x000089a8 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../libguile/guile.c:70
(gdb) thread 1
[Switching to thread 1 (Thread 0x400a5000 (LWP 23877))]#0  scm_cancel_thread (thread=<value optimized out>) at ../../libguile/threads.c:1142
1142    }
(gdb) p t
$14 = (scm_i_thread *) 0x1a52a0
(gdb) thread 2
[Switching to thread 2 (Thread 0x41257470 (LWP 23878))]#0  scm_error (key=0xc7060, subr=0x0, message=0x403ba554 "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
61          (key,
(gdb) p scm_i_current_thread
$15 = (scm_i_thread *) 0x1a52a0
--8<---------------cut here---------------end--------------->8---

The thread experiencing the stack overflow is the one being canceled.
Its ‘on_thread_exit’ is called because it’s a pthread key destructor.

When ‘on_thread_exit’ is called, t->guile_mode == 1, which causes
‘with_guile_and_parent’ to keep t->base unchanged, which eventually
causes SCM_STACK_OVERFLOW_P to misdiagnose a stack overflow.

Adding ‘t->guile_mode = 0’ at the beginning of ‘on_thread_exit’ solves
this problem, because it forces t->base to be adjusted.

I’ll see how to solve it correctly.

Ludo’.
end of thread, other threads:[~2011-06-29 23:30 UTC | newest]

Thread overview: 5+ messages
2011-06-21 22:36 Segfault on armv5tel-linux-gnueabi Ludovic Courtès
2011-06-22  8:31 ` Andy Wingo
2011-06-22 10:20   ` Ludovic Courtès
2011-06-23 21:43 ` Ludovic Courtès
2011-06-29 23:30   ` Ludovic Courtès