unofficial mirror of bug-guile@gnu.org 
* Segfault on armv5tel-linux-gnueabi
From: Ludovic Courtès @ 2011-06-21 22:36 UTC
  To: bug-guile

Hello!

A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
(v2.0.0-114-gf60a764) introduced a bug showing up on
armv5tel-linux-gnueabi.

The symptom is that ‘./check-guile threads.test’ segfaults.  The
backtrace I have so far isn’t very talkative:

--8<---------------cut here---------------start------------->8---
Program terminated with signal 11, Segmentation fault.
#0  0x4050f82c in siglongjmp () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
(gdb) thread apply all bt

Thread 4 (Thread 18394):
#0  0x405ad7b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40135fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 3 (Thread 18314):
#0  0x4013e0a8 in sem_wait@@GLIBC_2.4 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x004abacc in ?? ()
#2  0x004abacc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 18339):
#0  0x4013f314 in read () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x4013eba4 in __pthread_enable_asynccancel () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 1 (Thread 18393):
#0  0x4050f82c in siglongjmp () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40300fa4 in fport_flush (port=<value optimized out>) at ../../libguile/fports.c:816
#2  0x447bb9b8 in ?? ()
#3  0x447bb9b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

I’ll keep investigating and report back.

Thanks,
Ludo’.

* Re: Segfault on armv5tel-linux-gnueabi
From: Andy Wingo @ 2011-06-22  8:31 UTC
  To: Ludovic Courtès; +Cc: bug-guile

Hi :)

On Wed 22 Jun 2011 00:36, ludo@gnu.org (Ludovic Courtès) writes:

> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
> (v2.0.0-114-gf60a764) introduced a bug showing up on
> armv5tel-linux-gnueabi.

With what libgc?

Andy
-- 
http://wingolog.org/

* Re: Segfault on armv5tel-linux-gnueabi
From: Ludovic Courtès @ 2011-06-22 10:20 UTC
  To: Andy Wingo; +Cc: bug-guile

Hi,

Andy Wingo <wingo@pobox.com> writes:

> On Wed 22 Jun 2011 00:36, ludo@gnu.org (Ludovic Courtès) writes:
>
>> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
>> (v2.0.0-114-gf60a764) introduced a bug showing up on
>> armv5tel-linux-gnueabi.
>
> With what libgc?

A 20110122 checkout, i.e., post-7.2alpha4, which includes a fix for the
interception of ‘pthread_exit’ (dated 2010-08-14) that fixes deadlocks
we had [0].

It’s the one in Nixpkgs, and thus used on Hydra.

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.bugs/5007

* Re: Segfault on armv5tel-linux-gnueabi
From: Ludovic Courtès @ 2011-06-23 21:43 UTC
  To: bug-guile

Hi!

ludo@gnu.org (Ludovic Courtès) writes:

> A bisect found that commit f60a7648d5926555c7760364a6fbb7dc0cf60720
> (v2.0.0-114-gf60a764) introduced a bug showing up on
> armv5tel-linux-gnueabi.
>
> The symptom is that ‘./check-guile threads.test’ segfaults.  The
> backtrace I have so far isn’t very talkative:

The initial problem is a VM stack overflow, which leads to a segfault
because our stack overflow handling is so fragile.  :-)

Running ‘meta/guile test-suite/tests/threads.test’ with a breakpoint at
‘scm_error’, we see:

--8<---------------cut here---------------start------------->8---
(gdb) thread apply all bt
[New Thread 0x44eee470 (LWP 19537)]

Thread 59 (Thread 0x44eee470 (LWP 19537)):
#0  0x405987b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40144fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 58 (Thread 0x44630470 (LWP 19536)):
#0  scm_error (key=0xc7060, subr=0x0, message=0x402a253c "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
#1  0x40255be0 in scm_report_stack_overflow () at ../../libguile/stackchk.c:58
#2  0x4027a62c in scm_c_vm_run (vm=0x14a9a8, program=0x708e8, argv=0x4462fcc8, nargs=4) at ../../libguile/vm.c:564
#3  0x401ec344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x301c40) at ../../libguile/eval.c:506
#4  0x40262b2c in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x301c60, handler=0x301c50, pre_unwind_handler=0x301c40) at ../../libguile/throw.c:86
#5  0x401e3380 in scm_i_with_continuation_barrier (body=0x401e2bdc <c_body>, body_data=0x4462fd4c, handler=0x401e2eb0 <c_handler>, handler_data=0x4462fd4c, pre_unwind_handler=0x401e2d10 <pre_unwind_handler>, 
    pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#6  0x401e3440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#7  0x401142d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#8  0x402608f8 in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:917
#9  scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:923
#10 0x401142d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#11 0x4026061c in on_thread_exit (v=0x389a80) at ../../libguile/threads.c:714
#12 0x40144348 in __nptl_deallocate_tsd () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#13 0x40151ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#14 0x40151ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0x43dff470 (LWP 19482)):
#0  0x4014e314 in read () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#1  0x4014dba4 in __pthread_enable_asynccancel () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x00000000 in ?? ()

Thread 1 (Thread 0x4016e000 (LWP 19479)):
#0  0x405987b8 in clone () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libc.so.6
#1  0x40144fe0 in T.337 () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#2  0x44eee6a4 in ?? ()
#3  0x44eee6a4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

Commenting out the 5 tests from threads.test that invoke ‘cancel-thread’
solves the problem.

Looks like a déjà vu.  To be continued...

Ludo’.

* Re: Segfault on armv5tel-linux-gnueabi
From: Ludovic Courtès @ 2011-06-29 23:30 UTC
  To: bug-guile

Hello,

Sooo, the test case can be reduced to this:

--8<---------------cut here---------------start------------->8---
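(use-modules (ice-9 threads))

(define (test)
  (pk 'test)
  (let* ((m (make-mutex))
         (c (make-condition-variable))
         ;; The child blocks forever on condition variable c:
         ;; nothing ever signals it.
         (t (begin-thread
              (pk 'kid (current-thread))
              (lock-mutex m)
              (wait-condition-variable c m)
              (pk 'kid-done (current-thread))))
         ;; The timeout is the current time, so the join gives up
         ;; right away and returns #f.
         (r (join-thread t (current-time))))
    (pk 'parent (current-thread))
    ;; Cancel the blocked child; as it terminates, its pthread key
    ;; destructor runs 'on_thread_exit' in the child thread.
    (cancel-thread t)
    (not r)))

(test)
(test) ;; <- VM stack overflow, then segfault
(test)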
--8<---------------cut here---------------end--------------->8---

With breakpoints at ‘pthread_cancel’ and ‘scm_error’, we get a nicer
backtrace:

--8<---------------cut here---------------start------------->8---
(gdb) thread apply all bt

Thread 2 (Thread 0x41257470 (LWP 23878)):
#0  scm_error (key=0xc7060, subr=0x0, message=0x403ba554 "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
#1  0x4036dbe0 in scm_report_stack_overflow () at ../../libguile/stackchk.c:58
#2  0x40392640 in scm_c_vm_run (vm=0x1f57e8, program=0x708e8, argv=0x41256cc8, nargs=4) at ../../libguile/vm.c:564
#3  0x40304344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x1c1880) at ../../libguile/eval.c:506
#4  0x4037ab40 in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x1c18a0, handler=0x1c1890, pre_unwind_handler=0x1c1880) at ../../libguile/throw.c:86
#5  0x402fb380 in scm_i_with_continuation_barrier (body=0x402fabdc <c_body>, body_data=0x41256d4c, handler=0x402faeb0 <c_handler>, handler_data=0x41256d4c, pre_unwind_handler=0x402fad10 <pre_unwind_handler>, 
    pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#6  0x402fb440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#7  0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#8  0x403788fc in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:919
#9  scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:925
#10 0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#11 0x4037861c in on_thread_exit (v=0x1a52a0) at ../../libguile/threads.c:716
#12 0x4015a348 in __nptl_deallocate_tsd () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#13 0x40167ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
#14 0x40167ea4 in ?? () from /nix/store/x7n64n36xpqbsi10lgpr3x9f1z9jsp83-glibc-2.12.2/lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0x400a5000 (LWP 23877)):
#0  scm_cancel_thread (thread=<value optimized out>) at ../../libguile/threads.c:1142
#1  0x40390524 in vm_regular_engine (vm=0xda3a8, program=0x0, argv=0x107160, nargs=404768) at ../../libguile/vm-i-system.c:892
#2  0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0x1c1820, argv=0x0, nargs=0) at ../../libguile/vm.c:565
#3  0x40390524 in vm_regular_engine (vm=0xda3a8, program=0x107118, argv=0x10710c, nargs=404768) at ../../libguile/vm-i-system.c:892
#4  0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0xe3670, argv=0xbed6b1ec, nargs=1) at ../../libguile/vm.c:565
#5  0x40304618 in scm_primitive_eval (exp=0x1b5820) at ../../libguile/eval.c:639
#6  0x40304698 in scm_eval (exp=0x1b5820, module_or_state=0x161828) at ../../libguile/eval.c:673
#7  0x403566c4 in scm_shell (argc=<value optimized out>, argv=0xbed6b884) at ../../libguile/script.c:402
#8  0x40321408 in invoke_main_func (body_data=0xbed6b718) at ../../libguile/init.c:336
#9  0x402fabf0 in c_body (d=0xbed6b6c4) at ../../libguile/continuations.c:512
#10 0x4037a6f8 in apply_catch_closure (clo=<value optimized out>, args=0x304) at ../../libguile/throw.c:146
#11 0x4039031c in vm_regular_engine (vm=0xda3a8, program=0x107054, argv=0x107054, nargs=1747296) at ../../libguile/vm-i-system.c:960
#12 0x40392634 in scm_c_vm_run (vm=0xda3a8, program=0x708e8, argv=0xbed6b640, nargs=4) at ../../libguile/vm.c:565
#13 0x40304344 in scm_call_4 (proc=0x708e8, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>, arg4=0x1aa940) at ../../libguile/eval.c:506
#14 0x4037ab40 in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x1aa960, handler=0x1aa950, pre_unwind_handler=0x1aa940) at ../../libguile/throw.c:86
#15 0x402fb380 in scm_i_with_continuation_barrier (body=0x402fabdc <c_body>, body_data=0xbed6b6c4, handler=0x402faeb0 <c_handler>, handler_data=0xbed6b6c4, pre_unwind_handler=0x402fad10 <pre_unwind_handler>, 
    pre_unwind_handler_data=0xda340) at ../../libguile/continuations.c:450
#16 0x402fb440 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>) at ../../libguile/continuations.c:546
#17 0x4037871c in with_guile_and_parent (base=0xbed6b6f0, data=<value optimized out>) at ../../libguile/threads.c:876
#18 0x4012a2d8 in GC_call_with_stack_base () from /nix/store/iva9d3m74d1sw2ymas27kacnj2k3rp81-boehm-gc-7.2pre20110122/lib/libgc.so.1
#19 0x403788fc in scm_i_with_guile_and_parent (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:919
#20 scm_with_guile (func=<value optimized out>, data=<value optimized out>) at ../../libguile/threads.c:925
#21 0x403214d0 in scm_boot_guile (argc=<value optimized out>, argv=<value optimized out>, main_func=<value optimized out>, closure=<value optimized out>) at ../../libguile/init.c:319
#22 0x000089a8 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../libguile/guile.c:70
(gdb) thread 1
[Switching to thread 1 (Thread 0x400a5000 (LWP 23877))]#0  scm_cancel_thread (thread=<value optimized out>) at ../../libguile/threads.c:1142
1142    }
(gdb) p t
$14 = (scm_i_thread *) 0x1a52a0
(gdb) thread 2
[Switching to thread 2 (Thread 0x41257470 (LWP 23878))]#0  scm_error (key=0xc7060, subr=0x0, message=0x403ba554 "Stack overflow", args=0x4, rest=0x4) at ../../libguile/error.c:61
61          (key,
(gdb) p scm_i_current_thread 
$15 = (scm_i_thread *) 0x1a52a0
--8<---------------cut here---------------end--------------->8---

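The thread experiencing the stack overflow is the one being canceled:
‘t’ in ‘scm_cancel_thread’ and ‘scm_i_current_thread’ in thread 2 are
both 0x1a52a0.  ‘on_thread_exit’ runs in that thread because it is a
pthread key destructor, which the C library invokes while tearing the
thread down (‘__nptl_deallocate_tsd’ in the backtrace).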
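
To illustrate that mechanism outside Guile, here is a minimal POSIX
threads sketch.  It is plain C, not libguile code, and every demo_*
name is hypothetical.  A thread blocked in ‘pthread_cond_wait’ is
canceled, much like the child in the Scheme test case; POSIX specifies
that a canceled thread runs its cleanup handlers and then the
destructors of its thread-specific data keys, in that thread, which is
the path that leads to ‘on_thread_exit’ here.  Compile with -pthread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_key_t demo_key;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_never = PTHREAD_COND_INITIALIZER; /* never signaled */
static pthread_cond_t demo_ready = PTHREAD_COND_INITIALIZER;
static bool kid_ready;      /* protected by demo_mutex */
static bool destructor_ran; /* read by the parent only after pthread_join */

/* Stand-in for on_thread_exit: called in the terminating thread because
   a non-NULL value is stored under demo_key. */
static void demo_key_destructor(void *value) {
  (void) value;
  destructor_ran = true;
}

/* Cleanup handler: the mutex is re-acquired before a cancellation in
   pthread_cond_wait is acted upon, so release it here. */
static void demo_unlock(void *mutex) {
  pthread_mutex_unlock(mutex);
}

static void *demo_kid(void *arg) {
  (void) arg;
  pthread_setspecific(demo_key, (void *) 1);
  pthread_mutex_lock(&demo_mutex);
  pthread_cleanup_push(demo_unlock, &demo_mutex);
  kid_ready = true;
  pthread_cond_signal(&demo_ready);
  for (;;)
    pthread_cond_wait(&demo_never, &demo_mutex); /* cancellation point */
  pthread_cleanup_pop(1); /* unreachable, but must pair with the push */
  return NULL;
}

/* Cancel a thread blocked in pthread_cond_wait; return true if it ended
   as PTHREAD_CANCELED and its key destructor ran. */
static bool demo_cancel_blocked_thread(void) {
  pthread_t kid;
  void *result = NULL;

  kid_ready = false;
  destructor_ran = false;
  if (pthread_key_create(&demo_key, demo_key_destructor) != 0)
    return false;
  if (pthread_create(&kid, NULL, demo_kid, NULL) != 0) {
    pthread_key_delete(demo_key);
    return false;
  }

  /* Wait until the kid has released the mutex inside pthread_cond_wait. */
  pthread_mutex_lock(&demo_mutex);
  while (!kid_ready)
    pthread_cond_wait(&demo_ready, &demo_mutex);
  pthread_mutex_unlock(&demo_mutex);

  pthread_cancel(kid);
  pthread_join(kid, &result);
  pthread_key_delete(demo_key);
  return result == PTHREAD_CANCELED && destructor_ran;
}
```

Calling ‘demo_cancel_blocked_thread ()’ repeatedly, like the repeated
‘(test)’ calls, should return true each time: the destructor fires in
the canceled thread even though that thread never returned from its
start routine.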

When ‘on_thread_exit’ is called, t->guile_mode is still 1, so
‘with_guile_and_parent’ leaves t->base unchanged; SCM_STACK_OVERFLOW_P
then checks the stack pointer against that stale base and reports a
stack overflow that isn’t real.

Adding ‘t->guile_mode = 0’ at the beginning of ‘on_thread_exit’ solves
this problem, because it forces t->base to be adjusted.
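
The reasoning above can be made concrete with a toy model.  This is
not libguile code: toy_thread, toy_enter_guile, toy_stack_overflow_p,
toy_on_thread_exit and TOY_STACK_LIMIT are hypothetical, addresses are
plain integers, and the overflow test is a simplified distance check
rather than the actual SCM_STACK_OVERFLOW_P.  It only shows how a base
that is recorded once and never refreshed makes a shallow stack look
overflowed, and how clearing guile_mode first gets the base re-recorded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two scm_i_thread fields discussed above. */
struct toy_thread {
  int guile_mode; /* non-zero once the thread is considered in Guile mode */
  uintptr_t base; /* stack base the overflow check measures against */
};

/* Hypothetical limit, in bytes. */
#define TOY_STACK_LIMIT ((uintptr_t) 0x10000)

/* Mirrors the behavior described for with_guile_and_parent: the base is
   only recorded when the thread is not already in Guile mode. */
static void toy_enter_guile(struct toy_thread *t, uintptr_t stack_base) {
  if (!t->guile_mode) {
    t->base = stack_base;
    t->guile_mode = 1;
  }
}

/* Simplified check: "overflow" when sp is more than TOY_STACK_LIMIT
   bytes away from the recorded base, in either direction. */
static bool toy_stack_overflow_p(const struct toy_thread *t, uintptr_t sp) {
  uintptr_t distance = sp > t->base ? sp - t->base : t->base - sp;
  return distance > TOY_STACK_LIMIT;
}

/* The workaround from the message: clear guile_mode before re-entering,
   so that the base is recorded afresh. */
static void toy_on_thread_exit(struct toy_thread *t, uintptr_t stack_base) {
  t->guile_mode = 0;
  toy_enter_guile(t, stack_base);
}
```

With a base recorded at 0x80000, a later entry at 0x200000 leaves the
base untouched and ‘toy_stack_overflow_p (&t, 0x1ff000)’ returns true;
after ‘toy_on_thread_exit (&t, 0x200000)’ the base is 0x200000 and the
same check returns false.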

I’ll see how to solve it correctly.

Ludo’.