* further pthread foo
@ 2011-03-18 22:26 Andy Wingo
[not found] ` <m339mkxdyh.fsf-CaTCM8lwFkgB9AHHLWeGtNQXobZC6xk2@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Andy Wingo @ 2011-03-18 22:26 UTC (permalink / raw)
To: gc; +Cc: bug-guile
Hello again!
Continuing on the same topic as my previous mail, which you may read
here:
http://thread.gmane.org/gmane.lisp.guile.bugs/5340
I modified the program to do a `scm_init_guile ()' before creating any
threads. This initializes libgc from the main thread, so all should be
well. But then I run into a problem:
(gdb) r
Starting program: /tmp/many_threads
[Thread debugging using libthread_db enabled]
0: create[New Thread 0x7ffff740b700 (LWP 23030)]
join
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff740b700 (LWP 23030)]
0x00007ffff7a09200 in GC_malloc (bytes=16) at thread_local_alloc.c:161
161 GC_FAST_MALLOC_GRANS(result, granules, tiny_fl, DIRECT_GRANULES,
(gdb) thr apply all bt
Thread 2 (Thread 0x7ffff740b700 (LWP 23030)):
#0 0x00007ffff7a09200 in GC_malloc (bytes=16) at thread_local_alloc.c:161
#1 0x00007ffff7d1208e in scm_cell (x=<value optimized out>, y=<value optimized out>) at ../libguile/inline.h:124
#2 scm_cons (x=<value optimized out>, y=<value optimized out>) at pairs.c:77
#3 0x00007ffff7cd05b7 in scm_i_with_continuation_barrier (body=0x7ffff7cd03a0 <c_body>, body_data=0x7ffff740adb0,
handler=0x7ffff7cd03c0 <c_handler>, handler_data=0x7ffff740adb0, pre_unwind_handler=<value optimized out>,
pre_unwind_handler_data=<value optimized out>) at continuations.c:444
#4 0x00007ffff7cd0690 in scm_c_with_continuation_barrier (func=<value optimized out>, data=<value optimized out>)
at continuations.c:491
#5 0x00007ffff7a0982f in GC_call_with_gc_active (fn=0x7ffff7d48740 <with_guile_trampoline>, client_data=0x7ffff740ae20)
at pthread_support.c:1127
#6 0x00007ffff7d48aed in with_gc_active (func=0x7ffff7d488e0 <do_thread_exit>, data=0x949600, parent=<value optimized out>)
at threads.c:97
#7 scm_i_with_guile_and_parent (func=0x7ffff7d488e0 <do_thread_exit>, data=0x949600, parent=<value optimized out>)
at threads.c:826
#8 0x00007ffff7d48bc7 in do_thread_exit_trampoline (sb=<value optimized out>, v=0x949600) at threads.c:549
#9 0x00007ffff7a03525 in GC_call_with_stack_base (fn=<value optimized out>, arg=<value optimized out>) at misc.c:1493
#10 0x00007ffff7d48867 in on_thread_exit (v=0x949600) at threads.c:580
#11 0x0000003fc86077f9 in __nptl_deallocate_tsd (arg=0x7ffff740b700) at pthread_create.c:154
#12 start_thread (arg=0x7ffff740b700) at pthread_create.c:308
#13 0x0000003fc7ee098d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Thread 1 (Thread 0x7ffff76ce700 (LWP 23027)):
#0 0x0000003fc8607fbd in pthread_join (threadid=140737341601536, thread_return=0x7fffffffe018) at pthread_join.c:89
#1 0x00007ffff7a09e47 in GC_pthread_join (thread=140737341601536, retval=0x7fffffffe018) at pthread_support.c:1219
#2 0x00000000004008e7 in main () at many_threads.c:31
What is happening is that I have registered a pthread_key cleanup
handler, and I need to call a Guile function in that handler. But quite
possibly, libgc has already torn down this thread. So I trampoline
through a GC_call_with_stack_base, then register the current thread. If
that returns GC_SUCCESS, later I unregister it. In between there is the
scm_i_with_guile_and_parent call you see at frame #7.
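The pthread side of that arrangement can be sketched in isolation. This is a minimal model, not Guile's threads.c: every name in it (exit_key, on_thread_exit_model, thread_body, run_cleanup_demo) is made up for illustration, and it makes no libgc or Guile calls. It only shows that a pthread_key destructor runs in the exiting thread after the thread function has returned, which is where frames #10 to #12 of the backtrace place on_thread_exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; this models only the pthread side. */
static pthread_key_t exit_key;
static int body_finished = 0;          /* set just before the thread body returns */
static int cleanup_ran_after_body = 0; /* filled in by the destructor */

/* Destructor for exit_key.  It runs in the exiting thread once the
   thread function has returned.  In the message, this is the point
   where the real handler trampolines through GC_call_with_stack_base,
   registers the thread, calls into Guile, and unregisters again. */
static void on_thread_exit_model(void *value)
{
    int *flag = value;
    *flag = body_finished;
}

static void *thread_body(void *arg)
{
    (void)arg;
    /* The key's value must be non-NULL, or the destructor is skipped. */
    int rc = pthread_setspecific(exit_key, &cleanup_ran_after_body);
    assert(rc == 0);
    body_finished = 1;
    return NULL;
}

/* Runs one thread to completion.  Returns 1 if the destructor ran and
   saw that the thread body had already finished, 0 otherwise. */
int run_cleanup_demo(void)
{
    pthread_t thread;
    int rc;

    body_finished = 0;
    cleanup_ran_after_body = 0;

    rc = pthread_key_create(&exit_key, on_thread_exit_model);
    assert(rc == 0);
    rc = pthread_create(&thread, NULL, thread_body, NULL);
    assert(rc == 0);
    /* Joining waits for thread termination, destructors included. */
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    rc = pthread_key_delete(exit_key);
    assert(rc == 0);

    return cleanup_ran_after_body;
}
```

Calling run_cleanup_demo () from a main function should return 1. Whether libgc still considers the thread registered by the time such a destructor runs is exactly the open question in this message; the sketch does not model it.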
However, as you see, there is unhappiness there. I don't know what
exactly is causing the segfault, but here's a disassembly:
(gdb) disassemble $pc
Dump of assembler code for function GC_malloc:
0x00007ffff7a09180 <+0>: mov %rbx,-0x30(%rsp)
0x00007ffff7a09185 <+5>: mov %r13,-0x18(%rsp)
0x00007ffff7a0918a <+10>: mov %rdi,%rbx
0x00007ffff7a0918d <+13>: mov %rbp,-0x28(%rsp)
0x00007ffff7a09192 <+18>: mov %r12,-0x20(%rsp)
0x00007ffff7a09197 <+23>: mov %r14,-0x10(%rsp)
0x00007ffff7a0919c <+28>: mov %r15,-0x8(%rsp)
0x00007ffff7a091a1 <+33>: sub $0x48,%rsp
0x00007ffff7a091a5 <+37>: mov 0x20a104(%rip),%rax # 0x7ffff7c132b0
0x00007ffff7a091ac <+44>: mov (%rax),%r12d
0x00007ffff7a091af <+47>: data32 lea 0x20a3e9(%rip),%rdi # 0x7ffff7c135a0
0x00007ffff7a091b7 <+55>: data32 data32 callq 0x7ffff79f6da0 <__tls_get_addr@plt>
0x00007ffff7a091bf <+63>: mov (%rax),%r13
0x00007ffff7a091c2 <+66>: test %r13,%r13
0x00007ffff7a091c5 <+69>: je 0x7ffff7a0923f <GC_malloc+191>
0x00007ffff7a091c7 <+71>: add $0xf,%r12d
0x00007ffff7a091cb <+75>: movslq %r12d,%r12
0x00007ffff7a091ce <+78>: lea (%rbx,%r12,1),%r12
0x00007ffff7a091d2 <+82>: shr $0x4,%r12
0x00007ffff7a091d6 <+86>: cmp $0x18,%r12
0x00007ffff7a091da <+90>: ja 0x7ffff7a0923f <GC_malloc+191>
0x00007ffff7a091dc <+92>: lea 0x18(%r12),%r14
0x00007ffff7a091e1 <+97>: mov %r12,%rbp
0x00007ffff7a091e4 <+100>: mov $0x10,%r15d
0x00007ffff7a091ea <+106>: shl $0x4,%rbp
0x00007ffff7a091ee <+110>: mov 0x8(%r13,%r14,8),%rax
0x00007ffff7a091f3 <+115>: lea 0x8(%r13,%r14,8),%rcx
0x00007ffff7a091f8 <+120>: cmp $0x11a,%rax
0x00007ffff7a091fe <+126>: jbe 0x7ffff7a09269 <GC_malloc+233>
=> 0x00007ffff7a09200 <+128>: mov (%rax),%rdx
0x00007ffff7a09203 <+131>: mov %rdx,0x8(%r13,%r14,8)
0x00007ffff7a09208 <+136>: prefetcht0 (%rdx)
0x00007ffff7a0920b <+139>: movq $0x0,(%rax)
0x00007ffff7a09212 <+146>: mov 0x18(%rsp),%rbx
0x00007ffff7a09217 <+151>: mov 0x20(%rsp),%rbp
0x00007ffff7a0921c <+156>: mov 0x28(%rsp),%r12
0x00007ffff7a09221 <+161>: mov 0x30(%rsp),%r13
0x00007ffff7a09226 <+166>: mov 0x38(%rsp),%r14
0x00007ffff7a0922b <+171>: mov 0x40(%rsp),%r15
0x00007ffff7a09230 <+176>: add $0x48,%rsp
0x00007ffff7a09234 <+180>: retq
(gdb) info registers
rax 0x1000 4096
rbx 0x10 16
rcx 0x6b1650 7018064
rdx 0x2 2
rsi 0x0 0
rdi 0x6027e0 6301664
rbp 0x10 0x10
rsp 0x7ffff740acc0 0x7ffff740acc0
r8 0x7ffff7d490b0 140737351291056
r9 0x0 0
r10 0x7ffff740aba0 140737341598624
r11 0x7ffff7a097c0 140737347884992
r12 0x1 1
r13 0x6b1580 7017856
r14 0x19 25
r15 0x10 16
rip 0x7ffff7a09200 0x7ffff7a09200 <GC_malloc+128>
eflags 0x10212 [ AF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
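The register dump can be checked against the disassembly with a little arithmetic. The sketch below is one reading of the dump, not a diagnosis confirmed anywhere in this thread. The function names are made up; only the constants come from the output above. It recomputes the address used by the load at GC_malloc+110 and applies the comparison at +120.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the message. */
enum {
    R13 = 0x6b1580,  /* base pointer loaded after __tls_get_addr */
    R14 = 0x19,      /* 0x18 + granule count (r12 = 1) */
    RAX = 0x1000,    /* value loaded at +110, dereferenced at +128 */
    RCX = 0x6b1650   /* address computed by the lea at +115 */
};

/* Effective address of the operand 0x8(%r13,%r14,8):
   base + index * 8 + 8. */
uint64_t slot_address(uint64_t base, uint64_t index)
{
    return base + index * 8 + 8;
}

/* Mirrors `cmp $0x11a,%rax; jbe GC_malloc+233`: values at or below
   0x11a take the branch, larger values fall through to +128 and are
   dereferenced as a pointer. */
int entry_is_dereferenced(uint64_t entry)
{
    return entry > 0x11a;
}

/* Returns 1 if the dump is self-consistent under this reading. */
int dump_is_consistent(void)
{
    return slot_address(R13, R14) == RCX && entry_is_dereferenced(RAX);
}
```

Under this reading the computed address matches %rcx (0x6b1650), and the value found there, 0x1000, is above the 0x11a cutoff, so the instruction at +128 treats it as a pointer and loads through it. Going by the macro named at thread_local_alloc.c:161, that slot is presumably a thread-local free-list entry, but the dump alone does not say why it holds 0x1000.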
This is with current CVS.
Regards,
Andy
--
http://wingolog.org/
* Re: further pthread foo
[not found] ` <m339mkxdyh.fsf-CaTCM8lwFkgB9AHHLWeGtNQXobZC6xk2@public.gmane.org>
@ 2011-03-19 21:25 ` Ivan Maidanski
2011-03-19 23:34 ` [Gc] " Andy Wingo
0 siblings, 1 reply; 3+ messages in thread
From: Ivan Maidanski @ 2011-03-19 21:25 UTC (permalink / raw)
To: Andy Wingo; +Cc: bug-guile, gc-V9/bV5choksm30D7ZfaTJw
Hi Andy,
Try compiling libgc with -DGC_ASSERTIONS but without -DTHREAD_LOCAL_ALLOC and -DPARALLEL_MARK.
I can't help more in tracking down the problem right now (lack of time). Perhaps someone else can...
BTW, why do you use GC_call_with_gc_active()? It should be a no-op in your case: the thread is stopped and scanned after you call GC_register_my_thread. (GC_call_with_gc_active is used primarily inside GC_do_blocking calls.)
Fri, 18 Mar 2011 23:26:30 +0100 Andy Wingo <wingo-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>:
> [...]
* Re: [Gc] further pthread foo
2011-03-19 21:25 ` Ivan Maidanski
@ 2011-03-19 23:34 ` Andy Wingo
0 siblings, 0 replies; 3+ messages in thread
From: Andy Wingo @ 2011-03-19 23:34 UTC (permalink / raw)
To: Ivan Maidanski; +Cc: bug-guile, gc
Hi Ivan,
On Sat 19 Mar 2011 22:25, Ivan Maidanski <ivmai@mail.ru> writes:
> Try to compile libgc with -DGC_ASSERTIONS but without
> -DTHREAD_LOCAL_ALLOC -DPARALLEL_MARK.
OK, will do. Thanks for the suggestion, and sorry for the burden. You
must get the worst bugs!
> BTW. Why do you use GC_call_with_gc_active()? It should be no-op in your
> case - the thread is stopped and scanned after you call
> GC_register_my_thread. (GC_call_with_gc_active is used primarily inside
> GC_do_blocking calls).
We have scm_with_guile and scm_without_guile, which invoke a procedure
in and out of Guile mode. scm_with_guile nests as you would think it
would, and scm_without_guile can only be called in Guile mode.
If a thread is not in Guile mode, it shouldn't be active for GC purposes
-- it shouldn't be in the set of threads to stop -- so it goes through a
GC_do_blocking. scm_with_guile therefore goes through a
GC_call_with_gc_active, even when it is not called within the extent of
a GC_do_blocking call.
So yes, it's a no-op, and harmless in this case.
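The nesting rules described above can be illustrated with a toy model. This is only a sketch under stated assumptions: all names (model_in_guile, model_with_guile, model_without_guile and the helpers) are hypothetical, the state is a single flag for a single thread, and nothing here calls Guile or libgc. Comments mark where, according to this message, the real code goes through GC_call_with_gc_active and GC_do_blocking.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*model_fn)(void *);

/* Toy state: 1 while the single modelled thread is in "Guile mode". */
int model_in_guile = 0;

/* Run fn in Guile mode.  Nested calls keep the mode on.  Per the
   message, the real scm_with_guile goes through
   GC_call_with_gc_active here. */
void *model_with_guile(model_fn fn, void *data)
{
    int saved = model_in_guile;
    model_in_guile = 1;
    void *result = fn(data);
    model_in_guile = saved;
    return result;
}

/* Run fn outside Guile mode.  Only legal from Guile mode; this model
   returns NULL without calling fn otherwise.  Per the message, the
   real code goes through a GC_do_blocking here. */
void *model_without_guile(model_fn fn, void *data)
{
    if (!model_in_guile)
        return NULL;
    model_in_guile = 0;
    void *result = fn(data);
    model_in_guile = 1;
    return result;
}

/* Helper bodies: record the mode seen by the innermost call. */
void *report_mode(void *data)
{
    *(int *)data = model_in_guile;
    return data;
}
void *nested_with(void *data)
{
    return model_with_guile(report_mode, data);
}
void *leave_then_report(void *data)
{
    return model_without_guile(report_mode, data);
}

/* Scenario: with_guile nested inside with_guile; returns mode seen. */
int mode_inside_nested_with(void)
{
    int seen = -1;
    model_with_guile(nested_with, &seen);
    return seen;
}

/* Scenario: without_guile inside with_guile; returns mode seen. */
int mode_inside_without(void)
{
    int seen = -1;
    model_with_guile(leave_then_report, &seen);
    return seen;
}

/* Scenario: without_guile called outside Guile mode is rejected. */
int without_outside_is_rejected(void)
{
    int seen = -1;
    return model_without_guile(report_mode, &seen) == NULL && seen == -1;
}
```

In this model the innermost call sees Guile mode on in the nested case and off inside model_without_guile, and the flag returns to its previous value on the way out, which is the "nests as you would think" behaviour described above.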
Regards,
Andy
--
http://wingolog.org/