* crash in gc with upside-down stack
@ 2008-11-12 4:04 Linas Vepstas
From: Linas Vepstas @ 2008-11-12 4:04 UTC
To: guile-user, bug-guile; +Cc: Neil Jerram
Here's another one; I'm trying to dig into this.
It's more or less the same crash as the one reported at:
http://bugs.gentoo.org/228097
and
http://www.mail-archive.com/bug-guile@gnu.org/msg04568.html
My stack below.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf5333b90 (LWP 20587)]
0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
435 SCM obj = * (SCM *) &x[m];
Current language: auto; currently c
(gdb) bt
#0 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
at gc-mark.c:435
#1 0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
#2 0xf7711d38 in scm_mark_all () at gc-mark.c:82
#3 0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
#4 0xf7710f4d in scm_gc_for_newcell (freelist=0xf779b76c,
free_cells=0x1228e9b0)
at gc.c:509
#5 0xf7768bd8 in scm_c_catch (tag=0x104, body=0xf76f3830 <c_body>,
body_data=0xf5333328, handler=0xf76f3850 <c_handler>,
handler_data=0xf5333328,
pre_unwind_handler=0xf77683e0 <scm_handle_by_message_noexit>,
pre_unwind_handler_data=0x0) at ../libguile/inline.h:186
#6 0xf76f3cf2 in scm_i_with_continuation_barrier (body=0xf76f3830 <c_body>,
body_data=0xf5333328, handler=0xf76f3850 <c_handler>,
handler_data=0xf5333328,
pre_unwind_handler=0xf77683e0 <scm_handle_by_message_noexit>,
pre_unwind_handler_data=0x0) at continuations.c:326
#7 0xf76f3dd3 in scm_c_with_continuation_barrier (
func=0xf7767ab0 <do_thread_exit>, data=0x1228e938) at continuations.c:368
#8 0xf77678f9 in scm_i_with_guile_and_parent (func=0xf7767ab0 <do_thread_exit>,
data=0x1228e938, parent=0x19f63670) at threads.c:695
#9 0xf77679ee in scm_with_guile (func=0xf7767ab0 <do_thread_exit>,
data=0x1228e938) at threads.c:683
#10 0xf7767a43 in on_thread_exit (v=0x1228e938) at threads.c:505
#11 0xf7d7abb0 in __nptl_deallocate_tsd ()
from /lib/tls/i686/cmov/libpthread.so.0
#12 0xf7d7b509 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#13 0xf7b79e5e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)
I've seen this twice now in two days, but it's not readily reproducible.
By plugging the insanely large n into a hex calculator, you'll see it's actually
0xfffffdfe, i.e. -514 as a signed 32-bit value. Looking carefully near
threads.c:1375 seems to imply that stack top and stack bottom are reversed.
So I added a printf at that location, and tried to reproduce the crash.
Several gazillion print statements later, no crash.
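As a minimal sketch of the arithmetic above (not part of the original report): the helper names `as_signed32` and `explain_count` are made up for illustration, and the code assumes that the count passed to scm_mark_locations is 32 bits wide on this build, as the 32-bit addresses in the backtrace suggest.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reinterpret a 32-bit unsigned count as the signed
   value it most likely came from (two's complement wrap-around). */
static int32_t as_signed32 (uint32_t n)
{
  /* Values above INT32_MAX correspond to negative numbers. */
  return (n > (uint32_t) INT32_MAX)
    ? (int32_t) ((int64_t) n - ((int64_t) 1 << 32))
    : (int32_t) n;
}

/* Hypothetical helper: print the count as gdb showed it, in hex,
   and as a signed value. */
static void explain_count (uint32_t n)
{
  printf ("n = %u = 0x%08x = %d as a signed 32-bit value\n",
          (unsigned) n, (unsigned) n, (int) as_signed32 (n));
}
```

Calling `explain_count (4294966782u)` prints `n = 4294966782 = 0xfffffdfe = -514 as a signed 32-bit value`, which is what a negative `t->base - t->top` would look like once it is treated as an unsigned count.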
I suspect that this is some sort of thread-race condition; I think it
happens when I am defining some functions from several different
threads at once. It seems *not* to occur once I get into hard-core
computations-- i.e. it happens no later than the first few dozen gc's.
This is on guile-1.8.5, --with-threads, on Ubuntu, Intel (actually AMD64 cpu.)
--linas
* Re: crash in gc with upside-down stack
From: Linas Vepstas @ 2008-11-13 5:21 UTC
To: bug-guile
Some minor updates:
2008/11/11 Linas Vepstas <linasvepstas@gmail.com>:
>
> My stack below.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xf5333b90 (LWP 20587)]
> 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
> 435 SCM obj = * (SCM *) &x[m];
> Current language: auto; currently c
> (gdb) bt
> #0 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
> at gc-mark.c:435
> #1 0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
> #2 0xf7711d38 in scm_mark_all () at gc-mark.c:82
> #3 0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
My current code reproduces this fairly readily, I am seeing
it many dozens/hundreds of times a day.
I tweaked guile to check that the stack bounds are in order,
and to print an error message when they are not, and then to
just troop on -- and so I see dozens/hundreds of prints.
When the stack bounds are reversed, the difference
is *always* 58 bytes; and in fact, the two bad stack
bounds are always the same.
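A minimal sketch of that kind of check, assuming a downward-growing stack. The names `stack_item`, `thread_bounds` and `stack_range_to_mark` are hypothetical stand-ins for illustration, not the actual guile types or the actual tweak:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for SCM_STACKITEM. */
typedef long stack_item;

/* Hypothetical stand-in for the two scm_i_thread fields used here. */
struct thread_bounds
{
  stack_item *base;   /* where the thread's stack starts */
  stack_item *top;    /* last recorded stack pointer */
};

/* For a stack that grows down, top <= base is expected.  Returns 1 and
   stores the number of items to scan in *n_items when the bounds are in
   order; returns 0 and leaves *n_items alone when they are reversed. */
static int stack_range_to_mark (const struct thread_bounds *t, size_t *n_items)
{
  uintptr_t base = (uintptr_t) t->base;
  uintptr_t top = (uintptr_t) t->top;

  if (top > base)
    {
      fprintf (stderr, "reversed stack bounds: top=%p base=%p\n",
               (void *) t->top, (void *) t->base);
      return 0;
    }
  *n_items = (size_t) (base - top) / sizeof (stack_item);
  return 1;
}
```

A caller would scan `t->top` for `n_items` items only when the function returns 1. Skipping the scan avoids the segfault, but with a conservative collector it also means anything referenced only from that stack is not marked during that collection, so this is a diagnostic aid rather than a fix.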
It appears to happen *only* when I have multiple threads
all trying to define functions at the same time, it never
happens when one thread goes off to do some heavy
computing.
--linas
* Re: crash in gc with upside-down stack
From: Linas Vepstas @ 2008-11-13 19:27 UTC
To: bug-guile
Attached below is a debugging patch, and its output,
which shows that the stack bounds are frequently
upside-down, and are sometimes upside-down
when the GC runs, thus leading to a crash.
In the next email, I'll propose a patch that fixes
the problem.
The original problem report:
> 2008/11/11 Linas Vepstas <linasvepstas@gmail.com>:
>>
>> My stack below.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xf5333b90 (LWP 20587)]
>> 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
>> 435 SCM obj = * (SCM *) &x[m];
>> Current language: auto; currently c
>> (gdb) bt
>> #0 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
>> at gc-mark.c:435
>> #1 0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
>> #2 0xf7711d38 in scm_mark_all () at gc-mark.c:82
>> #3 0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
>
A debugging patch. Yes, it's ugly; it's intentionally ugly.
More of an eye-catcher that way.
Index: guile-1.8.5/libguile/threads.c
===================================================================
--- guile-1.8.5.orig/libguile/threads.c 2008-11-13 07:58:22.000000000 -0600
+++ guile-1.8.5/libguile/threads.c 2008-11-13 13:14:00.000000000 -0600
@@ -395,6 +395,10 @@ static scm_t_guile_ticket
scm_leave_guile ()
{
scm_i_thread *t = suspend ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_leav_guile backwards stack %d\n", sz);
+}
scm_i_pthread_mutex_unlock (&t->heap_mutex);
return (scm_t_guile_ticket) t;
}
@@ -694,7 +698,15 @@ scm_i_with_guile_and_parent (void *(*fun
really_entered = scm_i_init_thread_for_guile (&base_item, parent);
res = scm_c_with_continuation_barrier (func, data);
if (really_entered)
- scm_leave_guile ();
+{
+ // scm_leave_guile ();
+scm_i_thread * t = (scm_i_thread *) scm_leave_guile ();
+int sz=t->base - t->top;
+int szb=t->base - &base_item;
+if(0>sz) {
+printf("duuude scm_leav_guile and parent %d %d\n", sz, szb);
+}
+}
return res;
}
@@ -704,6 +716,11 @@ scm_without_guile (void *(*func)(void *)
void *res;
scm_t_guile_ticket t;
t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_wo guile %d\n", sz);
+}
res = func (data);
scm_enter_guile (t);
return res;
@@ -1371,8 +1388,15 @@ scm_threads_mark_stacks (void)
#if SCM_STACK_GROWS_UP
scm_mark_locations (t->base, t->top - t->base);
+xxxx
#else
+int sz=t->base - t->top;
+if(0<=sz) {
scm_mark_locations (t->top, t->base - t->top);
+} else {
+printf ("duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!\n");
+printf ("duude stack top=%p base=%p sz=%d\n", t->top, t->base, t->base - t->top);
+}
#endif
scm_mark_locations ((SCM_STACKITEM *) t->regs,
((size_t) sizeof(t->regs)
@@ -1441,6 +1465,11 @@ int
scm_pthread_mutex_lock (scm_i_pthread_mutex_t *mutex)
{
scm_t_guile_ticket t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_mutexe %d\n", sz);
+}
int res = scm_i_pthread_mutex_lock (mutex);
scm_enter_guile (t);
return res;
@@ -1463,6 +1492,11 @@ int
scm_pthread_cond_wait (scm_i_pthread_cond_t *cond,
scm_i_pthread_mutex_t *mutex)
{
scm_t_guile_ticket t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_conde %d\n", sz);
+}
int res = scm_i_pthread_cond_wait (cond, mutex);
scm_enter_guile (t);
return res;
@@ -1578,7 +1612,12 @@ scm_i_thread_put_to_sleep ()
{
scm_i_thread *t;
- scm_leave_guile ();
+ // scm_leave_guile ();
+ t = (scm_i_thread *) scm_leave_guile ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_leav_guile backwards was scm_i_thread_put_to_sleep %d\n", sz);
+}
scm_i_pthread_mutex_lock (&thread_admin_mutex);
/* Signal all threads to go to sleep
@@ -1620,6 +1659,10 @@ void
scm_i_thread_sleep_for_gc ()
{
scm_i_thread *t = suspend ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_i_thread_sleep_for_gc backwards stack %d\n", sz);
+}
scm_i_pthread_cond_wait (&wake_up_cond, &t->heap_mutex);
resume (t);
}
Here is an example of the output generated:
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf355b9e0 base=0xf355b908 sz=-54
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf355b9e0 base=0xf355b908 sz=-54
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
etc... lots of the above. Also, here's a typical
thread exit:
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duude thread going away on f5f09b90
duuude scm_leav_guile backwards stack -518
duuude scm_mutexe -518
duuude scm_leav_guile backwards stack -522
duuude scm_mutexe -522
duuude scm_leav_guile backwards stack -638
duuude scm_leav_guile and parent -638 -660
[Thread 0xf5f09b90 (LWP 25812) exited]
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
Sometimes the gc happens while the thread is going away:
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duude thread going away on f5708b90
duuude scm_leav_guile backwards stack -518
duuude scm_mutexe -518
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf2d5a9e0 base=0xf2d5a908 sz=-54
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf5708120 base=0xf5707908 sz=-518
duuude scm_leav_guile backwards stack -522
duuude scm_mutexe -522
duuude scm_leav_guile backwards stack -638
duuude scm_leav_guile and parent -638 -660
[Thread 0xf5708b90 (LWP 25826) exited]
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
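The sz values in this output can be cross-checked against the printed addresses. In the patch, sz is a difference of two stack-item pointers, so it counts stack items rather than bytes. The sketch below assumes 4-byte stack items, which is an inference from the 32-bit addresses and the printed numbers rather than something stated in the thread; `stack_items_32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: signed item count (base - top) for two 32-bit
   addresses, assuming each stack item is 4 bytes wide. */
static int64_t stack_items_32 (uint32_t top, uint32_t base)
{
  return ((int64_t) base - (int64_t) top) / 4;
}
```

With top=0xf355b9e0 and base=0xf355b908 the byte difference is -0xd8 = -216, which gives -54 items; with top=0xf5708120 and base=0xf5707908 it is -0x818 = -2072, which gives -518 items. Both match the sz values printed next to those addresses.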