unofficial mirror of bug-guile@gnu.org 
* crash in gc with upside-down stack
@ 2008-11-12  4:04 Linas Vepstas
  2008-11-13  5:21 ` Linas Vepstas
  0 siblings, 1 reply; 3+ messages in thread
From: Linas Vepstas @ 2008-11-12  4:04 UTC (permalink / raw)
  To: guile-user, bug-guile; +Cc: Neil Jerram

Here's another one, I'm trying to dig into this:

It's more or less the same crash as the one reported at:

http://bugs.gentoo.org/228097
and
http://www.mail-archive.com/bug-guile@gnu.org/msg04568.html

My stack below.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf5333b90 (LWP 20587)]
0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
435	      SCM obj = * (SCM *) &x[m];
Current language:  auto; currently c
(gdb) bt
#0  0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
    at gc-mark.c:435
#1  0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
#2  0xf7711d38 in scm_mark_all () at gc-mark.c:82
#3  0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
#4  0xf7710f4d in scm_gc_for_newcell (freelist=0xf779b76c, free_cells=0x1228e9b0)
    at gc.c:509
#5  0xf7768bd8 in scm_c_catch (tag=0x104, body=0xf76f3830 <c_body>,
    body_data=0xf5333328, handler=0xf76f3850 <c_handler>,
    handler_data=0xf5333328,
    pre_unwind_handler=0xf77683e0 <scm_handle_by_message_noexit>,
    pre_unwind_handler_data=0x0) at ../libguile/inline.h:186
#6  0xf76f3cf2 in scm_i_with_continuation_barrier (body=0xf76f3830 <c_body>,
    body_data=0xf5333328, handler=0xf76f3850 <c_handler>,
    handler_data=0xf5333328,
    pre_unwind_handler=0xf77683e0 <scm_handle_by_message_noexit>,
    pre_unwind_handler_data=0x0) at continuations.c:326
#7  0xf76f3dd3 in scm_c_with_continuation_barrier (
    func=0xf7767ab0 <do_thread_exit>, data=0x1228e938) at continuations.c:368
#8  0xf77678f9 in scm_i_with_guile_and_parent (func=0xf7767ab0 <do_thread_exit>,
    data=0x1228e938, parent=0x19f63670) at threads.c:695
#9  0xf77679ee in scm_with_guile (func=0xf7767ab0 <do_thread_exit>,
    data=0x1228e938) at threads.c:683
#10 0xf7767a43 in on_thread_exit (v=0x1228e938) at threads.c:505
#11 0xf7d7abb0 in __nptl_deallocate_tsd ()
   from /lib/tls/i686/cmov/libpthread.so.0
#12 0xf7d7b509 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#13 0xf7b79e5e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

I've seen this twice now in two days, but it's not readily reproducible.
If you plug the insanely large n into a hex calculator, you'll see that it's
actually 0xfffffdfe, i.e. -514 as a signed 32-bit value. Looking carefully
near threads.c:1375 seems to imply that stack top and stack bottom are
reversed. So I added a printf at that location, and tried to reproduce the
crash. Several gazillion print statements later, no crash.
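
To make the hex-calculator step concrete, here is a minimal sketch in C of
how that n reads as a signed number. The helper names count_as_signed and
print_count are made up for illustration and are not part of Guile; the only
assumption is that the count is 32 bits wide, as the addresses in the
backtrace suggest.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of Guile: recover the signed value that a
   32-bit unsigned count most likely started out as.  Written so that it
   does not rely on implementation-defined unsigned-to-signed conversion. */
static int32_t
count_as_signed (uint32_t n)
{
  if (n > (uint32_t) INT32_MAX)
    return -(int32_t) (UINT32_MAX - n) - 1;
  return (int32_t) n;
}

/* Print a count as gdb shows it, next to its hex and signed readings. */
static void
print_count (uint32_t n)
{
  printf ("n=%lu hex=0x%08lx signed=%ld\n",
          (unsigned long) n, (unsigned long) n,
          (long) count_as_signed (n));
}
```

Calling print_count (4294966782u) prints
"n=4294966782 hex=0xfffffdfe signed=-514", which is what one would expect if
the count had been computed as a small negative difference and then passed
along as an unsigned size.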

I suspect that this is some sort of thread-race condition; I think it
happens when I am defining some functions from several different
threads at once. It seems *not* to occur once I get into hard-core
computations-- i.e. it happens no later than the first few dozen gc's.

This is on guile-1.8.5, --with-threads, on Ubuntu, Intel (actually AMD64 cpu.)

--linas





* Re: crash in gc with upside-down stack
  2008-11-12  4:04 crash in gc with upside-down stack Linas Vepstas
@ 2008-11-13  5:21 ` Linas Vepstas
  2008-11-13 19:27   ` Linas Vepstas
  0 siblings, 1 reply; 3+ messages in thread
From: Linas Vepstas @ 2008-11-13  5:21 UTC (permalink / raw)
  To: bug-guile

Some minor updates:

2008/11/11 Linas Vepstas <linasvepstas@gmail.com>:
>
> My stack below.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xf5333b90 (LWP 20587)]
> 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
> 435           SCM obj = * (SCM *) &x[m];
> Current language:  auto; currently c
> (gdb) bt
> #0  0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
>    at gc-mark.c:435
> #1  0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
> #2  0xf7711d38 in scm_mark_all () at gc-mark.c:82
> #3  0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598

My current code reproduces this fairly readily; I am seeing
it many dozens/hundreds of times a day.

I tweaked guile to check that the stack bounds are in order,
and to print an error message when they are not, and then to
just troop on -- and so I see dozens/hundreds of prints.
When the stack bounds are reversed, the difference
is *always* 58 bytes; and in fact, the two bad stack
bounds are always the same.
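
The check described above can be sketched roughly as follows, assuming a
stack that grows downward (so base should sit at or above top). The struct
thread_bounds and the function items_to_scan are hypothetical stand-ins, not
Guile's actual scm_i_thread or marking code.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned long stack_item;   /* stand-in for SCM_STACKITEM */

/* Hypothetical stand-in for the two scm_i_thread fields involved. */
struct thread_bounds
{
  stack_item *base;   /* where the stack starts (highest address) */
  stack_item *top;    /* where it currently ends (lowest address) */
};

/* Return how many items lie between top and base.  If the bounds are
   reversed, complain on stderr and return 0 so that the caller scans
   nothing instead of walking off into unmapped memory. */
static size_t
items_to_scan (const struct thread_bounds *t)
{
  if (t->top > t->base)
    {
      fprintf (stderr, "reversed stack bounds: top=%p base=%p\n",
               (void *) t->top, (void *) t->base);
      return 0;
    }
  return (size_t) (t->base - t->top);
}
```

Returning 0 only hides the symptom: a thread whose bounds are reversed gets
no stack scanned at all, so this is a diagnostic aid rather than a fix.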

It appears to happen *only* when I have multiple threads
all trying to define functions at the same time; it never
happens when one thread goes off to do some heavy
computing.

--linas





* Re: crash in gc with upside-down stack
  2008-11-13  5:21 ` Linas Vepstas
@ 2008-11-13 19:27   ` Linas Vepstas
  0 siblings, 0 replies; 3+ messages in thread
From: Linas Vepstas @ 2008-11-13 19:27 UTC (permalink / raw)
  To: bug-guile

Attached below is a debugging patch, and its output,
which shows that the stack bounds are frequently
upside-down, and are sometimes upside-down
when the GC runs, thus leading to a crash.

In the next email, I'll propose a patch that fixes the
problem.

The original problem report:

> 2008/11/11 Linas Vepstas <linasvepstas@gmail.com>:
>>
>> My stack below.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xf5333b90 (LWP 20587)]
>> 0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782) at gc-mark.c:435
>> 435           SCM obj = * (SCM *) &x[m];
>> Current language:  auto; currently c
>> (gdb) bt
>> #0  0xf7711ce3 in scm_mark_locations (x=0xf5333110, n=4294966782)
>>    at gc-mark.c:435
>> #1  0xf7766a12 in scm_threads_mark_stacks () at threads.c:1375
>> #2  0xf7711d38 in scm_mark_all () at gc-mark.c:82
>> #3  0xf7710d33 in scm_i_gc (what=0xf778602e "cells") at gc.c:598
>

A debugging patch. Yes, it's ugly; it's intentionally ugly.
More of an eye-catcher that way.

Index: guile-1.8.5/libguile/threads.c
===================================================================
--- guile-1.8.5.orig/libguile/threads.c	2008-11-13 07:58:22.000000000 -0600
+++ guile-1.8.5/libguile/threads.c	2008-11-13 13:14:00.000000000 -0600
@@ -395,6 +395,10 @@ static scm_t_guile_ticket
 scm_leave_guile ()
 {
   scm_i_thread *t = suspend ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_leav_guile backwards stack %d\n", sz);
+}
   scm_i_pthread_mutex_unlock (&t->heap_mutex);
   return (scm_t_guile_ticket) t;
 }
@@ -694,7 +698,15 @@ scm_i_with_guile_and_parent (void *(*fun
   really_entered = scm_i_init_thread_for_guile (&base_item, parent);
   res = scm_c_with_continuation_barrier (func, data);
   if (really_entered)
-    scm_leave_guile ();
+{
+    // scm_leave_guile ();
+scm_i_thread * t = (scm_i_thread *) scm_leave_guile ();
+int sz=t->base - t->top;
+int szb=t->base - &base_item;
+if(0>sz) {
+printf("duuude scm_leav_guile and parent %d %d\n", sz, szb);
+}
+}
   return res;
 }

@@ -704,6 +716,11 @@ scm_without_guile (void *(*func)(void *)
   void *res;
   scm_t_guile_ticket t;
   t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_wo guile %d\n", sz);
+}
   res = func (data);
   scm_enter_guile (t);
   return res;
@@ -1371,8 +1388,15 @@ scm_threads_mark_stacks (void)

 #if SCM_STACK_GROWS_UP
       scm_mark_locations (t->base, t->top - t->base);
+xxxx
 #else
+int sz=t->base - t->top;
+if(0<=sz) {
       scm_mark_locations (t->top, t->base - t->top);
+} else {
+printf ("duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!\n");
+printf ("duude stack top=%p base=%p sz=%d\n", t->top, t->base, t->base - t->top);
+}
 #endif
       scm_mark_locations ((SCM_STACKITEM *) t->regs,
 			  ((size_t) sizeof(t->regs)
@@ -1441,6 +1465,11 @@ int
 scm_pthread_mutex_lock (scm_i_pthread_mutex_t *mutex)
 {
   scm_t_guile_ticket t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_mutexe %d\n", sz);
+}
   int res = scm_i_pthread_mutex_lock (mutex);
   scm_enter_guile (t);
   return res;
@@ -1463,6 +1492,11 @@ int
 scm_pthread_cond_wait (scm_i_pthread_cond_t *cond, scm_i_pthread_mutex_t *mutex)
 {
   scm_t_guile_ticket t = scm_leave_guile ();
+scm_i_thread * s = (scm_i_thread *) t;
+int sz=s->base - s->top;
+if(0>sz) {
+printf("duuude scm_conde %d\n", sz);
+}
   int res = scm_i_pthread_cond_wait (cond, mutex);
   scm_enter_guile (t);
   return res;
@@ -1578,7 +1612,12 @@ scm_i_thread_put_to_sleep ()
     {
       scm_i_thread *t;

-      scm_leave_guile ();
+      // scm_leave_guile ();
+   t = (scm_i_thread *) scm_leave_guile ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_leav_guile backwards was scm_i_thread_put_to_sleep %d\n", sz);
+}
       scm_i_pthread_mutex_lock (&thread_admin_mutex);

       /* Signal all threads to go to sleep
@@ -1620,6 +1659,10 @@ void
 scm_i_thread_sleep_for_gc ()
 {
   scm_i_thread *t = suspend ();
+int sz=t->base - t->top;
+if(0>sz) {
+printf("duuude scm_i_thread_sleep_for_gc backwards stack %d\n", sz);
+}
   scm_i_pthread_cond_wait (&wake_up_cond, &t->heap_mutex);
   resume (t);
 }


Here is an example of the output generated:

duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf355b9e0 base=0xf355b908 sz=-54
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf355b9e0 base=0xf355b908 sz=-54
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54

etc... lots of the above. Also, here's a typical
thread exit:

duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duude thread going away on f5f09b90
duuude scm_leav_guile backwards stack -518
duuude scm_mutexe -518
duuude scm_leav_guile backwards stack -522
duuude scm_mutexe -522
duuude scm_leav_guile backwards stack -638
duuude scm_leav_guile and parent -638 -660
[Thread 0xf5f09b90 (LWP 25812) exited]
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76

Sometimes the gc happens while the thread is going away:

duuude scm_leav_guile and parent -54 -76
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duude thread going away on f5708b90
duuude scm_leav_guile backwards stack -518
duuude scm_mutexe -518
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf2d5a9e0 base=0xf2d5a908 sz=-54
duuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuude bugg!!
duude stack top=0xf5708120 base=0xf5707908 sz=-518
duuude scm_leav_guile backwards stack -522
duuude scm_mutexe -522
duuude scm_leav_guile backwards stack -638
duuude scm_leav_guile and parent -638 -660
[Thread 0xf5708b90 (LWP 25826) exited]
duuude scm_leav_guile backwards stack -54
duuude scm_leav_guile and parent -54 -76




