intermittent segfaults in master

unofficial mirror of guile-devel@gnu.org 

* intermittent segfaults in master
@ 2009-10-24 13:30 Andy Wingo, n
  2009-10-24 21:56 ` Ken Raeburn
  0 siblings, 1 reply; 3+ messages in thread
From: Andy Wingo, n @ 2009-10-24 13:30 UTC
  To: guile-devel

Hello,

I have been experiencing intermittent segfaults recently, as I worked on
wip-case-lambda. They would almost always go away immediately -- as in,
while rebuilding guile, the process would stop because of a segfault,
but I could type make again and it would succeed.

Here is one core dump I was investigating:

    http://paste.lisp.org/display/88926

The odd thing is that we have a NULL value in there, as the car of a
cell. Here's the top of the backtrace:

#0  scm_is_pair (x=<value optimized out>) at ../libguile/inline.h:293
#1  scm_sloppy_assq (x=<value optimized out>) at alist.c:58
#2  0x00e113c4 in scm_assq_ref (alist=0x98f8458, key=0x976edf0) at alist.c:209
#3  0x00e6c843 in scm_procedure_property (proc=0x989ee00, key=0x976edf0) at procprop.c:207

Now in wip-case-lambda, some things changed regarding procedure
properties. Instead of having a strange "standin closure" thing for
non-closure procedures, their properties now get stored in a weak hash
table. So that assq is operating on a value that we (probably; there is
another case there) just pulled out of a doubly-weak hash table. So
could it be that one of those links just got nulled by a call to
GC_malloc, perhaps from another thread?
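
To make the hypothesis above concrete, here is a minimal C sketch of a weak table entry whose value link is cleared behind the reader's back. It is a simulation only and does not call libgc: the `cell` and `weak_entry` types and the `simulate_collect_value` and `lookup_alist` functions are hypothetical names invented for illustration, and the layout is an assumption, not Guile's actual representation. The point is that a reader of a weakly held slot has to read it once and check for NULL before walking the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pair cell: two pointer-sized fields. */
typedef struct cell {
  struct cell *car;
  struct cell *cdr;
} cell;

/* Hypothetical entry of a doubly-weak table: both fields are held
   weakly, so a collector is allowed to clear either one to NULL. */
typedef struct weak_entry {
  cell *key;
  cell *value;
} weak_entry;

/* Simulates what a collector does to a weak link once it decides the
   referent is unreachable: the slot is overwritten with NULL. */
static void simulate_collect_value(weak_entry *e)
{
  assert(e != NULL);
  e->value = NULL;
}

/* Defensive lookup: read the weak slot exactly once into a local, then
   test it. Returns 1 and stores the alist in *out if the link is still
   live, or 0 if it has been cleared. */
static int lookup_alist(const weak_entry *e, cell **out)
{
  cell *v = e->value;   /* single read of the weakly held slot */
  if (v == NULL)
    return 0;           /* link was cleared; caller must not walk it */
  *out = v;
  return 1;
}
```

A caller that skips the NULL test and passes the cleared slot straight to a list walker would fault in the first pair check, which is the shape of the backtrace above.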

For the meantime I could just make this a key-weak hash table. But this
seems like the kind of problem that could hit user code. Ludovic, I
think you will start to see these crashes now that case-lambda has been
merged (specifically 56164a5a). Would you be on the lookout for this
kind of problem, and in contact with the libgc list? That is, if this
analysis is correct; it's very possible I have misinterpreted things.

Regards,

Andy
-- 
http://wingolog.org/


* Re: intermittent segfaults in master
  2009-10-24 13:30 intermittent segfaults in master Andy Wingo, n
@ 2009-10-24 21:56 ` Ken Raeburn
  2009-11-03 23:58   ` Neil Jerram
  0 siblings, 1 reply; 3+ messages in thread
From: Ken Raeburn @ 2009-10-24 21:56 UTC
  To: guile-devel

On Oct 24, 2009, at 09:30, Andy Wingo , n@a-pb-sasl-sd.pobox.com wrote:
> I have been experiencing intermittent segfaults recently, as I worked on
> wip-case-lambda. They would almost always go away immediately -- as in,
> while rebuilding guile, the process would stop because of a segfault,
> but I could type make again and it would succeed.

I've been seeing intermittent faults too, while working on the trunk  
and building with -DSCM_DEBUG=1.

> For the meantime I could just make this a key-weak hash table. But this
> seems like the kind of problem that could hit user code. Ludovic, I
> think you will start to see these crashes now that case-lambda has been
> merged (specifically 56164a5a). Would you be on the lookout for this
> kind of problem, and in contact with the libgc list? That is, if this
> analysis is correct; it's very possible I have misinterpreted things.

My guess is we want key-weak for that hash table anyways.

But, I've been able to generate a crash even with this patch in.  This  
is on Mac OS X (10.5.8), libgc 7.1 (as installed by macports), guile  
commit id 15ab466, plus the SCM_DEBUG patches I submitted before.   
(This particular set of stack traces is from binaries built without  
SCM_DEBUG, though the SCM_DEBUG version also shows the bug  
intermittently.)

The code:

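;; Background thread that forces garbage collections continuously.
(call-with-new-thread (lambda () (while #t (gc))) (lambda () #f))
;; Main thread: keep updating and reading an alist stored as the value
;; of a single entry in a doubly-weak hash table.
(let ((h (make-doubly-weak-hash-table 0)))
  (while #t
    (hashq-set! h 'proc
                (assq-set! (hashq-ref h 'proc '()) 'akey (list 1)))
    (hashq-set! h 'proc
                (assq-set! (hashq-ref h 'proc '()) 'akey2 (list 1)))
    (assq-ref (hashq-ref h 'proc '()) 'akey)
    (assq-ref (hashq-ref h 'proc '()) 'akey3)
    (display ".")))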

It can take a while to trigger the problem, and I'm not sure it even  
happens on every invocation; I usually quit the test after several  
minutes if it hasn't shown the problem, but sometimes simply starting  
it again triggers it fairly quickly.  It wouldn't surprise me if it's  
also OS-, CPU-, and compiler-dependent.

I don't know if the separate GC thread is necessary.  It wasn't in my  
original test case, but simplifying the test case seems to have made  
it harder to actually trigger the problem; I thought forcing excessive  
GC invocations might help, and I think it has, though that's just a  
subjective impression.

A trace of the crashing thread:

(gdb) bt 10
#0  0x0014af5b in scm_is_pair [inlined] () at inline.h:61
#1  0x0014af5b in scm_sloppy_assq (key=0x10b78d0, alist=0x0) at ../../guile/libguile/alist.c:61
#2  0x0014b3b4 in scm_is_pair [inlined] () at inline.h:272
#3  0x0014b3b4 in scm_assq_set_x (alist=0x10858e0, key=0x10b78d0, val=0x1085fd0) at ../../guile/libguile/alist.c:61
#4  0x0016dcda in scm_dapply (proc=<value temporarily unavailable, due to optimizations>, arg1=0x1085fb0, args=<value temporarily unavailable, due to optimizations>) at ../../guile/libguile/alist.c:61
#5  0x001e90ac in vm_debug_engine (vp=0x597fa0, program=<value temporarily unavailable, due to optimizations>, argv=0x0, nargs=<value temporarily unavailable, due to optimizations>) at ../../guile/libguile/alist.c:61
#6  0x001732f1 in scm_call_0 (proc=0x10b7790) at ../../guile/libguile/alist.c:61
#7  0x001d9ad5 in scm_c_catch (tag=0x10858d8, body=0x1da070 <scm_body_thunk>, body_data=0xbfffe9b8, handler=0x1da090 <scm_handle_by_proc>, handler_data=0xbfffe9d8, pre_unwind_handler=0x10858d8, pre_unwind_handler_data=0x10858d8) at ../../guile/libguile/alist.c:61
#8  0x001da229 in scm_catch_with_pre_unwind_handler (key=0x10b7950, thunk=0x10b7790, handler=0x10b7740, pre_unwind_handler=0x204) at ../../guile/libguile/alist.c:61
#9  0x00182261 in gsubr_apply_raw (proc=0x56ff50, argc=<value temporarily unavailable, due to optimizations>, argv=0xbfffea5c) at ../../guile/libguile/alist.c:61
[...]

The data being examined:

(gdb) fr 3
#3  scm_assq_set_x (alist=0x10858e0, key=0x10b78d0, val=0x1085fd0) at ../../guile/libguile/alist.c:61
273	  if (scm_is_pair (handle))
(gdb) p alist
$5 = (SCM) 0x10858e0
(gdb) p (SCM*)$5
$6 = (SCM *) 0x10858e0
(gdb) p $6[0]
$7 = (SCM) 0x10858d8
(gdb) p $6[1]
$8 = (SCM) 0x0
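
As a reading aid for the gdb session above: if a pair cell is taken to be two machine words, with word 0 the car and word 1 the cdr, then `$6[1]` being a raw zero means the list's cdr is neither another cell nor an end-of-list marker. The C sketch below models that reading. Everything in it is an assumption made for illustration: `word`, `EOL_MARKER` (an arbitrary non-zero stand-in for the empty list, not Guile's real encoding), `walk_result`, and `walk_list` are hypothetical names, not libguile API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: every Scheme value is one machine word, and a
   pair cell is two consecutive words (word 0 = car, word 1 = cdr). */
typedef uintptr_t word;

/* Arbitrary non-zero stand-in for the empty list. The value is odd, so
   it can never collide with the address of an aligned cell. */
#define EOL_MARKER ((word) 0x1)

enum walk_result { WALK_OK, WALK_CORRUPT };

/* Walk a list by following cdr words. A properly terminated list ends
   in EOL_MARKER; a raw zero word means a link was cleared or clobbered.
   *length receives the number of cells visited before stopping. */
static enum walk_result walk_list(word list, size_t *length)
{
  size_t n = 0;
  while (list != EOL_MARKER) {
    if (list == 0) {            /* cleared link: stop before dereferencing */
      *length = n;
      return WALK_CORRUPT;
    }
    const word *cell = (const word *) list;
    n++;
    list = cell[1];             /* follow the cdr */
  }
  *length = n;
  return WALK_OK;
}
```

In this model the cell printed above would be reported as corrupt after one step, whereas a walker that dereferences each cdr without the zero test would fault on the second iteration, matching `alist=0x0` in frame #1.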

The garbage collection thread:

#0  0x9186729e in semaphore_signal_trap ()
#1  0x9186f04d in pthread_mutex_unlock ()
#2  0x0029707e in GC_try_to_collect ()
#3  0x002970db in GC_gcollect ()
#4  0x00178a0e in scm_gc () at ../../guile/libguile/gc.c:390
#5  0x0016d7c2 in scm_dapply (proc=<value temporarily unavailable, due to optimizations>, arg1=0xa01cd584, args=<value temporarily unavailable, due to optimizations>) at eval.i.c:1754
#6  0x001e96c0 in vm_debug_engine (vp=0x597dc0, program=<value temporarily unavailable, due to optimizations>, argv=0x0, nargs=<value temporarily unavailable, due to optimizations>) at vm-i-system.c:919
#7  0x001732f1 in scm_call_0 (proc=0x1000a30) at ../../guile/libguile/eval.c:3113
#8  0x001d9ad5 in scm_c_catch (tag=0x0, body=0x1da070 <scm_body_thunk>, body_data=0xb00807c8, handler=0x1da090 <scm_handle_by_proc>, handler_data=0xb00807e8, pre_unwind_handler=0, pre_unwind_handler_data=0x0) at ../../guile/libguile/throw.c:243
#9  0x001da229 in scm_catch_with_pre_unwind_handler (key=0x1000a40, thunk=0x1000a30, handler=0x10b7ae0, pre_unwind_handler=0x204) at ../../guile/libguile/throw.c:627
[...]

Ken





* Re: intermittent segfaults in master
  2009-10-24 21:56 ` Ken Raeburn
@ 2009-11-03 23:58   ` Neil Jerram
  0 siblings, 0 replies; 3+ messages in thread
From: Neil Jerram @ 2009-11-03 23:58 UTC
  To: Ken Raeburn; +Cc: guile-devel

Ken Raeburn <raeburn@raeburn.org> writes:

> On Oct 24, 2009, at 09:30, Andy Wingo , n@a-pb-sasl-sd.pobox.com wrote:
>> I have been experiencing intermittent segfaults recently, as I worked on
>> wip-case-lambda. They would almost always go away immediately -- as in,
>> while rebuilding guile, the process would stop because of a segfault,
>> but I could type make again and it would succeed.
>
> I've been seeing intermittent faults too, while working on the trunk
> and building with -DSCM_DEBUG=1.

FWIW I got a segfault that could be this same problem, in my build on
Monday morning:

cat alist.doc arbiters.doc [...] regex-posix.doc | GUILE_AUTO_COMPILE=0 ../meta/uninstalled-env guile-tools snarf-check-and-output-texi          > guile-procedures.texi || { rm guile-procedures.texi; false; }
/bin/sh: line 1:  6408 Broken pipe             cat alist.doc arbiters.doc [...] regex-posix.doc
      6409 Segmentation fault      | GUILE_AUTO_COMPILE=0 ../meta/uninstalled-env guile-tools snarf-check-and-output-texi > guile-procedures.texi
make[3]: *** [guile-procedures.texi] Error 1
make[3]: Leaving directory `/home/neil/SW/Guile/ovnight/libguile'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/neil/SW/Guile/ovnight/libguile'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/neil/SW/Guile/ovnight'
make: *** [all] Error 2

Interestingly, the build then tries to run the built guile again almost
immediately, for a different purpose, and that failed too:

=== API listing
../maint/make-snap: line 3:  6445 Segmentation fault      $uninstalled_env ../maint/objd.scm

Maybe it was just unlucky to fault twice, but maybe there's a hint here
that tendency-to-fault could be a property of a particular built guile.

      Neil



