Below was for guile (GNU Guile) 2.9.2.14-1fb399 --linas On Sun, Jul 14, 2019 at 4:59 PM Linas Vepstas wrote: > > So, here's my next installment on using guile-2.9.2. The first installment > said that I'd piled up CPU-months of guile 2.9.2 experience without any > crashes. Well, now, a different workload crashes in minutes. Below is a > highly simplified, edited gdb session -- it crashes because it unexpectedly > aborts, during an abort(!) because `get_callee_vcode()` failed. Harrumpf. > > Background: there are 140 threads, half in guile, the other half waiting > for guile to finish. Yes, that's too many, but anyways ... 70 threads in > guile and one crashed: > > #2 0x00007f85d3f6ecdb in capture_delimited_continuation ( > current_registers=, dynstack=, > saved_registers=, saved_mra=, > saved_fp=, vp=) at > ../../libguile/vm.c:1327 > #3 abort_to_prompt (thread=0x15f692dc0, saved_mra=) > at ../../libguile/vm.c:1454 > > Both frames are interesting, because libguile/vm.c:1327 shows > if (SCM_FRAME_DYNAMIC_LINK (base_fp) != saved_fp) > abort(); > hey!? who called this? line 1454 is in the middle of abort_to_prompt () > Yow! an unexpected abort during an abort... > > How did we get here? > #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420, > subr=, message=message@entry=0x1607c9380, > args=args@entry=0x15af130e0, data=data@entry=0x15af130f0) > at ../../libguile/error.c:90 > #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0, > message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", > args=0x15af130e0, > rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62 > #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0) > at ../../libguile/vm.c:1527 > > and libguile/vm.c:1527 tells me that get_callee_vcode () is very unhappy. > But why? I cannot tell .. after that, things peter out in boring stack > frames that started with my call scm_c_catch() ... the same seemingly > harmless call that is pending in 70 other threads. (the same call that has > survived several CPU month of pounding with a different collection of > scheme code) > > My best guess is that the current workload, by unintentionally launching > gobs of threads is exposing a race condition that has been hithertho > hidden. I don't know how to debug any further. I will try a slightly > newer guile shortly, to see if I get lucky. > > -- Linas > > p.s. here's the whole stack trace. But really, its boring, except for the > above highlights. > > > > (gdb) bt > #0 0x00007f85d38ef428 in __GI_raise (sig=sig@entry=6) > at ../sysdeps/unix/sysv/linux/raise.c:54 > #1 0x00007f85d38f102a in __GI_abort () at abort.c:89 > #2 0x00007f85d3f6ecdb in capture_delimited_continuation ( > current_registers=, dynstack=, > saved_registers=, saved_mra=, > saved_fp=, vp=) at > ../../libguile/vm.c:1327 > #3 abort_to_prompt (thread=0x15f692dc0, saved_mra=) > at ../../libguile/vm.c:1454 > #4 0x00007f85ac539041 in ?? () > #5 0x00007f85ac37f040 in ?? () > #6 0x00007f85d41e10c0 in jump_table_ () from > /usr/local/lib/libguile-3.0.so.0 > #7 0x000000015f692dc0 in ?? () > #8 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, > mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 > #9 0x00007f85d3f70600 in vm_debug_engine (thread=0x7f85ac539000) > at ../../libguile/vm-engine.c:370 > #10 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0xe45a20, > argv=, > nargs=5) at ../../libguile/vm.c:1605 > #11 0x00007f85d3eefdcb in scm_apply_0 (proc=0xe45a20, args=0x304) > at ../../libguile/eval.c:603 > #12 0x00007f85d3ef0a0d in scm_apply_1 (proc=, > arg1=arg1@entry=0xdc5420, args=args@entry=0x15af130a0) > at ../../libguile/eval.c:609 > #13 0x00007f85d3f6c546 in scm_throw (key=key@entry=0xdc5420, > args=0x15af130a0) > at ../../libguile/throw.c:272 > #14 0x00007f85d3f6caf9 in scm_ithrow (key=key@entry=0xdc5420, > args=, > no_return=no_return@entry=1) at ../../libguile/throw.c:619 > #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420, > subr=, message=message@entry=0x1607c9380, > args=args@entry=0x15af130e0, data=data@entry=0x15af130f0) > at ../../libguile/error.c:90 > ---Type to continue, or q to quit--- > #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0, > message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", > args=0x15af130e0, > rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62 > #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0) > at ../../libguile/vm.c:1527 > #18 0x00007f85b4314805 in ?? () > #19 0x00007f85b428a000 in ?? () > #20 0x00007f85d41e10c0 in jump_table_ () from > /usr/local/lib/libguile-3.0.so.0 > #21 0x000000015f692dc0 in ?? () > #22 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, > mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 > #23 0x00007f85d3f70600 in vm_debug_engine (thread=0x2) > at ../../libguile/vm-engine.c:370 > #24 0x00007f85d3f76db2 in scm_call_n (proc=, > argv=argv@entry=0x7f7e85fe2600, nargs=nargs@entry=3) at > ../../libguile/vm.c:1605 > #25 0x00007f85d3eef97f in scm_call_3 (proc=, > arg1=, > arg2=, arg3=) at > ../../libguile/eval.c:510 > #26 0x00007f85d4262b6f in ?? () > #27 0x00007f85d4262a80 in ?? () > #28 0x00007f85d41e10c0 in jump_table_ () from > /usr/local/lib/libguile-3.0.so.0 > #29 0x000000015f692dc0 in ?? () > #30 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, > mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 > #31 0x00007f85d3f70600 in vm_debug_engine (thread=0x304) > at ../../libguile/vm-engine.c:370 > #32 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341ee0, > argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605 > #33 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341ee0) > at ../../libguile/eval.c:490 > #34 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341ee0, > handler=0x15b341ec0, pre_unwind_handler=0x15b341ea0) at > ../../libguile/throw.c:146 > #35 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, > > ---Type to continue, or q to quit--- > thunk=, handler=, > pre_unwind_handler=) at ../../libguile/throw.c:260 > #36 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, > body=, > body_data=, > handler=handler@entry=0x7f85c95d0f00 > scm_unused_struct*)>, > handler_data=handler_data@entry=0x7f7e60000980, > pre_unwind_handler=pre_unwind_handler@entry=0x7f85c95d0c40 > scm_unused_struct*)>, > pre_unwind_handler_data=0x7f7e60000980) at ../../libguile/throw.c:385 > #37 0x00007f85c95d122a in opencog::SchemeEval::do_eval > (this=0x7f7e60000980, > expr="(observe-mpg \"The countess, with her loving heart, felt that > her children were being ruined, that it was not the count's fault for he > could not help being what he was -- that (though he tried to hide "...) > at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:590 > #38 0x00007f85c95d12aa in opencog::SchemeEval::c_wrap_eval > (p=0x7f7e60000980) > at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:507 > #39 0x00007f85d3eeb47a in c_body (d=0x7f7e85fe2d40) > at ../../libguile/continuations.c:430 > #40 0x00007f85d4262b6f in ?? () > #41 0x00007f85d4262a80 in ?? () > #42 0x00007f85d41e10c0 in jump_table_ () from > /usr/local/lib/libguile-3.0.so.0 > #43 0x000000015f692dc0 in ?? () > #44 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, > mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 > #45 0x00007f85d3f70600 in vm_debug_engine (thread=0x304) > at ../../libguile/vm-engine.c:370 > #46 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341fe0, > argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605 > #47 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341fe0) > at ../../libguile/eval.c:490 > #48 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341fe0, > ---Type to continue, or q to quit--- > handler=0x15b341fc0, pre_unwind_handler=0x15b341fa0) at > ../../libguile/throw.c:146 > #49 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, > > thunk=, handler=, > pre_unwind_handler=) at ../../libguile/throw.c:260 > #50 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, > body=body@entry=0x7f85d3eeb470 , > body_data=body_data@entry=0x7f7e85fe2d40, > handler=handler@entry=0x7f85d3eeb720 , > handler_data=handler_data@entry=0x7f7e85fe2d40, > pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 > , > pre_unwind_handler_data=0xe174a0) at ../../libguile/throw.c:385 > #51 0x00007f85d3eeb9e3 in scm_i_with_continuation_barrier ( > body=body@entry=0x7f85d3eeb470 , > body_data=body_data@entry=0x7f7e85fe2d40, > handler=handler@entry=0x7f85d3eeb720 , > handler_data=handler_data@entry=0x7f7e85fe2d40, > pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 > , > pre_unwind_handler_data=0xe174a0) at ../../libguile/continuations.c:368 > #52 0x00007f85d3eebac5 in scm_c_with_continuation_barrier (func= out>, > data=) at ../../libguile/continuations.c:464 > #53 0x00007f85d3575127 in GC_call_with_gc_active ( > fn=fn@entry=0x7f85d3f6a070 , > client_data=client_data@entry=0x7f7e85fe2e20) at > ../pthread_support.c:1343 > #54 0x00007f85d3f6ac4f in with_guile (base=base@entry=0x7f7e85fe2df0, > data=data@entry=0x7f7e85fe2e20) at ../../libguile/threads.c:683 > #55 0x00007f85d356f132 in GC_call_with_stack_base ( > fn=fn@entry=0x7f85d3f6abb0 , arg=arg@entry=0x7f7e85fe2e20) > at ../misc.c:1941 > #56 0x00007f85d3f6aff8 in scm_i_with_guile (dynamic_state=, > data=0x7f7e60000980, > func=0x7f85c95d1290 ) > at ../../libguile/threads.c:698 > ---Type to continue, or q to quit--- > #57 scm_with_guile ( > func=func@entry=0x7f85c95d1290 > , > data=data@entry=0x7f7e60000980) at ../../libguile/threads.c:704 > #58 0x00007f85c95d126e in opencog::SchemeEval::eval_expr > (this=0x7f7e60000980, > expr=...) at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:479 > #59 0x00007f85bc783439 in opencog::GenericShell::eval_loop > (this=0x7f7ef0001e90) > at /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:588 > #60 0x00007f85c6e5ac80 in ?? () from > /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > #61 0x00007f85d3c916ba in start_thread (arg=0x7f7e85fe3700) at > pthread_create.c:333 > #62 0x00007f85d39c141d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 > (gdb) > > > > -- > cassette tapes - analog TV - film cameras - you > -- cassette tapes - analog TV - film cameras - you