Exactly the same crash, same stack trace (slightly different line numbers), with a fresh pull today: commit 89e28df1c9069dcb65188fe7b3973c333d87d7e2 Author: Andy Wingo Date: Thu Jun 20 14:02:05 2019 +0200 which is the current HEAD on master. FWIW, 60-odd guile threads waiting here: #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f45ff4fbdbd in __GI___pthread_mutex_lock (mutex=mutex@entry =0x16e1f68) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f45ff7e085f in scm_c_weak_set_lookup (set=, raw_hash=raw_hash@entry=551753256168943069, pred=pred@entry=0x7f45ff7d1660 , closure=closure@entry=0x7f3e44ff7ac0, dflt=dflt@entry=0x4) at ../../libguile/weak-set.c:760 #3 0x00007f45ff7d11e9 in lookup_interned_symbol (raw_hash=551753256168943069, name=0x2965ca0) at ../../libguile/symbols.c:112 #4 scm_i_str2symbol (str=0x2965ca0) at ../../libguile/symbols.c:244 The parallelism is low because of this one lock. This appears to be the primary bottleneck for my workload. -- Linas On Sun, Jul 14, 2019 at 5:03 PM Linas Vepstas wrote: > Below was for > guile (GNU Guile) 2.9.2.14-1fb399 > > --linas > > On Sun, Jul 14, 2019 at 4:59 PM Linas Vepstas > wrote: > >> >> So, here's my next installment on using guile-2.9.2. The first >> installment said that I'd piled up CPU-months of guile 2.9.2 experience >> without any crashes. Well, now, a different workload crashes in minutes. >> Below is a highly simplified, edited gdb session -- it crashes because it >> unexpectedly aborts, during an abort(!) because `get_callee_vcode()` >> failed. Harrumpf. >> >> Background: there are 140 threads, half in guile, the other half waiting >> for guile to finish. Yes, that's too many, but anyways ... 70 threads in >> guile and one crashed: >> >> #2 0x00007f85d3f6ecdb in capture_delimited_continuation ( >> current_registers=, dynstack=, >> saved_registers=, saved_mra=, >> saved_fp=, vp=) at >> ../../libguile/vm.c:1327 >> #3 abort_to_prompt (thread=0x15f692dc0, saved_mra=) >> at ../../libguile/vm.c:1454 >> >> Both frames are interesting, because libguile/vm.c:1327 shows >> if (SCM_FRAME_DYNAMIC_LINK (base_fp) != saved_fp) >> abort(); >> hey!? who called this? line 1454 is in the middle of abort_to_prompt () >> Yow! an unexpected abort during an abort... >> >> How did we get here? >> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420, >> subr=, message=message@entry=0x1607c9380, >> args=args@entry=0x15af130e0, data=data@entry=0x15af130f0) >> at ../../libguile/error.c:90 >> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0, >> message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", >> args=0x15af130e0, >> rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62 >> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0) >> at ../../libguile/vm.c:1527 >> >> and libguile/vm.c:1527 tells me that get_callee_vcode () is very unhappy. >> But why? I cannot tell .. after that, things peter out in boring stack >> frames that started with my call scm_c_catch() ... the same seemingly >> harmless call that is pending in 70 other threads. (the same call that has >> survived several CPU month of pounding with a different collection of >> scheme code) >> >> My best guess is that the current workload, by unintentionally launching >> gobs of threads is exposing a race condition that has been hithertho >> hidden. I don't know how to debug any further. I will try a slightly >> newer guile shortly, to see if I get lucky. >> >> -- Linas >> >> p.s. here's the whole stack trace. But really, its boring, except for the >> above highlights. >> >> >> >> (gdb) bt >> #0 0x00007f85d38ef428 in __GI_raise (sig=sig@entry=6) >> at ../sysdeps/unix/sysv/linux/raise.c:54 >> #1 0x00007f85d38f102a in __GI_abort () at abort.c:89 >> #2 0x00007f85d3f6ecdb in capture_delimited_continuation ( >> current_registers=, dynstack=, >> saved_registers=, saved_mra=, >> saved_fp=, vp=) at >> ../../libguile/vm.c:1327 >> #3 abort_to_prompt (thread=0x15f692dc0, saved_mra=) >> at ../../libguile/vm.c:1454 >> #4 0x00007f85ac539041 in ?? () >> #5 0x00007f85ac37f040 in ?? () >> #6 0x00007f85d41e10c0 in jump_table_ () from >> /usr/local/lib/libguile-3.0.so.0 >> #7 0x000000015f692dc0 in ?? () >> #8 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, >> mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 >> #9 0x00007f85d3f70600 in vm_debug_engine (thread=0x7f85ac539000) >> at ../../libguile/vm-engine.c:370 >> #10 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0xe45a20, >> argv=, >> nargs=5) at ../../libguile/vm.c:1605 >> #11 0x00007f85d3eefdcb in scm_apply_0 (proc=0xe45a20, args=0x304) >> at ../../libguile/eval.c:603 >> #12 0x00007f85d3ef0a0d in scm_apply_1 (proc=, >> arg1=arg1@entry=0xdc5420, args=args@entry=0x15af130a0) >> at ../../libguile/eval.c:609 >> #13 0x00007f85d3f6c546 in scm_throw (key=key@entry=0xdc5420, >> args=0x15af130a0) >> at ../../libguile/throw.c:272 >> #14 0x00007f85d3f6caf9 in scm_ithrow (key=key@entry=0xdc5420, >> args=, >> no_return=no_return@entry=1) at ../../libguile/throw.c:619 >> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420, >> subr=, message=message@entry=0x1607c9380, >> args=args@entry=0x15af130e0, data=data@entry=0x15af130f0) >> at ../../libguile/error.c:90 >> ---Type to continue, or q to quit--- >> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0, >> message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", >> args=0x15af130e0, >> rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62 >> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0) >> at ../../libguile/vm.c:1527 >> #18 0x00007f85b4314805 in ?? () >> #19 0x00007f85b428a000 in ?? () >> #20 0x00007f85d41e10c0 in jump_table_ () from >> /usr/local/lib/libguile-3.0.so.0 >> #21 0x000000015f692dc0 in ?? () >> #22 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, >> mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 >> #23 0x00007f85d3f70600 in vm_debug_engine (thread=0x2) >> at ../../libguile/vm-engine.c:370 >> #24 0x00007f85d3f76db2 in scm_call_n (proc=, >> argv=argv@entry=0x7f7e85fe2600, nargs=nargs@entry=3) at >> ../../libguile/vm.c:1605 >> #25 0x00007f85d3eef97f in scm_call_3 (proc=, >> arg1=, >> arg2=, arg3=) at >> ../../libguile/eval.c:510 >> #26 0x00007f85d4262b6f in ?? () >> #27 0x00007f85d4262a80 in ?? () >> #28 0x00007f85d41e10c0 in jump_table_ () from >> /usr/local/lib/libguile-3.0.so.0 >> #29 0x000000015f692dc0 in ?? () >> #30 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, >> mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 >> #31 0x00007f85d3f70600 in vm_debug_engine (thread=0x304) >> at ../../libguile/vm-engine.c:370 >> #32 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341ee0, >> argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605 >> #33 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341ee0) >> at ../../libguile/eval.c:490 >> #34 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341ee0, >> handler=0x15b341ec0, pre_unwind_handler=0x15b341ea0) at >> ../../libguile/throw.c:146 >> #35 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, >> >> ---Type to continue, or q to quit--- >> thunk=, handler=, >> pre_unwind_handler=) at ../../libguile/throw.c:260 >> #36 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, >> body=, >> body_data=, >> handler=handler@entry=0x7f85c95d0f00 >> > scm_unused_struct*)>, >> handler_data=handler_data@entry=0x7f7e60000980, >> pre_unwind_handler=pre_unwind_handler@entry=0x7f85c95d0c40 >> > scm_unused_struct*)>, >> pre_unwind_handler_data=0x7f7e60000980) at ../../libguile/throw.c:385 >> #37 0x00007f85c95d122a in opencog::SchemeEval::do_eval >> (this=0x7f7e60000980, >> expr="(observe-mpg \"The countess, with her loving heart, felt that >> her children were being ruined, that it was not the count's fault for he >> could not help being what he was -- that (though he tried to hide "...) >> at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:590 >> #38 0x00007f85c95d12aa in opencog::SchemeEval::c_wrap_eval >> (p=0x7f7e60000980) >> at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:507 >> #39 0x00007f85d3eeb47a in c_body (d=0x7f7e85fe2d40) >> at ../../libguile/continuations.c:430 >> #40 0x00007f85d4262b6f in ?? () >> #41 0x00007f85d4262a80 in ?? () >> #42 0x00007f85d41e10c0 in jump_table_ () from >> /usr/local/lib/libguile-3.0.so.0 >> #43 0x000000015f692dc0 in ?? () >> #44 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0, >> mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796 >> #45 0x00007f85d3f70600 in vm_debug_engine (thread=0x304) >> at ../../libguile/vm-engine.c:370 >> #46 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341fe0, >> argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605 >> #47 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341fe0) >> at ../../libguile/eval.c:490 >> #48 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341fe0, >> ---Type to continue, or q to quit--- >> handler=0x15b341fc0, pre_unwind_handler=0x15b341fa0) at >> ../../libguile/throw.c:146 >> #49 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, >> >> thunk=, handler=, >> pre_unwind_handler=) at ../../libguile/throw.c:260 >> #50 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, >> body=body@entry=0x7f85d3eeb470 , >> body_data=body_data@entry=0x7f7e85fe2d40, >> handler=handler@entry=0x7f85d3eeb720 , >> handler_data=handler_data@entry=0x7f7e85fe2d40, >> pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 >> , >> pre_unwind_handler_data=0xe174a0) at ../../libguile/throw.c:385 >> #51 0x00007f85d3eeb9e3 in scm_i_with_continuation_barrier ( >> body=body@entry=0x7f85d3eeb470 , >> body_data=body_data@entry=0x7f7e85fe2d40, >> handler=handler@entry=0x7f85d3eeb720 , >> handler_data=handler_data@entry=0x7f7e85fe2d40, >> pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 >> , >> pre_unwind_handler_data=0xe174a0) at >> ../../libguile/continuations.c:368 >> #52 0x00007f85d3eebac5 in scm_c_with_continuation_barrier >> (func=, >> data=) at ../../libguile/continuations.c:464 >> #53 0x00007f85d3575127 in GC_call_with_gc_active ( >> fn=fn@entry=0x7f85d3f6a070 , >> client_data=client_data@entry=0x7f7e85fe2e20) at >> ../pthread_support.c:1343 >> #54 0x00007f85d3f6ac4f in with_guile (base=base@entry=0x7f7e85fe2df0, >> data=data@entry=0x7f7e85fe2e20) at ../../libguile/threads.c:683 >> #55 0x00007f85d356f132 in GC_call_with_stack_base ( >> fn=fn@entry=0x7f85d3f6abb0 , arg=arg@entry >> =0x7f7e85fe2e20) >> at ../misc.c:1941 >> #56 0x00007f85d3f6aff8 in scm_i_with_guile (dynamic_state=> out>, >> data=0x7f7e60000980, >> func=0x7f85c95d1290 ) >> at ../../libguile/threads.c:698 >> ---Type to continue, or q to quit--- >> #57 scm_with_guile ( >> func=func@entry=0x7f85c95d1290 >> , >> data=data@entry=0x7f7e60000980) at ../../libguile/threads.c:704 >> #58 0x00007f85c95d126e in opencog::SchemeEval::eval_expr >> (this=0x7f7e60000980, >> expr=...) at >> /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:479 >> #59 0x00007f85bc783439 in opencog::GenericShell::eval_loop >> (this=0x7f7ef0001e90) >> at >> /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:588 >> #60 0x00007f85c6e5ac80 in ?? () from >> /usr/lib/x86_64-linux-gnu/libstdc++.so.6 >> #61 0x00007f85d3c916ba in start_thread (arg=0x7f7e85fe3700) at >> pthread_create.c:333 >> #62 0x00007f85d39c141d in clone () at >> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 >> (gdb) >> >> >> >> -- >> cassette tapes - analog TV - film cameras - you >> > > > -- > cassette tapes - analog TV - film cameras - you > -- cassette tapes - analog TV - film cameras - you