From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Linas Vepstas <linasvepstas@gmail.com>
Newsgroups: gmane.lisp.guile.devel,gmane.lisp.guile.user
Subject: Re: Now crashing [was Re: guile-2.9.2 and threading
Date: Sun, 14 Jul 2019 22:03:55 -0500
References: <87h892ault.fsf@netris.org>
Reply-To: linasvepstas@gmail.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000071caa6058daf837b"
Cc: Guile User, Guile Development
To: Mark H Weaver
List-Id: "Developers list for Guile, the GNU extensibility library"
Xref: news.gmane.org gmane.lisp.guile.devel:20006 gmane.lisp.guile.user:15631

--00000000000071caa6058daf837b
Content-Type: text/plain; charset="UTF-8"

Exactly the same crash, same stack trace (slightly different line numbers),
with a fresh pull today:

    commit 89e28df1c9069dcb65188fe7b3973c333d87d7e2
    Author: Andy Wingo <wingo@pobox.com>
    Date:   Thu Jun 20 14:02:05 2019 +0200

which is the current HEAD on master.

FWIW, 60-odd guile threads waiting here:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f45ff4fbdbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x16e1f68)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f45ff7e085f in scm_c_weak_set_lookup (set=<optimized out>,
    raw_hash=raw_hash@entry=551753256168943069,
    pred=pred@entry=0x7f45ff7d1660 <string_lookup_predicate_fn>,
    closure=closure@entry=0x7f3e44ff7ac0, dflt=dflt@entry=0x4)
    at ../../libguile/weak-set.c:760
#3  0x00007f45ff7d11e9 in lookup_interned_symbol (raw_hash=551753256168943069,
    name=0x2965ca0) at ../../libguile/symbols.c:112
#4  scm_i_str2symbol (str=0x2965ca0) at ../../libguile/symbols.c:244

The parallelism is low because of this one lock.  This appears to be the
primary bottleneck for my workload.

-- Linas

On Sun, Jul 14, 2019 at 5:03 PM Linas Vepstas <linasvepstas@gmail.com> wrote:

> Below was for
> guile (GNU Guile) 2.9.2.14-1fb399
>
> --linas
>
> On Sun, Jul 14, 2019 at 4:59 PM Linas Vepstas <linasvepstas@gmail.com>
> wrote:
>
>>
>> So, here's my next installment on using guile-2.9.2. The first
>> installment said that I'd piled up CPU-months of guile 2.9.2 experience
>> without any crashes. Well, now, a different workload crashes in minutes.
>> Below is a highly simplified, edited gdb session -- it crashes because it
>> unexpectedly aborts, during an abort(!) because `get_callee_vcode()`
>> failed. Harrumpf.
>>
>> Background: there are 140 threads, half in guile, the other half waiting
>> for guile to finish. Yes, that's too many, but anyways ... 70 threads in
>> guile and one crashed:
>>
>> #2  0x00007f85d3f6ecdb in capture_delimited_continuation (
>>     current_registers=<optimized out>, dynstack=<optimized out>,
>>     saved_registers=<optimized out>, saved_mra=<optimized out>,
>>     saved_fp=<optimized out>, vp=<optimized out>) at ../../libguile/vm.c:1327
>> #3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
>>     at ../../libguile/vm.c:1454
>>
>> Both frames are interesting, because libguile/vm.c:1327 shows
>>   if (SCM_FRAME_DYNAMIC_LINK (base_fp) != saved_fp)
>>     abort();
>> hey!? who called this? line 1454 is in the middle of abort_to_prompt ()
>> Yow! an unexpected abort during an abort...
>>
>> How did we get here?
>> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
>>     subr=<optimized out>, message=message@entry=0x1607c9380,
>>     args=args@entry=0x15af130e0, data=data@entry=0x15af130f0)
>>     at ../../libguile/error.c:90
>> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
>>     message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", args=0x15af130e0,
>>     rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
>> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
>>     at ../../libguile/vm.c:1527
>>
>> and libguile/vm.c:1527 tells me that get_callee_vcode () is very unhappy.
>> But why? I cannot tell .. after that, things peter out in boring stack
>> frames that started with my call to scm_c_catch() ... the same seemingly
>> harmless call that is pending in 70 other threads. (the same call that has
>> survived several CPU-months of pounding with a different collection of
>> scheme code)
>>
>> My best guess is that the current workload, by unintentionally launching
>> gobs of threads, is exposing a race condition that has hitherto been
>> hidden.
>> I don't know how to debug any further. I will try a slightly
>> newer guile shortly, to see if I get lucky.
>>
>> -- Linas
>>
>> p.s. here's the whole stack trace. But really, it's boring, except for the
>> above highlights.
>>
>>
>>
>> (gdb) bt
>> #0  0x00007f85d38ef428 in __GI_raise (sig=sig@entry=6)
>>     at ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007f85d38f102a in __GI_abort () at abort.c:89
>> #2  0x00007f85d3f6ecdb in capture_delimited_continuation (
>>     current_registers=<optimized out>, dynstack=<optimized out>,
>>     saved_registers=<optimized out>, saved_mra=<optimized out>,
>>     saved_fp=<optimized out>, vp=<optimized out>) at ../../libguile/vm.c:1327
>> #3  abort_to_prompt (thread=0x15f692dc0, saved_mra=<optimized out>)
>>     at ../../libguile/vm.c:1454
>> #4  0x00007f85ac539041 in ?? ()
>> #5  0x00007f85ac37f040 in ?? ()
>> #6  0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
>> #7  0x000000015f692dc0 in ?? ()
>> #8  0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #9  0x00007f85d3f70600 in vm_debug_engine (thread=0x7f85ac539000)
>>     at ../../libguile/vm-engine.c:370
>> #10 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0xe45a20, argv=<optimized out>,
>>     nargs=5) at ../../libguile/vm.c:1605
>> #11 0x00007f85d3eefdcb in scm_apply_0 (proc=0xe45a20, args=0x304)
>>     at ../../libguile/eval.c:603
>> #12 0x00007f85d3ef0a0d in scm_apply_1 (proc=<optimized out>,
>>     arg1=arg1@entry=0xdc5420, args=args@entry=0x15af130a0)
>>     at ../../libguile/eval.c:609
>> #13 0x00007f85d3f6c546 in scm_throw (key=key@entry=0xdc5420, args=0x15af130a0)
>>     at ../../libguile/throw.c:272
>> #14 0x00007f85d3f6caf9 in scm_ithrow (key=key@entry=0xdc5420, args=<optimized out>,
>>     no_return=no_return@entry=1) at ../../libguile/throw.c:619
>> #15 0x00007f85d3eedeb5 in scm_error_scm (key=key@entry=0xdc5420,
>>     subr=<optimized out>, message=message@entry=0x1607c9380,
>>     args=args@entry=0x15af130e0, data=data@entry=0x15af130f0)
>>     at ../../libguile/error.c:90
>> #16 0x00007f85d3eedf4f in scm_error (key=0xdc5420, subr=subr@entry=0x0,
>>     message=message@entry=0x7f85d3fa228c "Wrong type to apply: ~S", args=0x15af130e0,
>>     rest=rest@entry=0x15af130f0) at ../../libguile/error.c:62
>> #17 0x00007f85d3f6f913 in get_callee_vcode (thread=0x15f692dc0)
>>     at ../../libguile/vm.c:1527
>> #18 0x00007f85b4314805 in ?? ()
>> #19 0x00007f85b428a000 in ?? ()
>> #20 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
>> #21 0x000000015f692dc0 in ?? ()
>> #22 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #23 0x00007f85d3f70600 in vm_debug_engine (thread=0x2)
>>     at ../../libguile/vm-engine.c:370
>> #24 0x00007f85d3f76db2 in scm_call_n (proc=<optimized out>,
>>     argv=argv@entry=0x7f7e85fe2600, nargs=nargs@entry=3) at ../../libguile/vm.c:1605
>> #25 0x00007f85d3eef97f in scm_call_3 (proc=<optimized out>, arg1=<optimized out>,
>>     arg2=<optimized out>, arg3=<optimized out>) at ../../libguile/eval.c:510
>> #26 0x00007f85d4262b6f in ?? ()
>> #27 0x00007f85d4262a80 in ?? ()
>> #28 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
>> #29 0x000000015f692dc0 in ?? ()
>> #30 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #31 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
>>     at ../../libguile/vm-engine.c:370
>> #32 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341ee0,
>>     argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
>> #33 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341ee0)
>>     at ../../libguile/eval.c:490
>> #34 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341ee0,
>>     handler=0x15b341ec0, pre_unwind_handler=0x15b341ea0) at ../../libguile/throw.c:146
>> #35 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404,
>>     thunk=<optimized out>, handler=<optimized out>,
>>     pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
>> #36 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404, body=<optimized out>,
>>     body_data=<optimized out>,
>>     handler=handler@entry=0x7f85c95d0f00 <opencog::SchemeEval::catch_handler_wrapper(void*, scm_unused_struct*, scm_unused_struct*)>,
>>     handler_data=handler_data@entry=0x7f7e60000980,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85c95d0c40 <opencog::SchemeEval::preunwind_handler_wrapper(void*, scm_unused_struct*, scm_unused_struct*)>,
>>     pre_unwind_handler_data=0x7f7e60000980) at ../../libguile/throw.c:385
>> #37 0x00007f85c95d122a in opencog::SchemeEval::do_eval (this=0x7f7e60000980,
>>     expr="(observe-mpg \"The countess, with her loving heart, felt that her children were being ruined, that it was not the count's fault for he could not help being what he was -- that (though he tried to hide "...)
>>     at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:590
>> #38 0x00007f85c95d12aa in opencog::SchemeEval::c_wrap_eval (p=0x7f7e60000980)
>>     at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:507
>> #39 0x00007f85d3eeb47a in c_body (d=0x7f7e85fe2d40)
>>     at ../../libguile/continuations.c:430
>> #40 0x00007f85d4262b6f in ?? ()
>> #41 0x00007f85d4262a80 in ?? ()
>> #42 0x00007f85d41e10c0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
>> #43 0x000000015f692dc0 in ?? ()
>> #44 0x00007f85d3f19581 in scm_jit_enter_mcode (thread=0x15f692dc0,
>>     mcode=0x15f692dc0 "\200\205\267_\001") at ../../libguile/jit.c:4796
>> #45 0x00007f85d3f70600 in vm_debug_engine (thread=0x304)
>>     at ../../libguile/vm-engine.c:370
>> #46 0x00007f85d3f76db2 in scm_call_n (proc=proc@entry=0x15b341fe0,
>>     argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
>> #47 0x00007f85d3eef8d9 in scm_call_0 (proc=proc@entry=0x15b341fe0)
>>     at ../../libguile/eval.c:490
>> #48 0x00007f85d3f6c1aa in catch (tag=tag@entry=0x404, thunk=0x15b341fe0,
>>     handler=0x15b341fc0, pre_unwind_handler=0x15b341fa0) at ../../libguile/throw.c:146
>> #49 0x00007f85d3f6c505 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404,
>>     thunk=<optimized out>, handler=<optimized out>,
>>     pre_unwind_handler=<optimized out>) at ../../libguile/throw.c:260
>> #50 0x00007f85d3f6c6bf in scm_c_catch (tag=tag@entry=0x404,
>>     body=body@entry=0x7f85d3eeb470 <c_body>,
>>     body_data=body_data@entry=0x7f7e85fe2d40,
>>     handler=handler@entry=0x7f85d3eeb720 <c_handler>,
>>     handler_data=handler_data@entry=0x7f7e85fe2d40,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 <pre_unwind_handler>,
>>     pre_unwind_handler_data=0xe174a0) at ../../libguile/throw.c:385
>> #51 0x00007f85d3eeb9e3 in scm_i_with_continuation_barrier (
>>     body=body@entry=0x7f85d3eeb470 <c_body>,
>>     body_data=body_data@entry=0x7f7e85fe2d40,
>>     handler=handler@entry=0x7f85d3eeb720 <c_handler>,
>>     handler_data=handler_data@entry=0x7f7e85fe2d40,
>>     pre_unwind_handler=pre_unwind_handler@entry=0x7f85d3eeb580 <pre_unwind_handler>,
>>     pre_unwind_handler_data=0xe174a0) at ../../libguile/continuations.c:368
>> #52 0x00007f85d3eebac5 in scm_c_with_continuation_barrier (func=<optimized out>,
>>     data=<optimized out>) at ../../libguile/continuations.c:464
>> #53 0x00007f85d3575127 in GC_call_with_gc_active (
>>     fn=fn@entry=0x7f85d3f6a070 <with_guile_trampoline>,
>>     client_data=client_data@entry=0x7f7e85fe2e20) at
>>     ../pthread_support.c:1343
>> #54 0x00007f85d3f6ac4f in with_guile (base=base@entry=0x7f7e85fe2df0,
>>     data=data@entry=0x7f7e85fe2e20) at ../../libguile/threads.c:683
>> #55 0x00007f85d356f132 in GC_call_with_stack_base (
>>     fn=fn@entry=0x7f85d3f6abb0 <with_guile>, arg=arg@entry=0x7f7e85fe2e20)
>>     at ../misc.c:1941
>> #56 0x00007f85d3f6aff8 in scm_i_with_guile (dynamic_state=<optimized out>,
>>     data=0x7f7e60000980,
>>     func=0x7f85c95d1290 <opencog::SchemeEval::c_wrap_eval(void*)>)
>>     at ../../libguile/threads.c:698
>> #57 scm_with_guile (
>>     func=func@entry=0x7f85c95d1290 <opencog::SchemeEval::c_wrap_eval(void*)>,
>>     data=data@entry=0x7f7e60000980) at ../../libguile/threads.c:704
>> #58 0x00007f85c95d126e in opencog::SchemeEval::eval_expr (this=0x7f7e60000980,
>>     expr=...) at /home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:479
>> #59 0x00007f85bc783439 in opencog::GenericShell::eval_loop (this=0x7f7ef0001e90)
>>     at /home/ubuntu/src/opencog/opencog/cogserver/shell/GenericShell.cc:588
>> #60 0x00007f85c6e5ac80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #61 0x00007f85d3c916ba in start_thread (arg=0x7f7e85fe3700) at pthread_create.c:333
>> #62 0x00007f85d39c141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>> (gdb)
>>

--
cassette tapes - analog TV - film cameras - you
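
An illustrative aside on the lock in the first backtrace (scm_i_str2symbol -> lookup_interned_symbol -> scm_c_weak_set_lookup -> pthread_mutex_lock). The sketch below is not Guile's weak-set code. It is a minimal, hypothetical interned-symbol table in C, assuming a single global mutex guards every lookup, to show why many threads converting strings to symbols could end up queued on one lock. All names in it (`intern`, `table_lock`, `all_threads_agree`, the table size) are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024   /* hypothetical bucket count */
#define MAX_THREADS 64    /* hypothetical cap for the demo helper */

struct symbol {
    char *name;
    struct symbol *next;  /* chain of symbols sharing a bucket */
};

static struct symbol *table[TABLE_SIZE];

/* One lock for the whole table: every lookup, hit or miss, takes it. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the unique symbol for `name`, creating it on first use. */
const struct symbol *intern(const char *name)
{
    unsigned long slot = hash_string(name) % TABLE_SIZE;
    struct symbol *sym;

    pthread_mutex_lock(&table_lock);   /* all callers serialize here */
    for (sym = table[slot]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            break;
    if (sym == NULL) {
        sym = malloc(sizeof *sym);
        assert(sym != NULL);
        sym->name = malloc(strlen(name) + 1);
        assert(sym->name != NULL);
        strcpy(sym->name, name);
        sym->next = table[slot];
        table[slot] = sym;
    }
    pthread_mutex_unlock(&table_lock);
    return sym;
}

static void *worker(void *arg)
{
    const struct symbol **result = arg;
    *result = intern("observe-mpg");
    return NULL;
}

/* Intern one name from `nthreads` threads; return 1 if every thread
   received the same symbol pointer, 0 otherwise. */
int all_threads_agree(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    const struct symbol *results[MAX_THREADS];
    int i, created = 0, ok = 1;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &results[i]) != 0) {
            ok = 0;
            break;
        }
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    for (i = 1; ok && i < created; i++)
        if (results[i] != results[0])
            ok = 0;
    return ok;
}
```

With a design like this, adding threads adds waiters rather than throughput on the lookup path, which would match 60-odd threads parked in __lll_lock_wait. General mitigations for this pattern include splitting the table across several locks or caching symbols per thread; whether either applies to Guile's symbol table is not something this thread establishes.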
--00000000000071caa6058daf837b--