From mboxrd@z Thu Jan 1 00:00:00 1970
From: Linas Vepstas
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Now crashing [was Re: guile-2.9.2 and threading
Date: Sun, 21 Jul 2019 16:10:48 -0500
References: <87h892ault.fsf@netris.org> <87k1cgwo20.fsf@netris.org>
Reply-To: linasvepstas@gmail.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000079ed09058e3765ae"
Cc: Guile Development
To: Mark H Weaver
List-Id: "Developers list for Guile, the GNU extensibility library"
Xref: news.gmane.org gmane.lisp.guile.devel:20018

--00000000000079ed09058e3765ae
Content-Type: text/plain; charset="UTF-8"

How utterly embarrassing. Please ignore most of this verbose and
difficult email chain. Yes, guile-2.9.2 is still crashing, but almost
all of my analysis was wrong. It turns out that my Scheme code was
evaluating `(10)`, i.e. taking an integer, treating it as a function,
and attempting to call it. So the call to `scm_error` was exactly
right. It was invisible to me because ... my code was ignoring the
error.

However, if one does call `scm_error` fairly rapidly from multiple
threads, one will eventually hit a race condition and get a crash.
I'm not sure how to create a mini test case for this within Guile: my
code creates threads outside of Guile and launches `scm_eval` in each
(ignoring the resulting error). This was leading to a crash after
5-10 minutes.

-- Linas

On Wed, Jul 17, 2019 at 10:52 PM Linas Vepstas wrote:

> Oh, I get it. I think the bug is this: VM_DEFINE_OP (7, return_values,...)
> finds some mcode, and calls it. What it found was the
> emit_get_callee_vcode, but it is totally pointless to call this mcode,
> since we're returning, and not calling. So it's just not useful.
>
> Worse, it gets called with garbage values, which are then silenced by
> ignoring the resulting scm_error, and everything appears to run
> smoothly ... for a while. Until some later time (millions of calls
> later), when there is a completely unrelated race condition that
> causes the scm_error to get tangled and die.
> The ideal solution would be simply to not call the mcode for
> get_callee; that would save time and trouble.
>
> That's my hypothesis. I tried to test a mock-up of this solution with
> the patch below, but it is too simplistic to actually work (null
> pointer deref). I can't find a better solution.
>
> If you've got a better idea, let me know...
>
> -- Linas
>
> --- a/libguile/vm-engine.c
> +++ b/libguile/vm-engine.c
> @@ -553,6 +553,7 @@ VM_NAME (scm_thread *thread)
>            mcode = SCM_FRAME_MACHINE_RETURN_ADDRESS (old_fp);
>            if (mcode && mcode != scm_jit_return_to_interpreter_trampoline)
>              {
> +              VP->unused = 1;
>                scm_jit_enter_mcode (thread, mcode);
>                CACHE_REGISTER ();
>                NEXT (0);
> diff --git a/libguile/vm.c b/libguile/vm.c
> index d7b1788..8e178c7 100644
> --- a/libguile/vm.c
> +++ b/libguile/vm.c
> @@ -620,6 +620,7 @@ scm_i_vm_prepare_stack (struct scm_vm *vp)
>    vp->compare_result = SCM_F_COMPARE_NONE;
>    vp->engine = vm_default_engine;
>    vp->trace_level = 0;
> +  vp->unused = 0;
>  #define INIT_HOOK(h) vp->h##_hook = SCM_BOOL_F;
>    FOR_EACH_HOOK (INIT_HOOK)
>  #undef INIT_HOOK
> @@ -1515,6 +1516,7 @@ get_callee_vcode (scm_thread *thread)
>  
>    vp->ip = SCM_FRAME_VIRTUAL_RETURN_ADDRESS (vp->fp);
>  
> +  if (vp->unused) { vp->unused = 0; return 0; }
>    scm_error (scm_arg_type_key, NULL, "Wrong type to apply: ~S",
>               scm_list_1 (proc), scm_list_1 (proc));
>  }
>
> On Wed, Jul 17, 2019 at 8:42 PM Linas Vepstas wrote:
>
>> Seem to be narrowing it down ... or at least, I have more details ...
>>
>> On Wed, Jul 17, 2019 at 4:44 PM Linas Vepstas wrote:
>>
>>> On Wed, Jul 17, 2019 at 12:49 PM Mark H Weaver wrote:
>>>
>>>> Hi Linas,
>>>>
>>>> > Investigating the crash with good-old printf's in libguile/vm.c
>>>> > produces a vast ocean of prints ... that should have not been
>>>> > printed, and/or should have been actual errors, but somehow were
>>>> > not handled by scm_error.
>>>> > Using today's git pull of master, here's the diff containing a printf:
>>>> >
>>>> > --- a/libguile/vm.c
>>>> > +++ b/libguile/vm.c
>>>> > @@ -1514,12 +1514,23 @@ thread->guard); fflush(stdout); assert (0); }
>>>> >
>>>> >        proc = SCM_SMOB_DESCRIPTOR (proc).apply_trampoline;
>>>> >        SCM_FRAME_LOCAL (vp->fp, 0) = proc;
>>>> >        return SCM_PROGRAM_CODE (proc);
>>>> >      }
>>>> >
>>>> > +printf("duuude wrong type to apply!\n"
>>>> > +"proc=%lx\n"
>>>> > +"ip=%p\n"
>>>> > +"sp=%p\n"
>>>> > +"fp=%p\n"
>>>> > +"sp_min=%p\n"
>>>> > +"stack_lim=%p\n",
>>>> > +SCM_FRAME_SLOT(vp->fp, 0)->as_u64,
>>>> > +vp->ip, vp->sp, vp->fp, vp->sp_min_since_gc, vp->stack_limit);
>>>> > +fflush(stdout);
>>>> > +
>>>> >    vp->ip = SCM_FRAME_VIRTUAL_RETURN_ADDRESS (vp->fp);
>>>> >
>>>> >    scm_error (scm_arg_type_key, NULL, "Wrong type to apply: ~S",
>>>> >               scm_list_1 (proc), scm_list_1 (proc));
>>>> >  }
>>>> >
>>>> > As you can see, shortly after my printf, there should have been an
>>>> > error report.
>>>>
>>>> Not necessarily. Note that what 'scm_error' actually does is to raise
>>>> an exception. What happens next depends on what exception handlers are
>>>> installed at the time of the error.
>>>>
>>>
>>> OK, but... when I look at what get_callee_vcode() actually does, it
>>> seems to be earnestly trying to fish out the location of a callable
>>> function from the frame pointer, and it does so three plausible ways.
>>> If those three don't work out, then it sets the instruction pointer
>>> (to the garbage value), followed by scm_error(Wrong type to apply).
>>> This also looks like an earnest, honest attempt to report a real
>>> error. But let's double-check.
>>>
>>> So who calls get_callee_vcode(), and why, and what did they expect to
>>> happen? Well, that's in three places: one in scm_call_n, which is a
>>> plausible place where one might expect the instruction pointer to be
>>> set to a valid value.
>>> Then there are two places in vm-engine.c -- "tail-call" and "call" --
>>> both of which one might plausibly expect to have a valid instruction
>>> pointer. I can't imagine any valid scenario where anyone was
>>> expecting get_callee_vcode() to actually fail in the normal course
>>> of operations.
>>>
>>
>> There is one more place where get_callee_vcode() can get called -- via
>> the jump_table, via a call to scm_jit_enter_mcode(), which issues the
>> code emitted by emit_get_callee_vcode.
>>
>> There are four calls to scm_jit_enter_mcode(). The one that immediately
>> precedes the bug is always the one made here, in vm-engine.c:
>> VM_DEFINE_OP (7, return_values, "return-values", OP1 (X32))
>>
>> Right before the call to scm_jit_enter_mcode(), I can printf VP->fp and
>> SCM_FRAME_LOCAL(VP->fp, 0), and they are...
>> fp=0x7fffe000caf8 fpslot=d33b00 (typical)
>>
>> The mcode is of course some bytecode that bounces through lightning,
>> and a few insns later, it arrives at get_callee_vcode(), but now the fp
>> is different (it changes by 0x20, always) and the slot is different:
>> fp=0x7fffe000cad8 and SCM_FRAME_LOCAL(fp,0) is 0x32, and the 0x32
>> triggers the scm_error() (because 0x32 is not any of SCM_PROGRAM_P or
>> SCM_STRUCTP or a smob).
>>
>> (But also, the fpslot=d33b00 is never a SCM_PROGRAM_P or SCM_STRUCTP or
>> a smob, either... so something got computed along the way ... )
>>
>> That's what I've got so far. It's highly reproducible. Quick to happen.
>> I'm not sure what to do next. I guess I need to examine
>> emit_get_callee_vcode() and see what it does, and why. Any comments
>> or suggestions would be useful.
>>
>> -- Linas
>>
>>
>>> That is, I can't think of any valid reason why anyone would want to
>>> suppress the scm_error(). And even if I could -- calling scm_error()
>>> hundreds of times per second, as fast as possible, does not seem like
>>> efficient coding for dealing with a call to an invalid address.
>>>
>>> Anyway, I'm trying to track down where the invalid value gets set. No
>>> luck so far. There are 6 or 8 places in vm-engine.c where the frame
>>> pointer is set to something that isn't a pointer (which seems like
>>> cheating to me: passing non-pointer values in something called
>>> "pointer" is ... well, my knee-jerk reaction is that it's not wise,
>>> but there may be a deeper reason).
>>>
>>>
>>>>
>>>> > There is no error report... until 5-10 minutes later, when the error
>>>> > report itself causes a crash. Before then, I get an endless
>>>> > high-speed spew of prints:
>>>>
>>>> It looks like another error is happening within the exception handler.
>>>>
>>>
>>> Well, yes, that also. But given that the instruction pointer contains
>>> garbage, it's perhaps not entirely surprising... at best, the question
>>> is, why didn't it fail sooner?
>>>
>>> -- Linas
>>>
>>>>
>>>>       Mark
>>>>
>>>> PS: It would be good to pick either 'guile-devel' or 'guile-user' for
>>>>     continuation of this thread. I don't see a reason why it should be
>>>>     sent to both lists.
>>>>
>>>
>>
>

-- 
cassette tapes - analog TV - film cameras - you

--00000000000079ed09058e3765ae--