From: Linas Vepstas
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Now crashing [was Re: guile-2.9.2 and threading
Date: Wed, 17 Jul 2019 16:44:52 -0500
References: <87h892ault.fsf@netris.org> <87k1cgwo20.fsf@netris.org>
Reply-To: linasvepstas@gmail.com
To: Mark H Weaver
Cc: Guile Development
In-Reply-To: <87k1cgwo20.fsf@netris.org>

On Wed, Jul 17, 2019 at 12:49 PM Mark H Weaver wrote:

> Hi Linas,
>
> > Investigating the crash with good-old printf's in libguile/vm.c produces
> > a vast ocean of prints ... that should have not been printed, and/or should
> > have been actual errors, but somehow were not handled by scm_error.
> > Using today's git pull of master, here's the diff containing a printf:
> >
> > --- a/libguile/vm.c
> > +++ b/libguile/vm.c
> > @@ -1514,12 +1514,23 @@ thread->guard); fflush(stdout); assert (0); }
> >
> >        proc = SCM_SMOB_DESCRIPTOR (proc).apply_trampoline;
> >        SCM_FRAME_LOCAL (vp->fp, 0) = proc;
> >        return SCM_PROGRAM_CODE (proc);
> >      }
> >
> > +printf("duuude wrong type to apply!\n"
> > +"proc=%lx\n"
> > +"ip=%p\n"
> > +"sp=%p\n"
> > +"fp=%p\n"
> > +"sp_min=%p\n"
> > +"stack_lim=%p\n",
> > +SCM_FRAME_SLOT(vp->fp, 0)->as_u64,
> > +vp->ip, vp->sp, vp->fp, vp->sp_min_since_gc, vp->stack_limit);
> > +fflush(stdout);
> > +
> >    vp->ip = SCM_FRAME_VIRTUAL_RETURN_ADDRESS (vp->fp);
> >
> >    scm_error (scm_arg_type_key, NULL, "Wrong type to apply: ~S",
> >               scm_list_1 (proc), scm_list_1 (proc));
> >  }
> >
> > As you can see, shortly after my printf, there should have been an
> > error report.
>
> Not necessarily.  Note that what 'scm_error' actually does is to raise
> an exception.  What happens next depends on what exception handlers are
> installed at the time of the error.

OK, but...
when I look at what get_callee_vcode() actually does, it seems
to be earnestly trying to fish out the location of a callable function from the
frame pointer, and it does so in three plausible ways. If those three don't work
out, then it sets the instruction pointer (to the garbage value) and calls
scm_error ("Wrong type to apply"). This also looks like an earnest, honest
attempt to report a real error.  But let's double-check.

So who calls get_callee_vcode(), and why, and what did they expect to happen?
It is called in three places: one in scm_call_n, which is a plausible place where
one might expect the instruction pointer to be set to a valid value. Then there are
two places in vm-engine.c, "tail-call" and "call", both of which one might plausibly
expect to have a valid instruction pointer.  I can't imagine any valid scenario
where anyone was expecting get_callee_vcode() to actually fail in the normal
course of operations.

That is, I can't think of any valid reason why anyone would want to suppress
the scm_error().  And even if I could -- calling scm_error() hundreds of times
per second, as fast as possible, does not seem like an efficient way of dealing
with a call to an invalid address.

Anyway, I'm trying to track down where the invalid value gets set. No luck so far.
There are 6 or 8 places in vm-engine.c where the frame pointer is set to something
that isn't a pointer (which seems like cheating to me: passing non-pointer values
in something called "pointer" is ... well, my knee-jerk reaction is that it's not
wise, but there may be a deeper reason.)

> > There is no error report... until 5-10 minutes later, when the error
> > report itself causes a crash.  Before then, I get an endless
> > high-speed spew of prints:
>
> It looks like another error is happening within the exception handler.

Well, yes, that also. But given that the instruction pointer contains garbage,
it's perhaps not entirely surprising... at best, the question is, why didn't it
fail sooner?
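
For illustration only, here is a toy model of the behaviour discussed above:
an error that is raised many times but never reported, because a handler
catches it and retries. This is plain C using setjmp/longjmp, not libguile
code, and every name in it (toy_error, toy_handler, toy_call_bad_proc,
toy_retry_loop) is made up for this sketch. It assumes nothing about how
Guile's handlers are actually installed in the crashing program.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost installed handler; NULL means "no handler installed".
   Hypothetical stand-in for a dynamic handler stack. */
static jmp_buf *toy_handler = NULL;

/* Number of errors raised so far, whether or not anyone reported them. */
static int toy_errors_raised = 0;

/* Toy stand-in for an error primitive: it prints nothing itself, it only
   transfers control to whatever handler is currently installed. */
static void toy_error(void)
{
    toy_errors_raised++;
    assert(toy_handler != NULL);  /* this toy has no top-level reporter */
    longjmp(*toy_handler, 1);
}

/* Toy stand-in for applying something that is not a procedure. */
static void toy_call_bad_proc(void)
{
    toy_error();
}

/* Installs a handler that swallows the error and simply retries the same
   bad call, up to max_attempts times. Returns the number of attempts. */
static int toy_retry_loop(int max_attempts)
{
    jmp_buf env;
    jmp_buf *saved = toy_handler;
    volatile int attempts = 0;  /* volatile: changed between setjmp and longjmp */

    toy_handler = &env;
    if (setjmp(env) != 0) {
        /* Reached via longjmp: this is the handler body. It reports nothing. */
    }
    if (attempts < max_attempts) {
        attempts++;
        toy_call_bad_proc();  /* never returns normally */
    }
    toy_handler = saved;  /* uninstall the handler before returning */
    return attempts;
}
```

Calling toy_retry_loop(5) raises five errors and reports none of them. A
debugging printf placed just before toy_error() would fire on every attempt,
which is at least consistent with a spew of prints and no error report.
Whether something like this is really what happens in the crashing program
is exactly the open question.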

-- Linas

>       Mark
>
> PS: It would be good to pick either 'guile-devel' or 'guile-user' for
>     continuation of this thread.  I don't see a reason why it should be
>     sent to both lists.

--
cassette tapes - analog TV - film cameras - you

