From: Linas Vepstas <linasvepstas@gmail.com>
To: Mark H Weaver <mhw@netris.org>
Cc: Guile Development <guile-devel@gnu.org>
Subject: Re: Now crashing [was Re: guile-2.9.2 and threading]
Date: Wed, 17 Jul 2019 20:42:13 -0500
Message-ID: <CAHrUA34ECsTRNFecrmB+yDByZ5xMeq8=mwDjKpeO+Lu6vE8ZfQ@mail.gmail.com>
In-Reply-To: <CAHrUA36vxr+Qnn5uyitmmUOUJ=0LYjeeXjrLZnFQdRiPi6ZHXQ@mail.gmail.com>
Seem to be narrowing it down ... or at least, I have more details ...
On Wed, Jul 17, 2019 at 4:44 PM Linas Vepstas <linasvepstas@gmail.com> wrote:
>
>
> On Wed, Jul 17, 2019 at 12:49 PM Mark H Weaver <mhw@netris.org> wrote:
>
>> Hi Linas,
>>
>> > Investigating the crash with good-old printf's in libguile/vm.c produces
>> > a vast ocean of prints ... that should have not been printed, and/or
>> > should have been actual errors, but somehow were not handled by scm_error.
>> > Using today's git pull of master, here's the diff containing a printf:
>> >
>> > --- a/libguile/vm.c
>> > +++ b/libguile/vm.c
>> > @@ -1514,12 +1514,23 @@ thread->guard); fflush(stdout); assert (0); }
>> >
>> > proc = SCM_SMOB_DESCRIPTOR (proc).apply_trampoline;
>> > SCM_FRAME_LOCAL (vp->fp, 0) = proc;
>> > return SCM_PROGRAM_CODE (proc);
>> > }
>> >
>> > +printf("duuude wrong type to apply!\n"
>> > +"proc=%lx\n"
>> > +"ip=%p\n"
>> > +"sp=%p\n"
>> > +"fp=%p\n"
>> > +"sp_min=%p\n"
>> > +"stack_lim=%p\n",
>> > +SCM_FRAME_SLOT(vp->fp, 0)->as_u64,
>> > +vp->ip, vp->sp, vp->fp, vp->sp_min_since_gc, vp->stack_limit);
>> > +fflush(stdout);
>> > +
>> > vp->ip = SCM_FRAME_VIRTUAL_RETURN_ADDRESS (vp->fp);
>> >
>> > scm_error (scm_arg_type_key, NULL, "Wrong type to apply: ~S",
>> > scm_list_1 (proc), scm_list_1 (proc));
>> > }
>> >
>> > As you can see, shortly after my printf, there should have been an
>> > error report.
>>
>> Not necessarily. Note that what 'scm_error' actually does is to raise
>> an exception. What happens next depends on what exception handlers are
>> installed at the time of the error.
>>
>
> OK, but... when I look at what get_callee_vcode() actually does, it seems
> to be earnestly trying to fish out the location of a callable function from
> the frame pointer, and it does so three plausible ways. If those three don't
> work out, then it sets the instruction pointer (to the garbage value),
> followed by scm_error(Wrong type to apply). This also looks like an earnest,
> honest attempt to report a real error. But let's double-check.
>
> So who calls get_callee_vcode(), and why, and what did they expect to happen?
> Well, that's in three places: one in scm_call_n, which is a plausible place
> where one might expect the instruction pointer to be set to a valid value.
> Then there are two places in vm-engine.c, "tail-call" and "call", both of
> which one might plausibly expect to have a valid instruction pointer. I can't
> imagine any valid scenario where anyone was expecting get_callee_vcode() to
> actually fail in the normal course of operations.
>
There is one more place where get_callee_vcode() can get called: through the
jump_table, via a call to scm_jit_enter_mcode(), which runs the code emitted by
emit_get_callee_vcode(). There are four calls to scm_jit_enter_mcode(). The one
that immediately precedes the bug is always the one made here, in vm-engine.c:

  VM_DEFINE_OP (7, return_values, "return-values", OP1 (X32))

Right before the call to scm_jit_enter_mcode(), I can printf VP->fp and
SCM_FRAME_LOCAL(VP->fp, 0), and they are typically:

  fp=0x7fffe000caf8 fpslot=d33b00

The mcode is of course some bytecode that bounces through lightning, and a few
insns later it arrives at get_callee_vcode(). By then the fp is different (it
always changes by 0x20) and the slot is different: fp=0x7fffe000cad8 and
SCM_FRAME_LOCAL(fp, 0) is 0x32. The 0x32 triggers the scm_error(), because 0x32
is not any of SCM_PROGRAM_P or SCM_STRUCTP or a smob. But the earlier
fpslot=d33b00 is never a SCM_PROGRAM_P or SCM_STRUCTP or a smob either, so
something got computed along the way ...
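
For reference, the two slot values and the frame-pointer shift can be decoded
by hand. The following is a minimal sketch in plain C, not libguile code. It
assumes 64-bit words, 8-byte stack slots, that fixnums carry the tag 2 in
their low two bits (scm_tc2_int in libguile), and that heap objects are 8-byte
aligned. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (see the note above): not taken from libguile headers. */
enum {
  FIXNUM_TAG      = 2,  /* assumed value of scm_tc2_int */
  FIXNUM_TAG_MASK = 3,  /* low two bits hold the fixnum tag */
  HEAP_ALIGN_MASK = 7,  /* heap objects assumed 8-byte aligned */
  SLOT_BYTES      = 8   /* one stack slot on a 64-bit build */
};

/* True if the word carries the fixnum tag in its low two bits. */
static bool looks_like_fixnum (uint64_t word)
{
  return (word & FIXNUM_TAG_MASK) == FIXNUM_TAG;
}

/* Integer value stored in a fixnum-tagged word. */
static int64_t fixnum_value (uint64_t word)
{
  assert (looks_like_fixnum (word));
  return ((int64_t) word - FIXNUM_TAG) / 4;
}

/* True if the word is non-zero and aligned like a heap object address.
   This says nothing about what kind of object it points to. */
static bool looks_like_heap_pointer (uint64_t word)
{
  return word != 0 && (word & HEAP_ALIGN_MASK) == 0;
}

/* Number of stack slots between two frame pointer values. */
static uint64_t slots_between (uint64_t higher_fp, uint64_t lower_fp)
{
  return (higher_fp - lower_fp) / SLOT_BYTES;
}
```

Under those assumptions, 0x32 decodes as the fixnum 12, d33b00 has the
alignment of a heap object address, and the 0x20 change in fp corresponds to
four slots. None of this identifies what the object at d33b00 actually is.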

That's what I've got so far. It's highly reproducible, and quick to happen.
I'm not sure what to do next. I guess I need to examine emit_get_callee_vcode()
and see what it does, and why. Any comments or suggestions would be useful.

-- Linas
> That is, I can't think of any valid reason why anyone would want to suppress
> the scm_error(). And even if I could -- calling scm_error() hundreds of times
> per second, as fast as possible, does not seem like efficient coding for
> dealing with a call to an invalid address.
>
> Anyway, I'm trying to track down where the invalid value gets set. No luck so
> far. There are 6 or 8 places in vm-engine.c where the frame pointer is set to
> something that isn't a pointer (which seems like cheating to me: passing
> non-pointer values in something called "pointer" is ... well, my knee-jerk
> reaction is that it's not wise, but there may be a deeper reason.)
>
>
>>
>> > There is no error report... until 5-10 minutes later, when the error
>> > report itself causes a crash. Before then, I get an endless
>> > high-speed spew of prints:
>>
>> It looks like another error is happening within the exception handler.
>>
>
> Well, yes, that also. But given that the instruction pointer contains garbage,
> it's perhaps not entirely surprising... at best, the question is, why didn't
> it fail sooner?
>
> -- Linas
>
>>
>> Mark
>>
>> PS: It would be good to pick either 'guile-devel' or 'guile-user' for
>> continuation of this thread. I don't see a reason why it should be
>> sent to both lists.
>>
>
>
> --
> cassette tapes - analog TV - film cameras - you
>
--
cassette tapes - analog TV - film cameras - you