unofficial mirror of guile-devel@gnu.org 
* Core dump when throwing an exception from a resumed partial continuation
@ 2013-03-15 21:01 Brent Pinkney
  2013-03-15 21:30 ` Andy Wingo
  2013-03-21  9:43 ` Andy Wingo
  0 siblings, 2 replies; 6+ messages in thread
From: Brent Pinkney @ 2013-03-15 21:01 UTC
  To: guile-devel; +Cc: Andrew Gaylard


Hi,

I am using partial continuations to resume a computation when an 
external system returns with an answer.
I am using (call-with-prompt ...) and (abort-to-prompt)

When I resume the continuation in another thread, all works perfectly 
UNLESS the continued execution throws an exception.
Then guile exits with a core dump.

By contrast, if I resume the continuation in the same thread and then
throw an exception, all works as expected.
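
For readers following along, here is a minimal sketch of the pattern described above. It is not the poster's code: the names tag, start-request, same-thread-result and other-thread-result are hypothetical, and it assumes a Guile built with thread support. The computation aborts to a prompt when it needs an external answer, and the captured partial continuation is later resumed with that answer, once in the same thread and once in a new thread.

```scheme
(use-modules (ice-9 threads))

;; Hypothetical prompt tag for this sketch.
(define tag (make-prompt-tag "external-answer"))

;; Run a computation until it needs an answer from outside, then hand
;; back the partial continuation so that it can be resumed later.
(define (start-request)
  (call-with-prompt tag
    (lambda ()
      ;; abort-to-prompt returns whatever the continuation is resumed with.
      (let ((answer (abort-to-prompt tag 'need-answer)))
        (string-append "got: " answer)))
    ;; The handler receives the continuation and the aborted values.
    (lambda (k request) k)))

(define k (start-request))

;; Resume in the same thread.
(define same-thread-result (k "42"))

;; Resume in another thread and wait for its result.
(define other-thread-result
  (join-thread (call-with-new-thread (lambda () (k "42")))))
```

Nothing in this sketch throws after the resume, so it only illustrates the working case, not the crash.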

Is this a known issue?

All assistance welcomed.


Thanks

Brent



* Re: Core dump when throwing an exception from a resumed partial continuation
  2013-03-15 21:01 Core dump when throwing an exception from a resumed partial continuation Brent Pinkney
@ 2013-03-15 21:30 ` Andy Wingo
  2013-03-19  8:11   ` Andrew Gaylard
  2013-03-21  9:43 ` Andy Wingo
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2013-03-15 21:30 UTC
  To: Brent Pinkney; +Cc: Andrew Gaylard, guile-devel

Hi,

On Fri 15 Mar 2013 22:01, Brent Pinkney <brp@4dst.com> writes:

> I am using partial continuations to resume a computation when an
> external system returns with an answer.
> I am using (call-with-prompt ...) and (abort-to-prompt)
>
> When I resume the continuation in another thread, all works perfectly

Neat :)

> UNLESS the continued execution throws an exception.
> Then guile exits with a core dump.

That's not good!  Can you work up a short test case?

Thanks,

Andy
-- 
http://wingolog.org/




* Re: Core dump when throwing an exception from a resumed partial continuation
  2013-03-15 21:30 ` Andy Wingo
@ 2013-03-19  8:11   ` Andrew Gaylard
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Gaylard @ 2013-03-19  8:11 UTC
  To: guile-devel


On 03/15/13 23:30, Andy Wingo wrote:
> On Fri 15 Mar 2013 22:01, Brent Pinkney <brp@4dst.com> writes:
>
>> I am using partial continuations to resume a computation when an
>> external system returns with an answer.
>> I am using (call-with-prompt ...) and (abort-to-prompt)
>>
>> When I resume the continuation in another thread, all works perfectly
> Neat :)
>
>> UNLESS the continued execution throws an exception.
>> Then guile exits with a core dump.
> That's not good!  Can you work up a short test case?

We've tried to create a short test-case.  Unfortunately, it doesn't seem 
to trigger the core.  However, the app we're creating triggers the 
core-dump every time. So, to dig into this problem, I built a debuggable VM.

What we see in the debuggable cores is the first backtrace.  You'll note 
that aside from the stack overflow at frame #3, the pattern of "Abort to 
unknown prompt" is repeated /ad infinitum/. Well, certainly to a stack 
depth of 28,000 :).  So the stack overflow is understandable.  I guess 
the question is: why does guile get stuck in a loop aborting to an
unknown prompt?

This is on Linux x86 Ubuntu 12.04, both 32- and 64-bit.  The same code 
crashes the same VM at the same point on Solaris SPARC 64-bit, but that 
core does not appear to show this repetitive pattern.  When I say the 
"same VM", I mean it: all dependencies except for the kernel and libc 
are built from identical sources, using as near as possible the same 
configure flags:

    gcc-4.7.2
    bdw-gc-7.2d
    libtool-2.2.10
    gmp-5.0.2
    libiconv-1.14
    libunistring-0.9.3
    libffi-3.0.10
    readline-6.1
    guile-2.0.7

Ubuntu's guile also shows the same problem.

To understand how guile gets into this state, I put a breakpoint in the
VM at the point where it first calls abort. That reveals the second
backtrace below.  This shows what happens immediately before the VM goes
bananas and fills up the stack, which is exactly what happens when gdb
allows guile to continue beyond the breakpoint.

I then tried stepping through the scm_c_abort code in frame #2, and it
indeed does not find a matching prompt in the wind list.  The list
returned by scm_i_dynwinds is not empty: it has 12 entries in it.  It's
just that none of them match.

I'd be really grateful for any help on this -- as you can tell, I'm not 
a VM hacker!

-- 
Andrew

#0  0x0033b416 in __kernel_vsyscall ()
#1  0x004ff1df in __GI_raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x00502825 in __GI_abort () at abort.c:91
#3  0x00a106b7 in vm_error_stack_overflow (vp=0x9c9cfc0) at vm.c:516
#4  0x00a204a4 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac055e70, nargs=4) at vm-engine.c:166
#5  0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac055e70, nargs=4) at vm.c:741
#6  0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#7  0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd4c8, 
args=0x95dd4c8) at eval.c:748
#8  0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x937df10, 
args=0x95dd4d0) at eval.c:588
#9  0x00a0ba1d in scm_throw (key=0x937df10, args=0x95dd4d0) at throw.c:104
#10 0x00a102ff in vm_error (msg=0xa6631d "VM: Too many arguments", 
arg=0x16) at vm.c:414
#11 0x00a105e6 in vm_error_too_many_args (nargs=5) at vm.c:490
#12 0x00a11a42 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac056770, nargs=5) at vm-engine.c:104
#13 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac056770, nargs=5) at vm.c:741
#14 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#15 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd550, 
args=0x95dd550) at eval.c:748
#16 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x9362130, 
args=0x95dd558) at eval.c:588
#17 0x00a0ba1d in scm_throw (key=0x9362130, args=0x95dd558) at throw.c:104
#18 0x00a0c097 in scm_ithrow (key=0x9362130, args=0x95dd558, noreturn=1) 
at throw.c:441
#19 0x009735bf in scm_error_scm (key=0x9362130, subr=0x99105b0, 
message=0x99105c0, args=0x95dd5c8, data=0x4) at error.c:95
#20 0x00973576 in scm_error (key=0x9362130, subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8, rest=0x4) at 
error.c:62
#21 0x00973b6b in scm_misc_error (subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8) at error.c:316
#22 0x0096aef5 in scm_c_abort (vm=0x9cb29e8, tag=0x9c08af0, n=5, 
argv=0xac056960, cookie=6614) at control.c:209
#23 0x00a0fe36 in vm_abort (vm=0x9cb29e8, n=0, vm_cookie=6614) at vm.c:264
#24 0x00a18942 in vm_regular_engine (vm=0x9cb29e8, program=0x93b5260, 
argv=0xac0571f4, nargs=6) at vm-i-system.c:1528
#25 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac0571e0, nargs=5) at vm.c:741
#26 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#27 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd678, 
args=0x95dd678) at eval.c:748
#28 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x9362130, 
args=0x95dd680) at eval.c:588
#29 0x00a0ba1d in scm_throw (key=0x9362130, args=0x95dd680) at throw.c:104
#30 0x00a0c097 in scm_ithrow (key=0x9362130, args=0x95dd680, noreturn=1) 
at throw.c:441
#31 0x009735bf in scm_error_scm (key=0x9362130, subr=0x99105d0, 
message=0x99105e0, args=0x95dd6a0, data=0x4) at error.c:95
#32 0x00973576 in scm_error (key=0x9362130, subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd6a0, rest=0x4) at 
error.c:62
#33 0x00973b6b in scm_misc_error (subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd6a0) at error.c:316
#34 0x0096aef5 in scm_c_abort (vm=0x9cb29e8, tag=0x9c08af0, n=5, 
argv=0xac0573d0, cookie=6613) at control.c:209
#35 0x00a0fe36 in vm_abort (vm=0x9cb29e8, n=0, vm_cookie=6613) at vm.c:264
#36 0x00a18942 in vm_regular_engine (vm=0x9cb29e8, program=0x93b5260, 
argv=0xac057c64, nargs=6) at vm-i-system.c:1528
#37 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac057c50, nargs=5) at vm.c:741
#38 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#39 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd730, 
args=0x95dd730) at eval.c:748
#40 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x9362130, 
args=0x95dd738) at eval.c:588
#41 0x00a0ba1d in scm_throw (key=0x9362130, args=0x95dd738) at throw.c:104
#42 0x00a0c097 in scm_ithrow (key=0x9362130, args=0x95dd738, noreturn=1) 
at throw.c:441
#43 0x009735bf in scm_error_scm (key=0x9362130, subr=0x99105f0, 
message=0x9910600, args=0x95dd758, data=0x4) at error.c:95
#44 0x00973576 in scm_error (key=0x9362130, subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd758, rest=0x4) at 
error.c:62
#45 0x00973b6b in scm_misc_error (subr=0xa4cd3b "abort", 
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd758) at error.c:316
#46 0x0096aef5 in scm_c_abort (vm=0x9cb29e8, tag=0x9c08af0, n=5, 
argv=0xac057e40, cookie=6612) at control.c:209
#47 0x00a0fe36 in vm_abort (vm=0x9cb29e8, n=0, vm_cookie=6612) at vm.c:264


#0  scm_error (key=0x80a7130, subr=0x25cd5b "abort", message=0x25cd43 
"Abort to unknown prompt", args=0x857cf50, rest=0x4) at error.c:62
#1  0x00183b6b in scm_misc_error (subr=0x25cd5b "abort", 
message=0x25cd43 "Abort to unknown prompt", args=0x857cf50) at error.c:316
#2  0x0017aef5 in scm_c_abort (vm=0x8531ca8, tag=0x8800a20, n=1, 
argv=0xa99f8f30, cookie=1) at control.c:209
#3  0x0021fe4a in vm_abort (vm=0x8531ca8, n=0, vm_cookie=1) at vm.c:264
#4  0x00228956 in vm_regular_engine (vm=0x8531ca8, program=0x815a750, 
argv=0xa99f97b0, nargs=2) at vm-i-system.c:1528
#5  0x00244d14 in scm_c_vm_run (vm=0x8531ca8, program=0x80c95a0, 
argv=0xa99f97a4, nargs=3) at vm.c:741
#6  0x00185baa in scm_call_3 (proc=0x80c95a0, arg1=0x404, 
arg2=0x850b240, arg3=0x84255e8) at eval.c:500
#7  0x0021b866 in scm_catch (key=0x404, thunk=0x850b240, 
handler=0x84255e8) at throw.c:73
#8  0x00218dad in really_launch (d=0xb19fcf34) at threads.c:1007
#9  0x0017ba40 in c_body (d=0xa99fa194) at continuations.c:511
#10 0x0021bb28 in apply_catch_closure (clo=0x892ac50, args=0x304) at 
throw.c:146
#11 0x001f03cb in apply_1 (smob=0x892ac50, a=0x304) at smob.c:141
#12 0x002253e2 in vm_regular_engine (vm=0x8531ca8, program=0x80f94b0, 
argv=0xa99fa0b0, nargs=2) at vm-i-system.c:853
#13 0x00244d14 in scm_c_vm_run (vm=0x8531ca8, program=0x80c95a0, 
argv=0xa99fa0a0, nargs=4) at vm.c:741
#14 0x00185bfd in scm_call_4 (proc=0x80c95a0, arg1=0x404, 
arg2=0x892ac50, arg3=0x892ac30, arg4=0x892ac20) at eval.c:507
#15 0x0021b91e in scm_catch_with_pre_unwind_handler (key=0x404, 
thunk=0x892ac50, handler=0x892ac30, pre_unwind_handler=0x892ac20) at 
throw.c:86
#16 0x0021bbee in scm_c_catch (tag=0x404, body=0x17ba24 <c_body>, 
body_data=0xa99fa194, handler=0x17ba4d <c_handler>, 
handler_data=0xa99fa194, pre_unwind_handler=0x17baa0 
<pre_unwind_handler>, pre_unwind_handler_data=0x80fdf80) at throw.c:213
#17 0x0017b8ab in scm_i_with_continuation_barrier (body=0x17ba24 
<c_body>, body_data=0xa99fa194, handler=0x17ba4d <c_handler>, 
handler_data=0xa99fa194, pre_unwind_handler=0x17baa0 
<pre_unwind_handler>, pre_unwind_handler_data=0x80fdf80) at 
continuations.c:449
#18 0x0017bb40 in scm_c_with_continuation_barrier (func=0x218d07 
<really_launch>, data=0xb19fcf34) at continuations.c:545
#19 0x00218b85 in with_guile_and_parent (base=0xa99fa1f8, 
data=0xa99fa224) at threads.c:906
#20 0x002e248c in GC_call_with_stack_base () from 
/opt/cs/components/3rd/bdw-gc/7.2.7e16553s16377h0398/lib/libgc.so.1
#21 0x00218c48 in scm_i_with_guile_and_parent (func=0x218d07 
<really_launch>, data=0xb19fcf34, parent=0x881e5d8) at threads.c:949
#22 0x00218e01 in launch_thread (d=0xb19fcf34) at threads.c:1017
#23 0x002e79b7 in GC_inner_start_routine () from 
/opt/cs/components/3rd/bdw-gc/7.2.7e16553s16377h0398/lib/libgc.so.1
#24 0x002e248c in GC_call_with_stack_base () from 
/opt/cs/components/3rd/bdw-gc/7.2.7e16553s16377h0398/lib/libgc.so.1
#25 0x002e8d69 in GC_start_routine () from 
/opt/cs/components/3rd/bdw-gc/7.2.7e16553s16377h0398/lib/libgc.so.1
#26 0x00321d4c in start_thread (arg=0xa99fab40) at pthread_create.c:308
#27 0x00722d3e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130



* Re: Core dump when throwing an exception from a resumed partial continuation
  2013-03-15 21:01 Core dump when throwing an exception from a resumed partial continuation Brent Pinkney
  2013-03-15 21:30 ` Andy Wingo
@ 2013-03-21  9:43 ` Andy Wingo
  2013-03-21 13:53   ` Andrew Gaylard
  1 sibling, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2013-03-21  9:43 UTC
  To: Brent Pinkney; +Cc: Andrew Gaylard, guile-devel

On Fri 15 Mar 2013 22:01, Brent Pinkney <brp@4dst.com> writes:

> When I resume the continuation in another thread, all works perfectly
> UNLESS the continued execution throws an exception.
> Then guile exits with a core dump.
>
> By contrast, if I resume the continuation in the same thread and then
> throw an exception, all works as expected.

I think I know what this is.

So, a delimited continuation should capture that part of the dynamic
environment made in its extent.  (See Oleg Kiselyov and Chung-Chieh
Shan's "Delimited Dynamic Binding" paper.)  That is what Guile does, for
fluids, prompts, and dynamic-wind blocks.
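
A small sketch of what "capturing the dynamic environment made in its extent" means for fluids, under the assumption that it is loaded as an ordinary Guile script. The names color, tag, k, resumed-value and outside-value are made up for this illustration. The fluid is rebound inside the prompt, so the binding belongs to the captured continuation and should be back in effect whenever that continuation is resumed.

```scheme
;; Hypothetical fluid and prompt tag for this sketch.
(define color (make-fluid))
(fluid-set! color 'outer)

(define tag (make-prompt-tag "fluid-demo"))

;; Capture a continuation from inside a with-fluids body.
(define k
  (call-with-prompt tag
    (lambda ()
      (with-fluids ((color 'inner))
        (abort-to-prompt tag)
        ;; Runs only when the continuation is resumed.
        (fluid-ref color)))
    (lambda (k) k)))

;; Resuming re-enters the with-fluids extent captured in k.
(define resumed-value (k))

;; Outside the continuation the original value is visible again.
(define outside-value (fluid-ref color))
```

Here resumed-value is expected to be inner and outside-value to be outer.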

Our implementation of exception handling uses a fluid,
%exception-handler (boot-9.scm:86).  However that fluid references a
stack of exception handlers on the heap.  There is the problem: an
exception in a reinstated delimited continuation will walk
the captured exception handler stack from the heap, not from its own
dynamic environment.  Therefore it could abort to a continuation that is
not present on the new thread.

The solution is to have the exception handler find the next handler from
the dynamic environment.  This will need a new primitive to walk the
dynamic stack, I think.

I can't look at this atm as I broke my arm (!) and so typing is tough.
For now as a workaround I suggest you put a catch #t in each of your
delimited continuations.  This way all throws will be handled by catches
established by the continuation.
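
A minimal sketch of that workaround, assuming a Guile with thread support. The names tag, k and result are hypothetical. The catch #t sits inside the thunk passed to call-with-prompt, so it is part of the captured continuation and travels with it to whichever thread resumes it.

```scheme
(use-modules (ice-9 threads))

;; Hypothetical prompt tag for this sketch.
(define tag (make-prompt-tag "work"))

(define k
  (call-with-prompt tag
    (lambda ()
      ;; The catch is established inside the delimited continuation.
      (catch #t
        (lambda ()
          (abort-to-prompt tag)
          ;; Runs only after the continuation has been resumed.
          (throw 'oops "failed after resume"))
        ;; A catch handler takes the key plus any throw arguments.
        (lambda (key . args)
          (list 'caught key))))
    (lambda (k) k)))

;; Resume in a different thread; the throw is handled by the catch
;; that was captured along with the continuation.
(define result
  (join-thread (call-with-new-thread (lambda () (k)))))
```

With the catch in place, result should be the list (caught oops) rather than an abort to a handler that lives on the original thread.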

Regards,

Andy
-- 
http://wingolog.org/




* Re: Core dump when throwing an exception from a resumed partial continuation
  2013-03-21  9:43 ` Andy Wingo
@ 2013-03-21 13:53   ` Andrew Gaylard
  2013-03-21 15:34     ` Andy Wingo
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Gaylard @ 2013-03-21 13:53 UTC
  To: guile-devel, Andy Wingo

On 03/21/13 11:43, Andy Wingo wrote:
> On Fri 15 Mar 2013 22:01, Brent Pinkney <brp@4dst.com> writes:
>
>> When I resume the continuation in another thread, all works perfectly
>> UNLESS the continued execution throws an exception.
>> Then guile exits with a core dump.
>>
>> By contrast, if I resume the continuation in the same thread and then
>> throw an exception, all works as expected.
> I think I know what this is.
>
> So, a delimited continuation should capture that part of the dynamic
> environment made in its extent.  (See Oleg Kiselyov and Chung-Chieh
> Shan's "Delimited Dynamic Binding" paper.)  That is what Guile does, for
> fluids, prompts, and dynamic-wind blocks.
>
> Our implementation of exception handling uses a fluid,
> %exception-handler (boot-9.scm:86).  However that fluid references a
> stack of exception handlers on the heap.  There is the problem: an
> exception in a reinstated delimited continuation will walk
> the captured exception handler stack from the heap, not from its own
> dynamic environment.  Therefore it could abort to a continuation that is
> not present on the new thread.
>
> The solution is to have the exception handler find the next handler from
> the dynamic environment.  This will need a new primitive to walk the
> dynamic stack, I think.
>
> I can't look at this atm as I broke my arm (!) and so typing is tough.
> For now as a workaround I suggest you put a catch #t in each of your
> delimited continuations.  This way all throws will be handled by catches
> established by the continuation.
>
> Regards,
>
> Andy
Andy,

Thanks for giving this some thought -- sorry to hear about your arm!

This does shed some light on things. If I change this:

(throw 'oops) ; should not crash the vm

to this:

(catch #t
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ ()
    (display "Success!") (newline))) ; never reached

the VM still cores; "Success" is never shown. However, you've probably
spotted my mistake: the handler should be (λ (key . args) ... ).

But this core shows up differently in the stack-trace in gdb:

#0 scm_error (key=0x1001854c0, subr=0x0, message=0xffffffff7e7ef518 
"Wrong number of arguments to ~A", args=0x100db95b0, rest=0x4) at 
error.c:62

... which is exactly the exception one would expect. Fixing the handler 
thus:

(catch #t
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ (key . args)
    (display "Success!") (newline))) ; works!

...solves the problem, and the VM doesn't core any more.
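
As a quick illustration of why the handler needs the (key . args) signature, this sketch (the name caught is made up here) throws with two extra arguments and has the handler return exactly what it was called with:

```scheme
;; The handler is called with the throw key followed by the throw
;; arguments, so a rest-argument signature accepts any throw.
(define caught
  (catch #t
    (lambda ()
      (throw 'oops 1 2))
    (lambda (key . args)
      (list key args))))
```

Here caught is the list (oops (1 2)): the key first, then the remaining throw arguments collected in args.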

So it seems that although we *did* have a catch around our resumption,
there must have been some (different) error in its handler, which caused a
second exception, which caused the VM to crash.

Unfortunately, the test-case we made handles this second exception fine.
It'd be great to be able to distill this problem down to a pithy test-case.
(Our app is 4500 lines and still growing, so it's not really a candidate to
send to the list.)

The same problem happens (VM cores) if I do this:

(catch 'not-oops
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ (key . args)
    (display "Success!") (newline))) ; never reached

So your answer to surround the resumption with a (catch #t ...) is a
good workaround. For our code, anyway.
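
To show the difference between the two catch forms outside of any continuation, here is a small sketch with a hypothetical result variable. The inner catch only matches the key not-oops, so the throw of oops skips it and is handled by the enclosing catch #t:

```scheme
(define result
  (catch #t
    (lambda ()
      ;; This catch does not match 'oops, so its handler never runs.
      (catch 'not-oops
        (lambda ()
          (throw 'oops))
        (lambda (key . args)
          'inner-handler)))
    ;; catch #t matches every key, including 'oops.
    (lambda (key . args)
      (list 'outer-handler key))))
```

Here result is the list (outer-handler oops). In the crashing application the non-matching catch leaves the throw to find a handler outside the resumed continuation, which is the situation the catch #t workaround avoids.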

(I'm now off to go read 
http://www.cs.indiana.edu/~sabry/papers/delim-dyn-bind.pdf :)
-- 
Andrew





* Re: Core dump when throwing an exception from a resumed partial continuation
  2013-03-21 13:53   ` Andrew Gaylard
@ 2013-03-21 15:34     ` Andy Wingo
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Wingo @ 2013-03-21 15:34 UTC
  To: Andrew Gaylard; +Cc: guile-devel

On Thu 21 Mar 2013 14:53, Andrew Gaylard <ag@computer.org> writes:

> (catch #t
>   (λ ()
>     (throw 'oops)) ; should not crash the vm
>   (λ ()
>     (display "Success!") (newline))) ; never reached
>
> the VM still cores; "Success" is never shown. However, you've probably
> spotted my mistake: the handler should be (λ (key . args) ... ).

The core dump is another bug, but fixing the handler is the key thing:

> (catch #t
>   (λ ()
>     (throw 'oops)) ; should not crash the vm
>   (λ (key . args)
>     (display "Success!") (newline))) ; works!
>
> ...solves the problem, and the VM doesn't core any more.

Yep

Happy hacking :)

A
-- 
http://wingolog.org/


