* guile 3 update, june 2018 edition
From: Andy Wingo @ 2018-06-29 8:13 UTC
To: guile-devel

Hi,

Just wanted to give an update on Guile 3 developments.  Last note was
here:

  https://lists.gnu.org/archive/html/guile-devel/2018-04/msg00004.html

The news is that the VM has been completely converted over to call out
to the Guile runtime through an "intrinsics" vtable.  For some
intrinsics, the compiler will emit specialized call-intrinsic opcodes.
(There's one of these opcodes for each intrinsic function type.)  For
others that are a bit more specialized, like the intrinsic used in
call-with-prompt, the VM calls out directly to the intrinsic.

The upshot is that we're now ready to do JIT compilation.  JIT-compiled
code will use the intrinsics vtable to embed references to runtime
routines.  In some future, AOT-compiled code can keep the intrinsics
vtable in a register, and call indirectly through that register.

My current plan is that the frame overhead will still be two slots: the
saved previous FP, and the saved return address.  Right now the return
address is always a bytecode address.  In the future it will be bytecode
or native code.  Guile will keep a runtime routine marking regions of
native code so it can know, if it needs to, whether an RA is bytecode or
native code, for debugging reasons; but in most operations, Guile won't
need to know.  The interpreter will tier up to JIT code through an
adapter frame that will do impedance matching over virtual<->physical
addresses.  To tier down to the interpreter (e.g. when JIT code calls
interpreted code), the JIT will simply return to the interpreter, which
will pick up state from the virtual IP, SP, and FP saved in the VM
state.

We do walk the stack from Scheme sometimes, notably when making a
backtrace.  So, we'll make the runtime translate the JIT return
addresses to virtual return addresses in the frame API.  To Scheme, it
will be as if all things were interpreted.  This strategy relies on the
JIT being a simple code generator, not an optimizer -- the state of the
stack, whether JIT or interpreted, is the same.  We can consider
relaxing this in the future.

My current problem is knowing when a callee has JIT code.  Say you're in
JITted function F which calls G.  Can you directly jump to G's native
code, or is G not compiled yet and you need to use the interpreter?  I
haven't solved this yet.  "Known calls" that use call-label and similar
can of course eagerly ensure their callees are JIT-compiled, at
compilation time.  Unknown calls are the problem.  I don't know whether
to consider reserving another word in scm_tc7_program objects for JIT
code.  I have avoided JIT overhead elsewhere and would like to do so
here as well!

For actual JIT code generation, I think my current plan is to import a
copy of GNU lightning into Guile's source, using git-subtree merges.
Lightning is fine for our purposes, as we only need code generation, not
optimization, and it supports lots of architectures: ARM, MIPS, PPC,
SPARC, x86 / x86-64, IA64, HPPA, AArch64, S390, and Alpha.  Lightning
will be built statically into libguile.  This has the advantage that we
always know the version being used, and we are able to extend lightning
without waiting for distros to pick up a new version.  Already we will
need to extend it to support atomic ops.  Subtree merges should allow us
to pick up upstream improvements without too much pain.
This strategy also allows us to drop lightning in the future if that's
the right thing.  Basically, from the user POV it should be transparent.
The whole thing will be behind an --enable-jit / --disable-jit configure
option.  When it is working, we can consider enabling shared lightning
usage.

Happy hacking,

Andy
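To make the intrinsics-vtable idea above concrete, here is a minimal C
sketch of the shape such a table could take.  The type and member names
are hypothetical placeholders, not Guile's actual declarations; the
point is only that the interpreter's call-intrinsic opcodes and
JIT-emitted code both reach the runtime through the same table of
function pointers.

    /* Sketch only: "SCM" stands in for Guile's uniform value type.  */
    typedef void *SCM;

    /* One function pointer per runtime routine the VM needs to call.
       The interpreter's call-intrinsic opcodes index into this table;
       JIT-compiled code embeds (or loads) the same pointers, so both
       tiers call out to the runtime in the same way.  */
    struct vm_intrinsics
    {
      SCM  (*add) (SCM a, SCM b);               /* slow-path addition    */
      SCM  (*logand) (SCM a, SCM b);            /* slow-path bitwise and */
      SCM  (*string_to_number) (SCM str);
      void (*error_wrong_type) (SCM obj, const char *what);
      /* ... one entry per intrinsic function type ... */
    };

    /* A single table instance, filled in at startup.  The interpreter
       does roughly `result = the_intrinsics.add (a, b);`; AOT-compiled
       code could instead keep the table's address in a register and
       call indirectly through it, as described above.  */
    extern struct vm_intrinsics the_intrinsics;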
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-06-30 0:40 UTC
To: guile-devel, Andy Wingo

Greetings Andy!

---- Andy Wingo <wingo@pobox.com> wrote:
> Hi,
>
> Just wanted to give an update on Guile 3 developments.  Last note was
> here:
>
>   https://lists.gnu.org/archive/html/guile-devel/2018-04/msg00004.html
>
> The news is that the VM has been completely converted over to call out
> to the Guile runtime through an "intrinsics" vtable.  For some
> intrinsics, the compiler will emit specialized call-intrinsic opcodes.
> (There's one of these opcodes for each intrinsic function type.)  For
> others that are a bit more specialized, like the intrinsic used in
> call-with-prompt, the VM calls out directly to the intrinsic.

Very exciting!  However, master is not building for me. :(

  git clean -dxf; ./autogen.sh && ./configure && make -j5

gives me

  SNARF  atomic.x
  SNARF  backtrace.x
  SNARF  boolean.x
  In file included from atomic.c:29:0:
  extensions.h:26:30: fatal error: libguile/libpath.h: No such file or directory
   #include "libguile/libpath.h"
                                ^
  compilation terminated.
  Makefile:3893: recipe for target 'atomic.x' failed
  make[2]: *** [atomic.x] Error 1

Maybe some dependency tuning is needed?

So, building without -j:

  make clean; make

gives a segfault when generating the docs:

  SNARF  regex-posix.doc
  GEN    guile-procedures.texi
  Uncaught exception:
  Backtrace:
  /bin/bash: line 1: 13428 Broken pipe   cat alist.doc array-handle.doc array-map.doc arrays.doc async.doc atomic.doc backtrace.doc boolean.doc bitvectors.doc bytevectors.doc chars.doc control.doc continuations.doc debug.doc deprecated.doc deprecation.doc dynl.doc dynwind.doc eq.doc error.doc eval.doc evalext.doc expand.doc extensions.doc fdes-finalizers.doc feature.doc filesys.doc fluids.doc foreign.doc fports.doc gc-malloc.doc gc.doc gettext.doc generalized-arrays.doc generalized-vectors.doc goops.doc gsubr.doc guardians.doc hash.doc hashtab.doc hooks.doc i18n.doc init.doc ioext.doc keywords.doc list.doc load.doc macros.doc mallocs.doc memoize.doc modules.doc numbers.doc objprop.doc options.doc pairs.doc ports.doc print.doc procprop.doc procs.doc promises.doc r6rs-ports.doc random.doc rdelim.doc read.doc rw.doc scmsigs.doc script.doc simpos.doc smob.doc sort.doc srcprop.doc srfi-1.doc srfi-4.doc srfi-13.doc srfi-14.doc srfi-60.doc stackchk.doc stacks.doc stime.doc strings.doc strorder.doc strports.doc struct.doc symbols.doc syntax.doc threads.doc throw.doc trees.doc unicode.doc uniform.doc values.doc variable.doc vectors.doc version.doc vports.doc weak-set.doc weak-table.doc weak-vector.doc dynl.doc posix.doc net_db.doc socket.doc regex-posix.doc
  13429 Segmentation fault | GUILE_AUTO_COMPILE=0 ../meta/build-env guild snarf-check-and-output-texi > guile-procedures.texi
  Makefile:3910: recipe for target 'guile-procedures.texi' failed

This is

  $ git describe
  v2.2.2-504-gb5dcdf2e2

and gcc is

  $ gcc --version
  gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

on an up-to-date Debian 9.4 system:

  $ uname -a
  Linux debmetrix 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

-Dale
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-07-02 4:02 UTC
To: guile-devel, dsmich, Andy Wingo

Ok! now getting past the "make -j" issue, but I'm still getting a
segfault.  Here is a backtrace from the core dump.

Line 25:

  #25 0x00007efeb518b09f in scm_error (key=0x563599bbb120, subr=subr@entry=0x0, message=message@entry=0x7efeb521c0cd "Unbound variable: ~S", args=0x563599f8f260, rest=rest@entry=0x4) at error.c:62

Looks kinda suspicious.  Should subr be 0x0 there?

Thread 4 (Thread 0x7efeb282c700 (LWP 10059)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb282c700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 3 (Thread 0x7efeb382e700 (LWP 10057)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb382e700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 2 (Thread 0x7efeb302d700 (LWP 10058)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb302d700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7efeb565d740 (LWP 10034)):
#0  0x00007efeb51afd26 in scm_maybe_resolve_module (name=name@entry=0x563599f8f140) at modules.c:195
#1  0x00007efeb51b01bf in scm_public_variable (module_name=0x563599f8f140, name=0x563599da50e0) at modules.c:656
#2  0x00007efeb518036a in init_print_frames_var_and_frame_to_stack_vector_var () at backtrace.c:103
#3  0x00007efeb49fd739 in __pthread_once_slow (once_control=0x7efeb545d828 <once>, init_routine=0x7efeb5180340 <init_print_frames_var_and_frame_to_stack_vector_var>) at pthread_once.c:116
#4  0x00007efeb49fd7e5 in __GI___pthread_once (once_control=once_control@entry=0x7efeb545d828 <once>, init_routine=init_routine@entry=0x7efeb5180340 <init_print_frames_var_and_frame_to_stack_vector_var>) at pthread_once.c:143
#5  0x00007efeb51801b0 in display_backtrace_body (a=0x7ffe2b3b7ea0) at backtrace.c:218
#6  0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610
#7  0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da5aa0, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440
#8  0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da5aa0) at eval.c:489
#9  0x00007efeb51f8cd6 in catch (tag=tag@entry=0x404, thunk=0x563599da5aa0,
handler=0x563599da5940, pre_unwind_handler=0x4) at throw.c:144 #10 0x00007efeb51f9015 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:262 #11 0x00007efeb51f91cf in scm_c_catch (tag=tag@entry=0x404, body=body@entry=0x7efeb5180190 <display_backtrace_body>, body_data=body_data@entry=0x7ffe2b3b7ea0, handler=handler@entry=0x7efeb5180580 <error_during_backtrace>, handler_data=handler_data@entry=0x563599baf000, pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:387 #12 0x00007efeb51f91de in scm_internal_catch (tag=tag@entry=0x404, body=body@entry=0x7efeb5180190 <display_backtrace_body>, body_data=body_data@entry=0x7ffe2b3b7ea0, handler=handler@entry=0x7efeb5180580 <error_during_backtrace>, handler_data=handler_data@entry=0x563599baf000) at throw.c:396 #13 0x00007efeb5180185 in scm_display_backtrace_with_highlights (stack=stack@entry=0x563599da5b60, port=port@entry=0x563599baf000, first=first@entry=0x4, depth=depth@entry=0x4, highlights=highlights@entry=0x304) at backtrace.c:277 #14 0x00007efeb51f8fec in handler_message (tag=tag@entry=0x563599bbb120, args=args@entry=0x563599c0cdb0, handler_data=<optimized out>) at throw.c:548 #15 0x00007efeb51f93cb in scm_handle_by_message (handler_data=<optimized out>, tag=0x563599bbb120, args=0x563599c0cdb0) at throw.c:585 #16 0x00007efeb51f94fe in default_exception_handler (args=0x563599c0cdb0, k=0x563599bbb120) at throw.c:174 #17 throw_without_pre_unwind (tag=0x563599bbb120, args=0x563599c0cdb0) at throw.c:248 #18 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #19 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599baf9c0, argv=<optimized out>, nargs=5) at vm.c:1440 ---Type <return> to continue, or q <return> to quit--- #20 0x00007efeb518ce4b in scm_apply_0 (proc=0x563599baf9c0, args=0x304) at eval.c:602 #21 0x00007efeb518da4d in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=0x563599bbb120, args=args@entry=0x563599f8f220) at eval.c:608 #22 0x00007efeb51f9056 in scm_throw (key=key@entry=0x563599bbb120, args=0x563599f8f220) at throw.c:274 #23 0x00007efeb51f95d9 in scm_ithrow (key=key@entry=0x563599bbb120, args=<optimized out>, no_return=no_return@entry=1) at throw.c:621 #24 0x00007efeb518b005 in scm_error_scm (key=key@entry=0x563599bbb120, subr=<optimized out>, message=message@entry=0x563599da5c20, args=args@entry=0x563599f8f260, data=data@entry=0x4) at error.c:90 #25 0x00007efeb518b09f in scm_error (key=0x563599bbb120, subr=subr@entry=0x0, message=message@entry=0x7efeb521c0cd "Unbound variable: ~S", args=0x563599f8f260, rest=rest@entry=0x4) at error.c:62 #26 0x00007efeb517a18b in error_unbound_variable (symbol=symbol@entry=0x563599b34440) at memoize.c:842 #27 0x00007efeb51ad66d in scm_sys_resolve_variable (loc=<optimized out>, loc@entry=0x563599f6cc20, mod=<optimized out>) at memoize.c:868 #28 0x00007efeb518d622 in eval (x=<optimized out>, x@entry=0x563599f6cc10, env=<optimized out>) at eval.c:431 #29 0x00007efeb518d4e1 in eval (x=<optimized out>, env=<optimized out>) at eval.c:414 #30 0x00007efeb518d380 in eval (x=<optimized out>, env=<optimized out>) at eval.c:338 #31 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #32 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #33 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da3280, argv=argv@entry=0x0, nargs=nargs@entry=0) at 
vm.c:1440 #34 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da3280) at eval.c:489 #35 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #36 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #37 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #38 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da3300, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440 #39 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da3300) at eval.c:489 #40 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #41 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #42 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #43 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da34a0, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440 #44 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da34a0) at eval.c:489 #45 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #46 0x00007efeb518d40a in eval (x=<optimized out>, env=<optimized out>) at eval.c:354 #47 0x00007efeb518d737 in prepare_boot_closure_env_for_eval (inout_env=0x7ffe2b3b93c0, out_body=0x7ffe2b3b93c8, exps=0x563599f65c70, argc=<optimized out>, proc=0x563599da37e0) at eval.c:933 #48 eval (x=<optimized out>, env=<optimized out>) at eval.c:344 #49 0x00007efeb518d40a in eval (x=<optimized out>, env=<optimized out>) at eval.c:354 #50 0x00007efeb518d737 in prepare_boot_closure_env_for_eval (inout_env=0x7ffe2b3b9700, out_body=0x7ffe2b3b9708, exps=0x563599f5cc20, argc=<optimized out>, proc=0x563599bec340) at eval.c:933 #51 eval (x=<optimized out>, env=<optimized out>) at eval.c:344 #52 0x00007efeb518d333 in eval (x=<optimized out>, env=<optimized out>) at eval.c:282 #53 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #54 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #55 0x00007efeb52046d3 in scm_call_n (proc=0x563599d3c060, argv=argv@entry=0x7ffe2b3b9bb8, nargs=nargs@entry=1) at vm.c:1440 #56 0x00007efeb518cad8 in scm_call_1 (proc=<optimized out>, arg1=<optimized out>, arg1@entry=0x563599f8b4f0) at eval.c:495 #57 0x00007efeb518d9b8 in scm_c_primitive_eval (exp=0x563599f8b4f0) at eval.c:662 #58 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #59 0x00007efeb52046d3 in scm_call_n (proc=0x563599bbe860, argv=argv@entry=0x7ffe2b3b9dc8, nargs=nargs@entry=1) at vm.c:1440 #60 0x00007efeb518dbb7 in scm_primitive_eval (exp=<optimized out>) at eval.c:670 #61 0x00007efeb51aa11b in scm_primitive_load (filename=filename@entry=0x563599c0dbc0) at load.c:130 #62 0x00007efeb51ab6f0 in scm_primitive_load_path (args=<optimized out>) at load.c:1266 #63 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #64 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599ba03e0, argv=argv@entry=0x7ffe2b3ba160, nargs=nargs@entry=1) at vm.c:1440 #65 0x00007efeb518d43b in eval (x=<optimized out>, env=<optimized out>) at eval.c:356 #66 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #67 0x00007efeb52046d3 in scm_call_n (proc=0x563599bbe860, argv=argv@entry=0x7ffe2b3ba4f8, nargs=nargs@entry=1) at vm.c:1440 #68 0x00007efeb518dbb7 in scm_primitive_eval (exp=<optimized out>) at eval.c:670 ---Type <return> to 
continue, or q <return> to quit---
#69 0x00007efeb51aa11b in scm_primitive_load (filename=filename@entry=0x563599bc0740) at load.c:130
#70 0x00007efeb51ab6f0 in scm_primitive_load_path (args=<optimized out>) at load.c:1266
#71 0x00007efeb51abb15 in scm_c_primitive_load_path (filename=filename@entry=0x7efeb521a4bd "ice-9/boot-9") at load.c:1274
#72 0x00007efeb51a4307 in scm_load_startup_files () at init.c:251
#73 0x00007efeb51a46b1 in scm_i_init_guile (base=<optimized out>) at init.c:531
#74 0x00007efeb51f7888 in scm_i_init_thread_for_guile (base=0x7ffe2b3ba730, dynamic_state=0x0) at threads.c:574
#75 0x00007efeb51f78b9 in with_guile (base=0x7ffe2b3ba730, data=0x7ffe2b3ba760) at threads.c:642
#76 0x00007efeb43fb3c2 in GC_call_with_stack_base () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#77 0x00007efeb51f7c78 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7ffe2b3ba760, func=func@entry=0x7efeb51a41b0 <invoke_main_func>) at threads.c:692
#78 scm_with_guile (func=func@entry=0x7efeb51a41b0 <invoke_main_func>, data=data@entry=0x7ffe2b3ba790) at threads.c:698
#79 0x00007efeb51a4362 in scm_boot_guile (argc=6, argv=0x7ffe2b3ba8e8, main_func=0x563598b4fb40 <inner_main>, closure=0x0) at init.c:319
#80 0x0000563598b4f9a4 in main (argc=6, argv=0x7ffe2b3ba8e8) at guile.c:95
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-07-17 10:32 UTC
To: guile-devel, dsmich, Andy Wingo

---- dsmich@roadrunner.com wrote:
> Ok! now getting past the "make -j" issue, but I'm still getting a
> segfault.

And now commit e6461cf1b2b63e3ec9a2867731742db552b61b71 has gotten past
the segfault.  Wooo!

-Dale
* Re: guile 3 update, june 2018 edition
From: Ludovic Courtès @ 2018-07-02 9:28 UTC
To: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> skribis:

> The news is that the VM has been completely converted over to call out
> to the Guile runtime through an "intrinsics" vtable.  For some
> intrinsics, the compiler will emit specialized call-intrinsic opcodes.
> (There's one of these opcodes for each intrinsic function type.)  For
> others that are a bit more specialized, like the intrinsic used in
> call-with-prompt, the VM calls out directly to the intrinsic.
>
> The upshot is that we're now ready to do JIT compilation.  JIT-compiled
> code will use the intrinsics vtable to embed references to runtime
> routines.  In some future, AOT-compiled code can keep the intrinsics
> vtable in a register, and call indirectly through that register.

Exciting!  It sounds like a really good strategy because it means that
the complex instructions don’t have to be implemented in lightning
assembly by hand, which would be a pain.

> My current plan is that the frame overhead will still be two slots: the
> saved previous FP, and the saved return address.  Right now the return
> address is always a bytecode address.  In the future it will be bytecode
> or native code.  Guile will keep a runtime routine marking regions of
> native code so it can know, if it needs to, whether an RA is bytecode or
> native code, for debugging reasons; but in most operations, Guile won't
> need to know.  The interpreter will tier up to JIT code through an
> adapter frame that will do impedance matching over virtual<->physical
> addresses.  To tier down to the interpreter (e.g. when JIT code calls
> interpreted code), the JIT will simply return to the interpreter, which
> will pick up state from the virtual IP, SP, and FP saved in the VM state.

What will the “adapter frame” look like?

> We do walk the stack from Scheme sometimes, notably when making a
> backtrace.  So, we'll make the runtime translate the JIT return
> addresses to virtual return addresses in the frame API.  To Scheme, it
> will be as if all things were interpreted.

Currently you can inspect the locals of a stack frame.  Will that be
possible with frames corresponding to native code?  (I suppose that’d be
difficult.)

> My current problem is knowing when a callee has JIT code.  Say you're in
> JITted function F which calls G.  Can you directly jump to G's native
> code, or is G not compiled yet and you need to use the interpreter?  I
> haven't solved this yet.  "Known calls" that use call-label and similar
> can of course eagerly ensure their callees are JIT-compiled, at
> compilation time.  Unknown calls are the problem.  I don't know whether
> to consider reserving another word in scm_tc7_program objects for JIT
> code.  I have avoided JIT overhead elsewhere and would like to do so
> here as well!

In the absence of a native code pointer in scm_tc7_program objects, how
will libguile find the native code for a given program?

Thanks for sharing this plan!  Good times ahead!

Ludo’.
* Re: guile 3 update, june 2018 edition
From: Andy Wingo @ 2018-07-05 17:05 UTC
To: Ludovic Courtès; +Cc: guile-devel

Hi :)

On Mon 02 Jul 2018 11:28, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> My current plan is that the frame overhead will still be two slots: the
>> saved previous FP, and the saved return address.  Right now the return
>> address is always a bytecode address.  In the future it will be bytecode
>> or native code.  Guile will keep a runtime routine marking regions of
>> native code so it can know, if it needs to, whether an RA is bytecode or
>> native code, for debugging reasons; but in most operations, Guile won't
>> need to know.  The interpreter will tier up to JIT code through an
>> adapter frame that will do impedance matching over virtual<->physical
>> addresses.  To tier down to the interpreter (e.g. when JIT code calls
>> interpreted code), the JIT will simply return to the interpreter, which
>> will pick up state from the virtual IP, SP, and FP saved in the VM state.
>
> What will the “adapter frame” look like?

Aah, sadly it won't work like this.  Somehow I was thinking of an
adapter frame on the C stack.  However, an adapter frame corresponds to
a continuation, so it would have to have the life of a continuation, so
it would have to be on the VM stack.  I don't think I want adapter
frames on the VM stack, so I have to scrap this.  More below...

>> We do walk the stack from Scheme sometimes, notably when making a
>> backtrace.  So, we'll make the runtime translate the JIT return
>> addresses to virtual return addresses in the frame API.  To Scheme, it
>> will be as if all things were interpreted.
>
> Currently you can inspect the locals of a stack frame.  Will that be
> possible with frames corresponding to native code?  (I suppose that’d be
> difficult.)

Yes, because native code manipulates the VM stack in exactly the same
way as bytecode.  Eventually we should do register allocation and avoid
always writing values to the stack, but that is down the road.

>> My current problem is knowing when a callee has JIT code.  Say you're in
>> JITted function F which calls G.  Can you directly jump to G's native
>> code, or is G not compiled yet and you need to use the interpreter?  I
>> haven't solved this yet.  "Known calls" that use call-label and similar
>> can of course eagerly ensure their callees are JIT-compiled, at
>> compilation time.  Unknown calls are the problem.  I don't know whether
>> to consider reserving another word in scm_tc7_program objects for JIT
>> code.  I have avoided JIT overhead elsewhere and would like to do so
>> here as well!
>
> In the absence of a native code pointer in scm_tc7_program objects, how
> will libguile find the native code for a given program?

This is a good question, and it was not clear to me when I wrote this!
I think I have a solution now, but it involves memory overhead.  Oh well.

Firstly, I propose to add a slot to stack frames.  Stack frames will now
store the saved FP, the virtual return address (vRA), and the machine
return address (mRA).  When in JIT code, a return will check if the mRA
is nonzero, and if so jump to that mRA.  Otherwise it will return from
JIT, and the interpreter should continue.  Likewise, when doing a
function return from the interpreter and the mRA is nonzero, the
interpreter should return by entering JIT code at that address.
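A minimal sketch of the three-word frame and the return dispatch just
described, with hypothetical field and helper names rather than Guile's
actual layout:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 3-word frame overhead: dynamic link, virtual return
       address, machine return address.  */
    struct frame_overhead
    {
      void     *saved_fp;   /* previous frame pointer                 */
      uint32_t *vra;        /* bytecode return address, always valid  */
      void     *mra;        /* native return address, or NULL if the
                               caller is running in the interpreter   */
    };

    /* Stand-ins for the two continuation paths.  */
    static void
    enter_mcode (void *mra)
    {
      printf ("return directly into native code at %p\n", mra);
    }

    static void
    resume_vcode (uint32_t *vra)
    {
      printf ("resume the interpreter at %p\n", (void *) vra);
    }

    /* A return checks the mRA: nonzero means the caller has native code
       and we jump straight back into it; zero means we fall back to (or
       stay in) the bytecode interpreter at the vRA.  */
    static void
    do_return (const struct frame_overhead *frame)
    {
      if (frame->mra)
        enter_mcode (frame->mra);
      else
        resume_vcode (frame->vra);
    }

    int
    main (void)
    {
      uint32_t fake_bytecode[4] = { 0 };
      struct frame_overhead from_interpreted = { NULL, fake_bytecode, NULL };
      do_return (&from_interpreted);  /* no mRA: tier down to interpreter */
      return 0;
    }

The same check serves both directions: a JIT-to-JIT return never leaves
native code, while a frame pushed by the interpreter simply has a zero
mRA.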
When building an interpreter-only Guile (Guile without JIT) or an
AOT-only Guile (doesn't exist currently), we could configure Guile to
not reserve this extra stack word.  However, that would be a different
ABI: a .go file built with interpreter-only Guile wouldn't work on
Guile-with-JIT, because interpreter-only Guile would think stack frames
only need two reserved words, whereas Guile-with-JIT would write three
words.  To avoid the complication, for 3.0 I think we should just use
3-word frames all the time.

So, that's returns.  Other kinds of non-local returns like
abort-to-prompt, resuming delimited continuations, or calling
undelimited continuations would work similarly: the continuation would
additionally record an mRA, and resuming would jump there instead, if
appropriate.

Now, calls.  One of the reasons that I wanted to avoid an extra program
word was because scm_tc7_program doesn't exist in a one-to-one
relationship with code.  "Well-known" procedures get compiled by closure
optimization to be always called via call-label or tail-call-label --
so some code doesn't have program objects.  On the other hand, closures
mean that some code has many program objects.  So I thought about using
side tables indexed by code; or inline "maybe-tier-up-here"
instructions, which would reference a code pointer location that, if
nonzero, would be the JIT code.

However, I see now that really we need to optimize for the JIT-to-JIT
call case, as by definition that's going to be the hot case.  Of course
call-label from JIT can do an unconditional jmp.  But calling a program
object... how do we do this?  This is complicated by code pages being
read-only, so we don't have space to store a pointer in with the code.
I think the answer is, like you say, another word in program objects.
JIT code will then do, when it sees a call:

  if (SCM_PROGRAM_P (x) && SCM_PROGRAM_MCODE (x))
    goto *SCM_PROGRAM_MCODE (x);
  vp->ip = SCM_PROGRAM_CODE (x);
  return;  // Handle call in interpreter.

which is reasonably direct and terse.  I can't think of any other option
that would be reasonably fast.

Now, we're left with a couple of issues: what to do about call-label in
interpreted code?  Also: when should we JIT?

Let's take the second question first.  I think we should JIT based on
counters, i.e. each function has an associated counter, and when that
counter overflows (or underflows, depending on how we configure it),
then we JIT that function.  This has the advantage of being
deterministic.  Another option would be statprof-like profiling, which
is cool because it measures real run time, but besides not being
terribly portable, it isn't deterministic, so it makes bugs hard to
reproduce.  Another option would be a side hash table keyed by code
address.  This is what LuaJIT does, but you get collisions, and since
addresses are randomized, that also makes JIT unpredictable.

Usually your counter limit is set depending on how expensive it is to
JIT.  If JIT were free, the counter limit would be 0.  Guile's JIT will
be cheap, though not free.  Additionally some JITs set higher counter
values because they do some form of adaptive optimization and they want
to see more values.  E.g. JavaScriptCore sets its default threshold for
its first tier at 500, incrementing by 15 for each call, 10 for each
return, and 1 for each loop iteration.  But LuaJIT sets the threshold at
56, incrementing by 1 each call and 2 each loop.  Weird, right?  Anyway,
that's how it is.  We'll have to measure.
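A hedged sketch of this counter-based tier-up.  The struct, the weights,
and the compile_mcode entry point are illustrative assumptions rather
than real Guile API, and the threshold numbers are placeholders that
would have to be measured, as noted above:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical per-function JIT state, statically allocated in the
       image next to the function's bytecode, so that every closure over
       the same code shares one counter and one native-code pointer.  */
    struct jit_state
    {
      uint32_t counter;   /* hotness counter                          */
      void    *mcode;     /* native entry point, NULL until compiled  */
    };

    /* Illustrative numbers only, loosely in the spirit of the JSC and
       LuaJIT figures quoted above.  */
    enum { JIT_THRESHOLD = 1000, WEIGHT_CALL = 30, WEIGHT_LOOP = 2 };

    /* Assumed JIT entry point: compile the bytecode starting at VCODE
       and return the native entry address.  */
    extern void *compile_mcode (const uint32_t *vcode);

    /* Run at function entry (weight WEIGHT_CALL) and at loop back-edges
       (weight WEIGHT_LOOP): bump the counter, and compile once it
       crosses the threshold.  Returns the native code, or NULL if the
       function should keep running in the interpreter for now.  */
    static void *
    maybe_tier_up (struct jit_state *st, uint32_t weight,
                   const uint32_t *vcode)
    {
      if (st->mcode)
        return st->mcode;
      st->counter += weight;
      if (st->counter >= JIT_THRESHOLD)
        st->mcode = compile_mcode (vcode);
      return st->mcode;
    }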
Note that there are two ways you might want to tier up to JIT from the
interpreter -- at function entry, and in loops.  Given that JIT code and
interpreter code manipulate the stack in the same way, you *can* tier up
anywhere -- but those are the places that it's worth checking.

Function entry is the essential place.  That's when you have the
scm_tc7_program in slot 0, if that's how the procedure is compiled, and
that's when you can patch the JIT code slot in the program object.  The
counter itself should be statically allocated in the image rather than
existing as part of the scm_tc7_program object, because multiple
closures can share code.  So, that to me says:

 * we make a pass in the compiler in a late stage to determine which
   function entry labels will ever be put in a scm_tc7_program

 * we add "enter-function" opcodes to the beginning of those functions'
   code that will increment a statically allocated counter, maybe tier
   up, and patch the program object

 * the next call to those functions will see the JIT code in the
   program and jump in directly

(A rough sketch of this enter-function flow appears at the end of this
message.)

For nests of functions that call each other via call-label, I was
thinking that these functions should be compiled all at once.  But in
the case of a big nested function where many things are visible -- the
style of programming used in the fastest programs -- we could cause too
much latency by biting off too much at once.  So perhaps we should use
enter-function there as well, just with a flag that this function has no
closure, so there's nothing to patch.  If, when compiling, we see a
call-label, we can speculatively see if there's JIT code for the callee
already, and in that case emit a direct jump; but otherwise we can
instead emit an indirect jump.  The JIT code pointer would then be not
in the scm_tc7_program, but statically allocated in the image.  In fact
we probably want statically allocated JIT code pointers in the image
anyway, so that when one closure tiers up, the next closures tier up
"for free" on their next call just by checking that pointer.

Tier-up sites inside functions are ephemeral: they are typically only
entered once.  The next time the function is entered it's via JIT,
usually.  So, no need to cache them; we can just tier up the function
(ensuring its corresponding statically allocated JIT pointer is set),
then jump to the offset in the JIT code corresponding to that bytecode
offset.  Probably we can make "handle-interrupts" do this tier-up.

OK, this mail is long enough!!!  Does it make things clearer though?

To-do:

 * adapt compiler and runtime for 3-word stack frames

 * adapt compiler and runtime for extra word in scm_tc7_program

 * allocate counter and code pointer for each function

 * adapt compiler and runtime to insert opcode at program entry (maybe
   it can run the apply hook!), with flag indicating whether argv[0]
   might need patching

 * re-take JIT implementation

Cheers,

Andy

ps. I am thinking of calling bytecode "vcode" in the future, for virtual
code, and JIT code "mcode", for machine code.  WDYT? :)
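As a rough illustration of the enter-function flow referenced above,
reusing the hypothetical jit_state, maybe_tier_up, WEIGHT_CALL, and
enter_mcode names from the earlier sketches.  SCM_PROGRAM_P is real
libguile API, but SCM_SET_PROGRAM_MCODE and the argument layout here are
assumptions for illustration, not existing Guile code:

    /* Sketch only; assumes the declarations from the sketches above are
       in scope.  The counter and mcode pointer live in static image
       data keyed to the code, not in the closure, so all closures over
       the same code tier up together; if this activation has a program
       object in slot 0, its mcode word is patched so later unknown
       calls can jump to native code directly.  */
    static void
    enter_function (SCM *fp, const uint32_t *vcode,
                    struct jit_state *st, int has_closure)
    {
      void *mcode = maybe_tier_up (st, WEIGHT_CALL, vcode);
      if (mcode == NULL)
        return;                          /* keep interpreting this call */
      if (has_closure && SCM_PROGRAM_P (fp[0]))
        SCM_SET_PROGRAM_MCODE (fp[0], mcode);  /* assumed setter */
      enter_mcode (mcode);               /* continue this call in JIT   */
    }

Until the counter crosses the threshold, the opcode is just a cheap
increment-and-continue, which matches the goal above of keeping
interpreter overhead low.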