* guile 3 update, june 2018 edition
From: Andy Wingo @ 2018-06-29 8:13 UTC
To: guile-devel

Hi,

Just wanted to give an update on Guile 3 developments.  Last note was
here:

  https://lists.gnu.org/archive/html/guile-devel/2018-04/msg00004.html

The news is that the VM has been completely converted over to call out
to the Guile runtime through an "intrinsics" vtable.  For some
intrinsics, the compiler will emit specialized call-intrinsic opcodes.
(There's one of these opcodes for each intrinsic function type.)  For
others that are a bit more specialized, like the intrinsic used in
call-with-prompt, the VM calls out directly to the intrinsic.

The upshot is that we're now ready to do JIT compilation.  JIT-compiled
code will use the intrinsics vtable to embed references to runtime
routines.  In some future, AOT-compiled code can keep the intrinsics
vtable in a register, and call indirectly through that register.

My current plan is that the frame overhead will still be two slots: the
saved previous FP, and the saved return address.  Right now the return
address is always a bytecode address.  In the future it will be bytecode
or native code.  Guile will keep a runtime routine marking regions of
native code so it can know, if it needs to, whether an RA is bytecode or
native code, for debugging reasons; but in most operations, Guile won't
need to know.  The interpreter will tier up to JIT code through an
adapter frame that will do impedance matching over virtual<->physical
addresses.  To tier down to the interpreter (e.g. when JIT code calls
interpreted code), the JIT will simply return to the interpreter, which
will pick up state from the virtual IP, SP, and FP saved in the VM
state.

We do walk the stack from Scheme sometimes, notably when making a
backtrace.  So, we'll make the runtime translate the JIT return
addresses to virtual return addresses in the frame API.  To Scheme, it
will be as if all things were interpreted.  This strategy relies on the
JIT being a simple code generator, not an optimizer -- the state of the
stack, whether JIT or interpreted, is the same.  We can consider
relaxing this in the future.

My current problem is knowing when a callee has JIT code.  Say you're in
JITted function F which calls G.  Can you directly jump to G's native
code, or is G not compiled yet and you need to use the interpreter?  I
haven't solved this yet.  "Known calls" that use call-label and similar
can of course eagerly ensure their callees are JIT-compiled, at
compilation time.  Unknown calls are the problem.  I don't know whether
to consider reserving another word in scm_tc7_program objects for JIT
code.  I have avoided JIT overhead elsewhere and would like to do so
here as well!

For actual JIT code generation, I think my current plan is to import a
copy of GNU lightning into Guile's source, using git-subtree merges.
Lightning is fine for our purposes, as we only need code generation, not
optimization, and it supports lots of architectures: ARM, MIPS, PPC,
SPARC, x86 / x86-64, IA64, HPPA, AArch64, S390, and Alpha.  Lightning
will be built statically into libguile.  This has the advantage that we
always know the version being used, and we are able to extend lightning
without waiting for distros to pick up a new version.  Already we will
need to extend it to support atomic ops.  Subtree merges should allow us
to pick up upstream improvements without too much pain.
This strategy also allows us to drop lightning in the future if that's
the right thing.  Basically, from the user POV it should be transparent.
The whole thing will be behind an --enable-jit / --disable-jit configure
option.  When it is working, we can consider enabling shared lightning
usage.

Happy hacking,

Andy
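To make the intrinsics-vtable idea above concrete, here is a minimal C
sketch of the shape such a table could take.  The type and member names
are hypothetical placeholders, not Guile's actual declarations; the
point is only that the interpreter's call-intrinsic opcodes and
JIT-emitted code both reach the runtime through the same table of
function pointers.

    /* Sketch only: "SCM" stands in for Guile's uniform value type.  */
    typedef void *SCM;

    /* One function pointer per runtime routine the VM needs to call.
       The interpreter's call-intrinsic opcodes index into this table;
       JIT-compiled code embeds (or loads) the same pointers, so both
       tiers call out to the runtime in the same way.  */
    struct vm_intrinsics
    {
      SCM  (*add) (SCM a, SCM b);               /* slow-path addition    */
      SCM  (*logand) (SCM a, SCM b);            /* slow-path bitwise and */
      SCM  (*string_to_number) (SCM str);
      void (*error_wrong_type) (SCM obj, const char *what);
      /* ... one entry per intrinsic function type ... */
    };

    /* A single table instance, filled in at startup.  The interpreter
       does roughly `result = the_intrinsics.add (a, b);`; AOT-compiled
       code could instead keep the table's address in a register and
       call indirectly through it, as described above.  */
    extern struct vm_intrinsics the_intrinsics;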
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-06-30 0:40 UTC
To: guile-devel, Andy Wingo

Greetings Andy!

---- Andy Wingo <wingo@pobox.com> wrote:
> Hi,
>
> Just wanted to give an update on Guile 3 developments.  Last note was
> here:
>
>   https://lists.gnu.org/archive/html/guile-devel/2018-04/msg00004.html
>
> The news is that the VM has been completely converted over to call out
> to the Guile runtime through an "intrinsics" vtable.  For some
> intrinsics, the compiler will emit specialized call-intrinsic opcodes.
> (There's one of these opcodes for each intrinsic function type.)  For
> others that are a bit more specialized, like the intrinsic used in
> call-with-prompt, the VM calls out directly to the intrinsic.

Very exciting!  However, master is not building for me. :(

  git clean -dxf; ./autogen.sh && ./configure && make -j5

gives me

  SNARF  atomic.x
  SNARF  backtrace.x
  SNARF  boolean.x
  In file included from atomic.c:29:0:
  extensions.h:26:30: fatal error: libguile/libpath.h: No such file or directory
   #include "libguile/libpath.h"
                                ^
  compilation terminated.
  Makefile:3893: recipe for target 'atomic.x' failed
  make[2]: *** [atomic.x] Error 1

Maybe some dependency tuning is needed?

So, building without -j:

  make clean; make

gives a segfault when generating the docs:

  SNARF  regex-posix.doc
  GEN    guile-procedures.texi
  Uncaught exception:
  Backtrace:
  /bin/bash: line 1: 13428 Broken pipe   cat alist.doc array-handle.doc array-map.doc arrays.doc async.doc atomic.doc backtrace.doc boolean.doc bitvectors.doc bytevectors.doc chars.doc control.doc continuations.doc debug.doc deprecated.doc deprecation.doc dynl.doc dynwind.doc eq.doc error.doc eval.doc evalext.doc expand.doc extensions.doc fdes-finalizers.doc feature.doc filesys.doc fluids.doc foreign.doc fports.doc gc-malloc.doc gc.doc gettext.doc generalized-arrays.doc generalized-vectors.doc goops.doc gsubr.doc guardians.doc hash.doc hashtab.doc hooks.doc i18n.doc init.doc ioext.doc keywords.doc list.doc load.doc macros.doc mallocs.doc memoize.doc modules.doc numbers.doc objprop.doc options.doc pairs.doc ports.doc print.doc procprop.doc procs.doc promises.doc r6rs-ports.doc random.doc rdelim.doc read.doc rw.doc scmsigs.doc script.doc simpos.doc smob.doc sort.doc srcprop.doc srfi-1.doc srfi-4.doc srfi-13.doc srfi-14.doc srfi-60.doc stackchk.doc stacks.doc stime.doc strings.doc strorder.doc strports.doc struct.doc symbols.doc syntax.doc threads.doc throw.doc trees.doc unicode.doc uniform.doc values.doc variable.doc vectors.doc version.doc vports.doc weak-set.doc weak-table.doc weak-vector.doc dynl.doc posix.doc net_db.doc socket.doc regex-posix.doc
  13429 Segmentation fault | GUILE_AUTO_COMPILE=0 ../meta/build-env guild snarf-check-and-output-texi > guile-procedures.texi
  Makefile:3910: recipe for target 'guile-procedures.texi' failed

This is

  $ git describe
  v2.2.2-504-gb5dcdf2e2

and gcc is

  $ gcc --version
  gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

on an up-to-date Debian 9.4 system:

  $ uname -a
  Linux debmetrix 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

-Dale
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-07-02 4:02 UTC
To: guile-devel, dsmich, Andy Wingo

Ok! now getting past the "make -j" issue, but I'm still getting a
segfault.  Here is a backtrace from the core dump.

Line 25:

  #25 0x00007efeb518b09f in scm_error (key=0x563599bbb120, subr=subr@entry=0x0, message=message@entry=0x7efeb521c0cd "Unbound variable: ~S", args=0x563599f8f260, rest=rest@entry=0x4) at error.c:62

Looks kinda suspicious.  Should subr be 0x0 there?

Thread 4 (Thread 0x7efeb282c700 (LWP 10059)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb282c700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 3 (Thread 0x7efeb382e700 (LWP 10057)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb382e700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 2 (Thread 0x7efeb302d700 (LWP 10058)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007efeb4401cc7 in GC_wait_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#2  0x00007efeb43f85ca in GC_help_marker () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#3  0x00007efeb440033c in GC_mark_thread () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#4  0x00007efeb49f6494 in start_thread (arg=0x7efeb302d700) at pthread_create.c:333
#5  0x00007efeb4738acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7efeb565d740 (LWP 10034)):
#0  0x00007efeb51afd26 in scm_maybe_resolve_module (name=name@entry=0x563599f8f140) at modules.c:195
#1  0x00007efeb51b01bf in scm_public_variable (module_name=0x563599f8f140, name=0x563599da50e0) at modules.c:656
#2  0x00007efeb518036a in init_print_frames_var_and_frame_to_stack_vector_var () at backtrace.c:103
#3  0x00007efeb49fd739 in __pthread_once_slow (once_control=0x7efeb545d828 <once>, init_routine=0x7efeb5180340 <init_print_frames_var_and_frame_to_stack_vector_var>) at pthread_once.c:116
#4  0x00007efeb49fd7e5 in __GI___pthread_once (once_control=once_control@entry=0x7efeb545d828 <once>, init_routine=init_routine@entry=0x7efeb5180340 <init_print_frames_var_and_frame_to_stack_vector_var>) at pthread_once.c:143
#5  0x00007efeb51801b0 in display_backtrace_body (a=0x7ffe2b3b7ea0) at backtrace.c:218
#6  0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610
#7  0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da5aa0, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440
#8  0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da5aa0) at eval.c:489
#9  0x00007efeb51f8cd6 in catch (tag=tag@entry=0x404, thunk=0x563599da5aa0,
handler=0x563599da5940, pre_unwind_handler=0x4) at throw.c:144 #10 0x00007efeb51f9015 in scm_catch_with_pre_unwind_handler (key=key@entry=0x404, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>) at throw.c:262 #11 0x00007efeb51f91cf in scm_c_catch (tag=tag@entry=0x404, body=body@entry=0x7efeb5180190 <display_backtrace_body>, body_data=body_data@entry=0x7ffe2b3b7ea0, handler=handler@entry=0x7efeb5180580 <error_during_backtrace>, handler_data=handler_data@entry=0x563599baf000, pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:387 #12 0x00007efeb51f91de in scm_internal_catch (tag=tag@entry=0x404, body=body@entry=0x7efeb5180190 <display_backtrace_body>, body_data=body_data@entry=0x7ffe2b3b7ea0, handler=handler@entry=0x7efeb5180580 <error_during_backtrace>, handler_data=handler_data@entry=0x563599baf000) at throw.c:396 #13 0x00007efeb5180185 in scm_display_backtrace_with_highlights (stack=stack@entry=0x563599da5b60, port=port@entry=0x563599baf000, first=first@entry=0x4, depth=depth@entry=0x4, highlights=highlights@entry=0x304) at backtrace.c:277 #14 0x00007efeb51f8fec in handler_message (tag=tag@entry=0x563599bbb120, args=args@entry=0x563599c0cdb0, handler_data=<optimized out>) at throw.c:548 #15 0x00007efeb51f93cb in scm_handle_by_message (handler_data=<optimized out>, tag=0x563599bbb120, args=0x563599c0cdb0) at throw.c:585 #16 0x00007efeb51f94fe in default_exception_handler (args=0x563599c0cdb0, k=0x563599bbb120) at throw.c:174 #17 throw_without_pre_unwind (tag=0x563599bbb120, args=0x563599c0cdb0) at throw.c:248 #18 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #19 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599baf9c0, argv=<optimized out>, nargs=5) at vm.c:1440 ---Type <return> to continue, or q <return> to quit--- #20 0x00007efeb518ce4b in scm_apply_0 (proc=0x563599baf9c0, args=0x304) at eval.c:602 #21 0x00007efeb518da4d in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=0x563599bbb120, args=args@entry=0x563599f8f220) at eval.c:608 #22 0x00007efeb51f9056 in scm_throw (key=key@entry=0x563599bbb120, args=0x563599f8f220) at throw.c:274 #23 0x00007efeb51f95d9 in scm_ithrow (key=key@entry=0x563599bbb120, args=<optimized out>, no_return=no_return@entry=1) at throw.c:621 #24 0x00007efeb518b005 in scm_error_scm (key=key@entry=0x563599bbb120, subr=<optimized out>, message=message@entry=0x563599da5c20, args=args@entry=0x563599f8f260, data=data@entry=0x4) at error.c:90 #25 0x00007efeb518b09f in scm_error (key=0x563599bbb120, subr=subr@entry=0x0, message=message@entry=0x7efeb521c0cd "Unbound variable: ~S", args=0x563599f8f260, rest=rest@entry=0x4) at error.c:62 #26 0x00007efeb517a18b in error_unbound_variable (symbol=symbol@entry=0x563599b34440) at memoize.c:842 #27 0x00007efeb51ad66d in scm_sys_resolve_variable (loc=<optimized out>, loc@entry=0x563599f6cc20, mod=<optimized out>) at memoize.c:868 #28 0x00007efeb518d622 in eval (x=<optimized out>, x@entry=0x563599f6cc10, env=<optimized out>) at eval.c:431 #29 0x00007efeb518d4e1 in eval (x=<optimized out>, env=<optimized out>) at eval.c:414 #30 0x00007efeb518d380 in eval (x=<optimized out>, env=<optimized out>) at eval.c:338 #31 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #32 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #33 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da3280, argv=argv@entry=0x0, nargs=nargs@entry=0) at 
vm.c:1440 #34 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da3280) at eval.c:489 #35 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #36 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #37 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #38 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da3300, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440 #39 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da3300) at eval.c:489 #40 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #41 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #42 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #43 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599da34a0, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1440 #44 0x00007efeb518cab9 in scm_call_0 (proc=proc@entry=0x563599da34a0) at eval.c:489 #45 0x00007efeb518d58d in eval (x=<optimized out>, env=<optimized out>) at eval.c:370 #46 0x00007efeb518d40a in eval (x=<optimized out>, env=<optimized out>) at eval.c:354 #47 0x00007efeb518d737 in prepare_boot_closure_env_for_eval (inout_env=0x7ffe2b3b93c0, out_body=0x7ffe2b3b93c8, exps=0x563599f65c70, argc=<optimized out>, proc=0x563599da37e0) at eval.c:933 #48 eval (x=<optimized out>, env=<optimized out>) at eval.c:344 #49 0x00007efeb518d40a in eval (x=<optimized out>, env=<optimized out>) at eval.c:354 #50 0x00007efeb518d737 in prepare_boot_closure_env_for_eval (inout_env=0x7ffe2b3b9700, out_body=0x7ffe2b3b9708, exps=0x563599f5cc20, argc=<optimized out>, proc=0x563599bec340) at eval.c:933 #51 eval (x=<optimized out>, env=<optimized out>) at eval.c:344 #52 0x00007efeb518d333 in eval (x=<optimized out>, env=<optimized out>) at eval.c:282 #53 0x00007efeb518da0f in boot_closure_apply (closure=<optimized out>, args=<optimized out>) at eval.c:944 #54 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #55 0x00007efeb52046d3 in scm_call_n (proc=0x563599d3c060, argv=argv@entry=0x7ffe2b3b9bb8, nargs=nargs@entry=1) at vm.c:1440 #56 0x00007efeb518cad8 in scm_call_1 (proc=<optimized out>, arg1=<optimized out>, arg1@entry=0x563599f8b4f0) at eval.c:495 #57 0x00007efeb518d9b8 in scm_c_primitive_eval (exp=0x563599f8b4f0) at eval.c:662 #58 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #59 0x00007efeb52046d3 in scm_call_n (proc=0x563599bbe860, argv=argv@entry=0x7ffe2b3b9dc8, nargs=nargs@entry=1) at vm.c:1440 #60 0x00007efeb518dbb7 in scm_primitive_eval (exp=<optimized out>) at eval.c:670 #61 0x00007efeb51aa11b in scm_primitive_load (filename=filename@entry=0x563599c0dbc0) at load.c:130 #62 0x00007efeb51ab6f0 in scm_primitive_load_path (args=<optimized out>) at load.c:1266 #63 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #64 0x00007efeb52046d3 in scm_call_n (proc=proc@entry=0x563599ba03e0, argv=argv@entry=0x7ffe2b3ba160, nargs=nargs@entry=1) at vm.c:1440 #65 0x00007efeb518d43b in eval (x=<optimized out>, env=<optimized out>) at eval.c:356 #66 0x00007efeb520040f in vm_regular_engine (thread=0x563599b14dc0) at vm-engine.c:610 #67 0x00007efeb52046d3 in scm_call_n (proc=0x563599bbe860, argv=argv@entry=0x7ffe2b3ba4f8, nargs=nargs@entry=1) at vm.c:1440 #68 0x00007efeb518dbb7 in scm_primitive_eval (exp=<optimized out>) at eval.c:670 ---Type <return> to 
continue, or q <return> to quit---
#69 0x00007efeb51aa11b in scm_primitive_load (filename=filename@entry=0x563599bc0740) at load.c:130
#70 0x00007efeb51ab6f0 in scm_primitive_load_path (args=<optimized out>) at load.c:1266
#71 0x00007efeb51abb15 in scm_c_primitive_load_path (filename=filename@entry=0x7efeb521a4bd "ice-9/boot-9") at load.c:1274
#72 0x00007efeb51a4307 in scm_load_startup_files () at init.c:251
#73 0x00007efeb51a46b1 in scm_i_init_guile (base=<optimized out>) at init.c:531
#74 0x00007efeb51f7888 in scm_i_init_thread_for_guile (base=0x7ffe2b3ba730, dynamic_state=0x0) at threads.c:574
#75 0x00007efeb51f78b9 in with_guile (base=0x7ffe2b3ba730, data=0x7ffe2b3ba760) at threads.c:642
#76 0x00007efeb43fb3c2 in GC_call_with_stack_base () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#77 0x00007efeb51f7c78 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7ffe2b3ba760, func=func@entry=0x7efeb51a41b0 <invoke_main_func>) at threads.c:692
#78 scm_with_guile (func=func@entry=0x7efeb51a41b0 <invoke_main_func>, data=data@entry=0x7ffe2b3ba790) at threads.c:698
#79 0x00007efeb51a4362 in scm_boot_guile (argc=6, argv=0x7ffe2b3ba8e8, main_func=0x563598b4fb40 <inner_main>, closure=0x0) at init.c:319
#80 0x0000563598b4f9a4 in main (argc=6, argv=0x7ffe2b3ba8e8) at guile.c:95
* Re: guile 3 update, june 2018 edition
From: dsmich @ 2018-07-17 10:32 UTC
To: guile-devel, dsmich, Andy Wingo

---- dsmich@roadrunner.com wrote:
> Ok! now getting past the "make -j" issue, but I'm still getting a
> segfault.

And now commit e6461cf1b2b63e3ec9a2867731742db552b61b71 has gotten past
the segfault.  Wooo!

-Dale
* Re: guile 3 update, june 2018 edition
From: Ludovic Courtès @ 2018-07-02 9:28 UTC
To: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> skribis:

> The news is that the VM has been completely converted over to call out
> to the Guile runtime through an "intrinsics" vtable.  For some
> intrinsics, the compiler will emit specialized call-intrinsic opcodes.
> (There's one of these opcodes for each intrinsic function type.)  For
> others that are a bit more specialized, like the intrinsic used in
> call-with-prompt, the VM calls out directly to the intrinsic.
>
> The upshot is that we're now ready to do JIT compilation.  JIT-compiled
> code will use the intrinsics vtable to embed references to runtime
> routines.  In some future, AOT-compiled code can keep the intrinsics
> vtable in a register, and call indirectly through that register.

Exciting!  It sounds like a really good strategy because it means that
the complex instructions don’t have to be implemented in lightning
assembly by hand, which would be a pain.

> My current plan is that the frame overhead will still be two slots: the
> saved previous FP, and the saved return address.  Right now the return
> address is always a bytecode address.  In the future it will be bytecode
> or native code.  Guile will keep a runtime routine marking regions of
> native code so it can know, if it needs to, whether an RA is bytecode or
> native code, for debugging reasons; but in most operations, Guile won't
> need to know.  The interpreter will tier up to JIT code through an
> adapter frame that will do impedance matching over virtual<->physical
> addresses.  To tier down to the interpreter (e.g. when JIT code calls
> interpreted code), the JIT will simply return to the interpreter, which
> will pick up state from the virtual IP, SP, and FP saved in the VM state.

What will the “adapter frame” look like?

> We do walk the stack from Scheme sometimes, notably when making a
> backtrace.  So, we'll make the runtime translate the JIT return
> addresses to virtual return addresses in the frame API.  To Scheme, it
> will be as if all things were interpreted.

Currently you can inspect the locals of a stack frame.  Will that be
possible with frames corresponding to native code?  (I suppose that’d be
difficult.)

> My current problem is knowing when a callee has JIT code.  Say you're in
> JITted function F which calls G.  Can you directly jump to G's native
> code, or is G not compiled yet and you need to use the interpreter?  I
> haven't solved this yet.  "Known calls" that use call-label and similar
> can of course eagerly ensure their callees are JIT-compiled, at
> compilation time.  Unknown calls are the problem.  I don't know whether
> to consider reserving another word in scm_tc7_program objects for JIT
> code.  I have avoided JIT overhead elsewhere and would like to do so
> here as well!

In the absence of a native code pointer in scm_tc7_program objects, how
will libguile find the native code for a given program?

Thanks for sharing this plan!  Good times ahead!

Ludo’.
* Re: guile 3 update, june 2018 edition
From: Andy Wingo @ 2018-07-05 17:05 UTC
To: Ludovic Courtès; +Cc: guile-devel

Hi :)

On Mon 02 Jul 2018 11:28, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> My current plan is that the frame overhead will still be two slots: the
>> saved previous FP, and the saved return address.  Right now the return
>> address is always a bytecode address.  In the future it will be bytecode
>> or native code.  Guile will keep a runtime routine marking regions of
>> native code so it can know, if it needs to, whether an RA is bytecode or
>> native code, for debugging reasons; but in most operations, Guile won't
>> need to know.  The interpreter will tier up to JIT code through an
>> adapter frame that will do impedance matching over virtual<->physical
>> addresses.  To tier down to the interpreter (e.g. when JIT code calls
>> interpreted code), the JIT will simply return to the interpreter, which
>> will pick up state from the virtual IP, SP, and FP saved in the VM state.
>
> What will the “adapter frame” look like?

Aah, sadly it won't work like this.  Somehow I was thinking of an
adapter frame on the C stack.  However, an adapter frame corresponds to
a continuation, so it would have to have the life of a continuation, so
it would have to be on the VM stack.  I don't think I want adapter
frames on the VM stack, so I have to scrap this.  More below...

>> We do walk the stack from Scheme sometimes, notably when making a
>> backtrace.  So, we'll make the runtime translate the JIT return
>> addresses to virtual return addresses in the frame API.  To Scheme, it
>> will be as if all things were interpreted.
>
> Currently you can inspect the locals of a stack frame.  Will that be
> possible with frames corresponding to native code?  (I suppose that’d be
> difficult.)

Yes, because native code manipulates the VM stack in exactly the same
way as bytecode.  Eventually we should do register allocation and avoid
always writing values to the stack, but that is down the road.

>> My current problem is knowing when a callee has JIT code.  Say you're in
>> JITted function F which calls G.  Can you directly jump to G's native
>> code, or is G not compiled yet and you need to use the interpreter?  I
>> haven't solved this yet.  "Known calls" that use call-label and similar
>> can of course eagerly ensure their callees are JIT-compiled, at
>> compilation time.  Unknown calls are the problem.  I don't know whether
>> to consider reserving another word in scm_tc7_program objects for JIT
>> code.  I have avoided JIT overhead elsewhere and would like to do so
>> here as well!
>
> In the absence of a native code pointer in scm_tc7_program objects, how
> will libguile find the native code for a given program?

This is a good question, and it was not clear to me when I wrote this!
I think I have a solution now, but it involves memory overhead.  Oh well.

Firstly, I propose to add a slot to stack frames.  Stack frames will now
store the saved FP, the virtual return address (vRA), and the machine
return address (mRA).  When in JIT code, a return will check if the mRA
is nonzero, and if so jump to that mRA.  Otherwise it will return from
JIT, and the interpreter should continue.  Likewise, when doing a
function return from the interpreter and the mRA is nonzero, the
interpreter should return by entering JIT code at that address.
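A minimal sketch of the three-word frame and the return dispatch just
described, with hypothetical field and helper names rather than Guile's
actual layout:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 3-word frame overhead: dynamic link, virtual return
       address, machine return address.  */
    struct frame_overhead
    {
      void     *saved_fp;   /* previous frame pointer                 */
      uint32_t *vra;        /* bytecode return address, always valid  */
      void     *mra;        /* native return address, or NULL if the
                               caller is running in the interpreter   */
    };

    /* Stand-ins for the two continuation paths.  */
    static void
    enter_mcode (void *mra)
    {
      printf ("return directly into native code at %p\n", mra);
    }

    static void
    resume_vcode (uint32_t *vra)
    {
      printf ("resume the interpreter at %p\n", (void *) vra);
    }

    /* A return checks the mRA: nonzero means the caller has native code
       and we jump straight back into it; zero means we fall back to (or
       stay in) the bytecode interpreter at the vRA.  */
    static void
    do_return (const struct frame_overhead *frame)
    {
      if (frame->mra)
        enter_mcode (frame->mra);
      else
        resume_vcode (frame->vra);
    }

    int
    main (void)
    {
      uint32_t fake_bytecode[4] = { 0 };
      struct frame_overhead from_interpreted = { NULL, fake_bytecode, NULL };
      do_return (&from_interpreted);  /* no mRA: tier down to interpreter */
      return 0;
    }

The same check serves both directions: a JIT-to-JIT return never leaves
native code, while a frame pushed by the interpreter simply has a zero
mRA.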
When building an interpreter-only Guile (Guile without JIT) or an
AOT-only Guile (doesn't exist currently), we could configure Guile to
not reserve this extra stack word.  However, that would be a different
ABI: a .go file built with interpreter-only Guile wouldn't work on
Guile-with-JIT, because interpreter-only Guile would think stack frames
only need two reserved words, whereas Guile-with-JIT would write three
words.  To avoid the complication, for 3.0 I think we should just use
3-word frames all the time.

So, that's returns.  Other kinds of non-local returns like
abort-to-prompt, resuming delimited continuations, or calling
undelimited continuations would work similarly: the continuation would
additionally record an mRA, and resuming would jump there instead, if
appropriate.

Now, calls.  One of the reasons that I wanted to avoid an extra program
word was because scm_tc7_program doesn't exist in a one-to-one
relationship with code.  "Well-known" procedures get compiled by closure
optimization to be always called via call-label or tail-call-label --
so some code doesn't have program objects.  On the other hand, closures
mean that some code has many program objects.  So I thought about using
side tables indexed by code; or inline "maybe-tier-up-here"
instructions, which would reference a code pointer location that, if
nonzero, would be the JIT code.

However, I see now that really we need to optimize for the JIT-to-JIT
call case, as by definition that's going to be the hot case.  Of course
call-label from JIT can do an unconditional jmp.  But calling a program
object... how do we do this?  This is complicated by code pages being
read-only, so we don't have space to store a pointer in with the code.
I think the answer is, like you say, another word in program objects.
JIT code will then do, when it sees a call:

  if (SCM_PROGRAM_P (x) && SCM_PROGRAM_MCODE (x))
    goto *SCM_PROGRAM_MCODE (x);
  vp->ip = SCM_PROGRAM_CODE (x);
  return;  // Handle call in interpreter.

which is reasonably direct and terse.  I can't think of any other option
that would be reasonably fast.

Now, we're left with a couple of issues: what to do about call-label in
interpreted code?  Also: when should we JIT?

Let's take the second question first.  I think we should JIT based on
counters, i.e. each function has an associated counter, and when that
counter overflows (or underflows, depending on how we configure it),
then we JIT that function.  This has the advantage of being
deterministic.  Another option would be statprof-like profiling, which
is cool because it measures real run time, but besides not being
terribly portable, it isn't deterministic, so it makes bugs hard to
reproduce.  Another option would be a side hash table keyed by code
address.  This is what LuaJIT does, but you get collisions, and since
addresses are randomized, that also makes JIT unpredictable.

Usually your counter limit is set depending on how expensive it is to
JIT.  If JIT were free, the counter limit would be 0.  Guile's JIT will
be cheap, though not free.  Additionally some JITs set higher counter
values because they do some form of adaptive optimization and they want
to see more values.  E.g. JavaScriptCore sets its default threshold for
its first tier at 500, incrementing by 15 for each call, 10 for each
return, and 1 for each loop iteration.  But LuaJIT sets the threshold at
56, incrementing by 1 each call and 2 each loop.  Weird, right?  Anyway,
that's how it is.  We'll have to measure.
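A hedged sketch of this counter-based tier-up.  The struct, the weights,
and the compile_mcode entry point are illustrative assumptions rather
than real Guile API, and the threshold numbers are placeholders that
would have to be measured, as noted above:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical per-function JIT state, statically allocated in the
       image next to the function's bytecode, so that every closure over
       the same code shares one counter and one native-code pointer.  */
    struct jit_state
    {
      uint32_t counter;   /* hotness counter                          */
      void    *mcode;     /* native entry point, NULL until compiled  */
    };

    /* Illustrative numbers only, loosely in the spirit of the JSC and
       LuaJIT figures quoted above.  */
    enum { JIT_THRESHOLD = 1000, WEIGHT_CALL = 30, WEIGHT_LOOP = 2 };

    /* Assumed JIT entry point: compile the bytecode starting at VCODE
       and return the native entry address.  */
    extern void *compile_mcode (const uint32_t *vcode);

    /* Run at function entry (weight WEIGHT_CALL) and at loop back-edges
       (weight WEIGHT_LOOP): bump the counter, and compile once it
       crosses the threshold.  Returns the native code, or NULL if the
       function should keep running in the interpreter for now.  */
    static void *
    maybe_tier_up (struct jit_state *st, uint32_t weight,
                   const uint32_t *vcode)
    {
      if (st->mcode)
        return st->mcode;
      st->counter += weight;
      if (st->counter >= JIT_THRESHOLD)
        st->mcode = compile_mcode (vcode);
      return st->mcode;
    }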
Note that there are two ways you might want to tier up to JIT from the
interpreter -- at function entry, and in loops.  Given that JIT code and
interpreter code manipulate the stack in the same way, you *can* tier up
anywhere -- but those are the places that it's worth checking.

Function entry is the essential place.  That's when you have the
scm_tc7_program in slot 0, if that's how the procedure is compiled, and
that's when you can patch the JIT code slot in the program object.  The
counter itself should be statically allocated in the image rather than
existing as part of the scm_tc7_program object, because multiple
closures can share code.  So, that to me says:

 * we make a pass in the compiler in a late stage to determine which
   function entry labels will ever be put in a scm_tc7_program

 * we add "enter-function" opcodes to the beginning of those functions'
   code that will increment a statically allocated counter, maybe tier
   up, and patch the program object

 * the next call to those functions will see the JIT code in the
   program and jump in directly

(A rough sketch of this enter-function flow appears at the end of this
message.)

For nests of functions that call each other via call-label, I was
thinking that these functions should be compiled all at once.  But in
the case of a big nested function where many things are visible -- the
style of programming used in the fastest programs -- we could cause too
much latency by biting off too much at once.  So perhaps we should use
enter-function there as well, just with a flag that this function has no
closure, so there's nothing to patch.  If, when compiling, we see a
call-label, we can speculatively see if there's JIT code for the callee
already, and in that case emit a direct jump; but otherwise we can
instead emit an indirect jump.  The JIT code pointer would then be not
in the scm_tc7_program, but statically allocated in the image.  In fact
we probably want statically allocated JIT code pointers in the image
anyway, so that when one closure tiers up, the next closures tier up
"for free" on their next call just by checking that pointer.

Tier-up sites inside functions are ephemeral: they are typically only
entered once.  The next time the function is entered it's via JIT,
usually.  So, no need to cache them; we can just tier up the function
(ensuring its corresponding statically allocated JIT pointer is set),
then jump to the offset in the JIT code corresponding to that bytecode
offset.  Probably we can make "handle-interrupts" do this tier-up.

OK, this mail is long enough!!!  Does it make things clearer though?

To-do:

 * adapt compiler and runtime for 3-word stack frames

 * adapt compiler and runtime for extra word in scm_tc7_program

 * allocate counter and code pointer for each function

 * adapt compiler and runtime to insert opcode at program entry (maybe
   it can run the apply hook!), with flag indicating whether argv[0]
   might need patching

 * re-take JIT implementation

Cheers,

Andy

ps. I am thinking of calling bytecode "vcode" in the future, for virtual
code, and JIT code "mcode", for machine code.  WDYT? :)
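As a rough illustration of the enter-function flow referenced above,
reusing the hypothetical jit_state, maybe_tier_up, WEIGHT_CALL, and
enter_mcode names from the earlier sketches.  SCM_PROGRAM_P is real
libguile API, but SCM_SET_PROGRAM_MCODE and the argument layout here are
assumptions for illustration, not existing Guile code:

    /* Sketch only; assumes the declarations from the sketches above are
       in scope.  The counter and mcode pointer live in static image
       data keyed to the code, not in the closure, so all closures over
       the same code tier up together; if this activation has a program
       object in slot 0, its mcode word is patched so later unknown
       calls can jump to native code directly.  */
    static void
    enter_function (SCM *fp, const uint32_t *vcode,
                    struct jit_state *st, int has_closure)
    {
      void *mcode = maybe_tier_up (st, WEIGHT_CALL, vcode);
      if (mcode == NULL)
        return;                          /* keep interpreting this call */
      if (has_closure && SCM_PROGRAM_P (fp[0]))
        SCM_SET_PROGRAM_MCODE (fp[0], mcode);  /* assumed setter */
      enter_mcode (mcode);               /* continue this call in JIT   */
    }

Until the counter crosses the threshold, the opcode is just a cheap
increment-and-continue, which matches the goal above of keeping
interpreter overhead low.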