unofficial mirror of guile-devel@gnu.org 
* a plan for native compilation
@ 2010-04-16 11:09 Andy Wingo
  2010-04-16 20:47 ` No Itisnt
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andy Wingo @ 2010-04-16 11:09 UTC (permalink / raw)
  To: guile-devel

Hi,

I've been thinking some about native compilation, and how to do
ahead-of-time and just-in-time compilation, and not totally explode
Guile's mental footprint -- the number of cases and things that one has
to keep in mind when debugging Guile code.

Currently, Guile has a compiler to a custom virtual machine, and the
associated toolchain: assemblers and disassemblers, stack walkers, the
debugger, etc. One can get the source location of a particular
instruction pointer, for example. It's all mostly understandable, and
mostly uniform across platforms (modulo word size and byte order, and
perhaps we should make that uniform as well, especially if we can do
native compilation.)

So, my thought is to extend procedures with an additional pointer, a
pointer to a "native" code structure. The native code could be written
out ahead-of-time, or compiled at runtime. But procedures would still
have bytecode, for various purposes: for example, to enable code coverage
via the next-instruction hook, and in the JIT case, because only some
procedures will be native-compiled.

We keep the same stack representation, so stack walkers and the debugger
still work. Some local variables can be allocated into registers, but
procedure args are still passed and returned on the stack. Though the
procedure's arity and other metadata would be the same, the local
variable allocations and source locations would differ, so we would need
some additional debugger support, but we can work on that when the time
comes.

All Scheme procedures, bytecode and native, will run inside the VM. If a
bytecode procedure calls a native procedure, the machine registers are
saved, and some machine-specific stub transfers control to the native
code. Native code calling native code uses the same stack as the VM,
though it has its own conventions over what registers to save; and
native code calling bytecode prepares the Scheme stack, then restores
the VM-saved machine registers.

In this way we can do incremental experimentation with compiled
procedures, compiling at runtime, or ahead-of-time.

Now, what technology to choose for the compiler itself? Dunno. For a
JIT, it would be useful to use something portable, and perhaps do the
JIT compilation on the bytecode itself, without more source information.
It would not produce the fastest code, but it would run fast.

AIUI the hotspot compiler actually does an SSA transformation of Java
bytecode, then works on that. I'm not particularly interested in
something like that; I'm more interested in something direct and fast,
and obviously correct and understandable by our debugging
infrastructure.

I think we can produce better native code ahead-of-time coming from the
tree-il layer directly. I feel like eventually we'll need to replace
GLIL with something else, but I don't really know; we'll find out in the
future I guess. But I do want to do ahead-of-time compilation, because
I want Guile programs to start up very quickly and not consume much
memory.

Anyway, just some thoughts here. I'm not going to focus on native
compilation in the coming months, as there are other things to do, but
this is how I think it should be done :-)

Cheers,

Andy
-- 
http://wingolog.org/




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: a plan for native compilation
  2010-04-16 11:09 a plan for native compilation Andy Wingo
@ 2010-04-16 20:47 ` No Itisnt
  2010-04-17 10:21   ` Andy Wingo
  2010-04-16 23:15 ` Ludovic Courtès
  2010-04-18  2:19 ` Ken Raeburn
  2 siblings, 1 reply; 11+ messages in thread
From: No Itisnt @ 2010-04-16 20:47 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

One option I am really starting to like is LLVM. I know what you're
thinking, huge memory consumption, giant dependency, etc, but it's so
cool! It supports every desktop architecture too.





* Re: a plan for native compilation
  2010-04-16 11:09 a plan for native compilation Andy Wingo
  2010-04-16 20:47 ` No Itisnt
@ 2010-04-16 23:15 ` Ludovic Courtès
  2010-04-17 11:19   ` Andy Wingo
  2010-04-18  2:19 ` Ken Raeburn
  2 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2010-04-16 23:15 UTC (permalink / raw)
  To: guile-devel

Howdy!

Andy Wingo <wingo@pobox.com> writes:

> So, my thought is to extend procedures with an additional pointer, a
> pointer to a "native" code structure.

(So your point is what should we do now to allow for such experiments
eventually, right?)

Adding an extra word to programs seems like a good idea, yes.

> Now, what technology to choose for the compiler itself? Dunno. For a
> JIT, it would be useful to use something portable, and perhaps do the
> JIT compilation on the bytecode itself, without more source information.
> It would not produce the fastest code, but it would run fast.

Yes, that’s what I had in mind, using GNU lightning (see
<http://www.fdn.fr/~lcourtes/software/guile/jit.html>.)  It /seems/ to
be doable, with milestones to do it incrementally, starting from a dumb
version.

> I think we can produce better native code ahead-of-time coming from the
> tree-il layer directly. I feel like eventually we'll need to replace
> GLIL with something else, but I don't really know; we'll find out in the
> future I guess. But I do want to do ahead-of-time compilation, because
> I want Guile programs to start up very quickly and not consume much
> memory.

Sure.

lightning does x86, x86_64, sparc, and powerpc (PLT uses it) while Sassy
does only x86, so it may be that both could play a role.

Anyway, not for now.  :-)

Thanks,
Ludo’.






* Re: a plan for native compilation
  2010-04-16 20:47 ` No Itisnt
@ 2010-04-17 10:21   ` Andy Wingo
  2010-04-17 21:20     ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2010-04-17 10:21 UTC (permalink / raw)
  To: No Itisnt; +Cc: guile-devel

Hi,

On Fri 16 Apr 2010 22:47, No Itisnt <theseaisinhere@gmail.com> writes:

> One option I am really starting to like is LLVM. I know what you're
> thinking, huge memory consumption, giant dependency, etc, but it's so
> cool! It supports every desktop architecture too.

It's quite attractive! However I don't think it's the right thing for a
GNU project to use, especially one such as Guile that sits so low in the
stack.

It's a shame that GCC has not been able to support LLVM's level of
innovation, but perhaps that will change over time. The thing that's
clear to me is that it would be nice to use other parts of the GNU
toolchain for Guile's native compilation, but the LGPL/GPL thing makes
that a bit more difficult to think about -- and then there's the fact
that much of it is expressed as standalone binaries rather than
LLVM-like libraries.

Definitely something to think about, though, over the next 6-12 months:
how best to reuse existing work when thinking about AOT and JIT
compilation for Guile.

Andy
-- 
http://wingolog.org/





* Re: a plan for native compilation
  2010-04-16 23:15 ` Ludovic Courtès
@ 2010-04-17 11:19   ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2010-04-17 11:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Greets,

On Sat 17 Apr 2010 01:15, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> So, my thought is to extend procedures with an additional pointer, a
>> pointer to a "native" code structure.
>
> (So your point is what should we do now to allow for such experiments
> eventually, right?)
>
> Adding an extra word to programs seems like a good idea, yes.

I'm not sure I have an immediate point ;) Also I don't think it's a good
idea to reserve a word "just-in-case". Better to do this work on a
future 2.2 branch. The word would probably go in the objcode structure,
also; a 5-word procedure object has bad cache implications.

>> Now, what technology to choose for the compiler itself? Dunno. For a
>> JIT, it would be useful to use something portable, and perhaps do the
>> JIT compilation on the bytecode itself, without more source information.
>> It would not produce the fastest code, but it would run fast.
>
> Yes, that’s what I had in mind, using GNU lightning (see
> <http://www.fdn.fr/~lcourtes/software/guile/jit.html>.)  It /seems/ to
> be doable, with milestones to do it incrementally, starting from a dumb
> version.

Yes, I had your model in mind. I think it's generally a good idea,
though I would like to be able to avoid trampolining.

>> I think we can produce better native code ahead-of-time coming from the
>> tree-il layer directly. I feel like eventually we'll need to replace
>> GLIL with something else, but I don't really know; we'll find out in the
>> future I guess. But I do want to do ahead-of-time compilation, because
>> I want Guile programs to start up very quickly and not consume much
>> memory.
>
> Sure.
>
> lightning does x86, x86_64, sparc, and powerpc (PLT uses it) while Sassy
> does only x86, so it may be that both could play a role.
>
> Anyway, not for now.  :-)

Agreed :) Just wanted to reify these thoughts, as breadcrumbs for the
future :)

Andy
-- 
http://wingolog.org/





* Re: a plan for native compilation
  2010-04-17 10:21   ` Andy Wingo
@ 2010-04-17 21:20     ` Ludovic Courtès
  0 siblings, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2010-04-17 21:20 UTC (permalink / raw)
  To: guile-devel

Hi,

Andy Wingo <wingo@pobox.com> writes:

> It's a shame that GCC has not been able to support LLVM's level of
> innovation,

I don’t think innovation is the problem (did you read the 4.5 release
notes?).  However, it’s been too much of a monolithic compiler, unlike
LLVM, although plug-ins will now improve the situation.

Ludo’.






* Re: a plan for native compilation
  2010-04-16 11:09 a plan for native compilation Andy Wingo
  2010-04-16 20:47 ` No Itisnt
  2010-04-16 23:15 ` Ludovic Courtès
@ 2010-04-18  2:19 ` Ken Raeburn
  2010-04-18 11:41   ` Andy Wingo
  2010-04-18 20:40   ` Ludovic Courtès
  2 siblings, 2 replies; 11+ messages in thread
From: Ken Raeburn @ 2010-04-18  2:19 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Good stuff, Andy!

On Apr 16, 2010, at 07:09, Andy Wingo wrote:
> Currently, Guile has a compiler to a custom virtual machine, and the
> associated toolchain: assemblers and disassemblers, stack walkers, the
> debugger, etc. One can get the source location of a particular
> instruction pointer, for example.

These are great... but if they're run-time features of Guile, they're useless when examining a core file.

It would be awesome if GDB could display this information when debugging a process, *and* when looking at a core file.  (For both JIT and AOT compilation, of course.)  It doesn't currently know about Scheme and Guile, so obviously some work would need to be done on that side.  It's got some rather clunky looking hooks for providing debug info associated with JIT compilation, which I think I mentioned in IRC a while back.  Maybe the GDB developers could be persuaded to support a more direct way of supplying debug info than the current mechanism, such as a pointer to DWARF data.  GDB would also need to learn about Scheme and Guile specifically, which would take cooperation from both groups.

Obviously, when looking at a core file, no helper code from the library can be executed.  Perhaps less obviously, with a live process, when doing simple things like looking at symbol values, you probably don't want to execute library code if it means enabling other threads to resume executing for a while as well.

GDB 7 supports supplying Python code to pretty-print selected object types, or defining new commands.  We could supply Python code for looking at SCM objects, maybe even walking the stack, if that turns out to be practical with the interfaces GDB supplies.

> So, my thought is to extend procedures with an additional pointer, a
> pointer to a "native" code structure. The native code could be written
> out ahead-of-time, or compiled at runtime. But procedures would still
> have bytecode, for various purposes: for example, to enable code coverage
> via the next-instruction hook, and in the JIT case, because only some
> procedures will be native-compiled.

I wondered about this, when looking briefly at what JIT compilation would need to generate given certain byte codes.  Would you generate code based on the debug or non-debug versions of the instructions?  What would the choice depend on?  Can both bytecode evaluators be used in one process and with the same bytecode object?

What about when profiling for performance?

> We keep the same stack representation, so stack walkers and the debugger
> still work. Some local variables can be allocated into registers, but
> procedure args are still passed and returned on the stack. Though the
> procedure's arity and other metadata would be the same, the local
> variable allocations and source locations would differ, so we would need
> some additional debugger support, but we can work on that when the time
> comes.

The "call" sequence would have to work a little differently from now, I think.  As you describe:

> All Scheme procedures, bytecode and native, will run inside the VM. If a
> bytecode procedure calls a native procedure, the machine registers are
> saved, and some machine-specific stub transfers control to the native
> code. Native code calling native code uses the same stack as the VM,
> though it has its own conventions over what registers to save; and
> native code calling bytecode prepares the Scheme stack, then restores
> the VM-saved machine registers.

Does the native code figure out if it's jumping to byte code or machine code, or does it use some transfer stub?

> AIUI the hotspot compiler actually does an SSA transformation of Java
> bytecode, then works on that. I'm not particularly interested in
> something like that; I'm more interested in something direct and fast,
> and obviously correct and understandable by our debugging
> infrastructure.

Though as you say, we can experiment later with additional changes.  If there's some heavily-used dynamically-generated code, it may be worth the extra effort, but we can find that out after we've got something working.

> Anyway, just some thoughts here. I'm not going to focus on native
> compilation in the coming months, as there are other things to do, but
> this is how I think it should be done :-)

Some random thoughts of my own:

Several possible options for AOT compilation (e.g., generating C or assembly and using native tools) could involve the generation of native object files.  It seems tempting to me to see how much we might be able to use the native C/C++/Fortran/etc method or do something parallel:

* Debug info in native representations, handled by GDB and other debuggers.  Okay, this is hard if we don't go via C code as an intermediate language, and probably even if we do.  But we can probably at least map PC address ranges to function names and line numbers, stuff like that.  Maybe we could do the more advanced stuff one format at a time, starting with DWARF.

* Code and read-only data sections shared across processes; read-write data mapped in copy-on-write.

* Loading Guile modules via dlopen or system runtime linker means they'd be visible to debuggers.

* With some special compile-time hooks, perhaps FFI symbol references could turn into (weak?) direct symbol references, processed with native relocation handling, etc.

* Linking multiple object files together into a single "library" object that can be loaded at once; possibly with cross-file optimization.

* Even for JIT compilation, but especially for AOT compilation, optimizations should only be enabled with careful consideration of concurrent execution.  E.g., if "(while (not done) ....)" is supposed to work with a second thread altering "done", you may not be able to combine multiple cases of reading the value of any variable even when you can prove that the current thread doesn't alter the value in between.

** Be especially careful if you want to be able to have Guile create a limited sandbox in which to run untrusted code.  Assume that the provider of the code will attempt to avoid mutexes and use race conditions and FFI pointer handling and opportunities for data corruption and such, in order to break out of the sandbox.

* Link compiled C and Scheme parts of a package together into a single shared library object, instead of the code in one language needing to know where the object for the other language is (awkward if you're trying to relocate the whole bundle via LD_LIBRARY_PATH) and explicitly load it.  (Perhaps a library initialization function could call a Guile library function to say, "if you need module (foo bar baz), it's mapped in at this address and is this big, and this much is read-only", or "here's a pointer to the struct Foo describing it, including pointers to various components".  Or we could generate C symbols reflecting module names and make the library explicitly make them known to the Guile library.)  If nothing else, the current .go file could be turned into a large character array....

* Can anything remotely reasonable happen when C++ code calls Scheme code which calls C++ code ... with stack-unwinding cleanup code specified in both languages, and an exception is raised?  Can the cleanup code be invoked in both languages?  (This applies to the bytecode interpreter as well, but the mechanism for compiled code would have to be different, as I believe C++/Ada/etc EH support typically maps PC address to handler info; I don't know how Java is handled under JIT compilation.)

* Did I mention how cool it would be to have GDB support? :-)

Looking forward to Emacs work:

Tom Tromey recently pointed out some JIT compilation work done on Emacs byte code back in 2004, with the conclusion that while some improvement is possible, the time spent in existing primitives dominates the execution time.  Playing devil's advocate for a minute: Why do you think we can do better?  Or was this modest improvement -- maybe a bit more for AOT compilation -- all you were expecting when you said we could run elisp faster than Emacs?

I'm hoping that AOT compilation will speed up the initial Lisp loading disproportionately though.  A lot of it is just loading function definitions, executing small blobs of Lisp code (like, create this keymap, then fill in this entry, then fill in that one, then assign it to this variable; or, add this property to this symbol with this value) and -- I *think* -- not relying too heavily on the built-in subrs that we can't speed up, and not doing any display updates, stuff like that.  But I'm still concerned about doing it at startup time rather than using the "unexec" mechanism Emacs currently uses to pre-initialize all the C and Lisp stuff and dump out an image that can be launched more quickly.

On my reasonably fast Mac desktop, Emacs takes about 3s to launch and load my .emacs file.  During the build, pre-loading the Lisp code takes it about another 3s, that would get added to the startup time without unexec.  If loading native compiled files (or .go files on platforms where we don't have native compilation yet) isn't so amazingly fast as to cut that down to 2-3s, do you have any ideas how we might be able to load and save an initialized Lisp environment?

One thing that might speed up the loading of .go files is making them more compact; there seems to be a lot of string duplication in the current format.  (Try running "strings module/ice-9/boot-9.go | sort | uniq -c | sort -n" to get a list of strings and the numbers of times they appear, sorted by count.)

I'm also pondering loading different Lisp files in two or three threads in parallel, when dependencies allow, but any manipulation of global variables has to be handled carefully, as do any load-time errors.  (One thread blocks reading, while another executes already-loaded code... maybe more, to keep multiple cores busy at once.)


... Sorry, that's a lot of tangents to be going off onto. :-)

Ken




* Re: a plan for native compilation
  2010-04-18  2:19 ` Ken Raeburn
@ 2010-04-18 11:41   ` Andy Wingo
  2010-04-21 17:02     ` Ken Raeburn
  2010-04-18 20:40   ` Ludovic Courtès
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2010-04-18 11:41 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

You bring up lots of interesting points. Here are my initial reactions
to some of them.

On Sun 18 Apr 2010 04:19, Ken Raeburn <raeburn@raeburn.org> writes:

> On Apr 16, 2010, at 07:09, Andy Wingo wrote:
>> Currently, Guile has a compiler to a custom virtual machine, and the
>> associated toolchain: assemblers and disassemblers, stack walkers, the
>> debugger, etc. One can get the source location of a particular
>> instruction pointer, for example.
>
> These are great... but if they're run-time features of Guile, they're
> useless when examining a core file.

I don't think we should be thinking in terms of core files; core files
are corpses, whereas a Guile process is alive ;-)

Specifically, we should make it so that there is nothing you would want
to go to a core file for. Compiling Scheme code to native code should
never produce code that segfaults at runtime. All errors would still be
handled by the catch/throw mechanism.

So in a sense, native-compiled procedures are black boxes. All you need
to be able to do is to walk the stack -- which should have the same
representation for native-compiled and bytecode procedures -- and get
some representation of where you are in that procedure. I am hoping to
be able to treat the instruction pointer simply as an index into the
live variable set and source location array, so that we don't require
disassembly of native code for proper backtraces.

Of course, native disassembly is interesting sometimes, so we should
support it when we can, but it shouldn't be necessary to disassemble
instructions for all of the architectures supported by libjit or
lightning.


That said, it's true that if on the off chance you were to get a core
file from Guile, you would like to be able to understand it ;) So I see
your point here. I think ideally it would be a GCC plugin that linked to
libguile somehow, at least to get the typecodes and such. Lots of
interesting stuff to do here!


> Would you generate code based on the debug or non-debug versions of
> the instructions? What would the choice depend on? Can both bytecode
> evaluators be used in one process and with the same bytecode object?

Options to the compiler would be one way. Then there are the typical
(declare ...) blocks, both at the top-level and within procedures.

As an aside currently it's tough (almost impossible) to switch from the
debug-instrumented VM to the non-debugging one. You basically have to
select it at compile time; in the future we can allow selecting it when
you start Guile, but switching at runtime will have to wait for a
delimited call/cc, methinks.

> What about when profiling for performance?

Profiling with statprof should still work, because the stack
representation will be the same.

> Does the native code figure out if it's jumping to byte code or
> machine code, or does it use some transfer stub?

It has to do some checking anyway, to see if the procedure being called
is actually a procedure, so it will just fold in the check that the
procedure has native code. If the procedure has no native code or is not
a procedure, the native code jumps back into the body of e.g.
vm_debug_engine, after restoring registers. (I think?)

> Several possible options for AOT compilation (e.g., generating C or
> assembly and using native tools)

I would really like to avoid generating C. It seems to me to be a
needless abstraction layer, given that we will use our own calling
convention and stack. It's actually too high of an abstraction layer
IMO.

Hmm, another option would be to write a GCC plugin and feed it
s-expressions. Hmm...

> * Debug info in native representations, handled by GDB and other
> debuggers. Okay, this is hard if we don't go via C code as an
> intermediate language, and probably even if we do. But we can probably
> at least map PC address ranges to function names and line numbers,
> stuff like that. Maybe we could do the more advanced stuff one format
> at a time, starting with DWARF.

We should be able to do this already: we map bytecode address ranges
to line numbers, and while the function is on the stack you can query
it for whatever you like. Adding a map when generating native
code should be easy.

> * Code and read-only data sections shared across processes; read-write
> data mapped in copy-on-write.

Read-only sharing does happen already with .go files. Copy-on write does
not happen yet.

I would actually like to switch our compiled-code on-disk format to be a
subset of ELF, so we can have e.g. a bytecode section, a native code
section, sections for RO and RW data, etc. But that would take a fair
amount of thinking.

> * With some special compile-time hooks, perhaps FFI symbol references
> could turn into (weak?) direct symbol references, processed with
> native relocation handling, etc.

This might improve startup times (marginally?), but it wouldn't affect
runtimes, would it?

> * Linking multiple object files together into a single "library"
> object that can be loaded at once; possibly with cross-file
> optimization.

Dunno; for me, cross-file optimization should happen at macroexpansion
time via define-integrable, or via similar approaches. But linking
together a number of modules into one file could be advantageous in e.g.
the emacs case, to avoid unexec.

> * Even for JIT compilation, but especially for AOT compilation,
> optimizations should only be enabled with careful consideration of
> concurrent execution. E.g., if "(while (not done) ....)" is supposed
> to work with a second thread altering "done", you may not be able to
> combine multiple cases of reading the value of any variable even when
> you can prove that the current thread doesn't alter the value in
> between.

Fortunately, Scheme programming style discourages global variables ;)
Reminds me of "spooky action at a distance". And when they are read, it
is always through an indirection, so we should be good.

> ** Be especially careful if you want to be able to have Guile create a
> limited sandbox in which to run untrusted code. Assume that the
> provider of the code will attempt to avoid mutexes and use race
> conditions and FFI pointer handling and opportunities for data
> corruption and such, in order to break out of the sandbox.

Of course. Sandboxed code of course should not have access to mutexes or
the FFI or many other things. Though it is an interesting point, that
resources that you provide to sandboxed code should be threadsafe, if
the sandbox itself has threads.

> * Link compiled C and Scheme parts of a package together into a single
> shared library object, instead of the code in one language needing to
> know where the object for the other language is (awkward if you're
> trying to relocate the whole bundle via LD_LIBRARY_PATH) and
> explicitly load it. (Perhaps a library initialization function could
> call a Guile library function to say, "if you need module (foo bar
> baz), it's mapped in at this address and is this big, and this much is
> read-only", or "here's a pointer to the struct Foo describing it,
> including pointers to various components". Or we could generate C
> symbols reflecting module names and make the library explicitly make
> them known to the Guile library.) If nothing else, the current .go
> file could be turned into a large character array....

This is all very hard stuff!

> * Can anything remotely reasonable happen when C++ code calls Scheme
> code which calls C++ code ... with stack-unwinding cleanup code
> specified in both languages, and an exception is raised? Can the
> cleanup code be invoked in both languages? (This applies to the
> bytecode interpreter as well, but the mechanism for compiled code
> would have to be different, as I believe C++/Ada/etc EH support
> typically maps PC address to handler info; I don't know how Java is
> handled under JIT compilation.)

I have no earthly idea :)

> Looking forward to Emacs work:
>
> Tom Tromey recently pointed out some JIT compilation work done on
> Emacs byte code back in 2004, with the conclusion that while some
> improvement is possible, the time spent in existing primitives
> dominates the execution time. Playing devil's advocate for a minute:
> Why do you think we can do better? Or was this modest improvement --
> maybe a bit more for AOT compilation -- all you were expecting when
> you said we could run elisp faster than Emacs?

Better for emacs? Well I don't think we should over-sell speed, if
that's what you're getting at. Bytecode-wise, the performance will
probably be the same. I suspect the same code in Scheme will run faster
than Elisp, due to lexical scoping, and a richer set of bytecode
primitives. But I think the goal for phase 1 should be "no one will
notice" ;-)

Native-code compilation will make both Scheme and Elisp significantly
faster -- I think 4x would be a typical improvement, though one would
find 2x and 20x as well.

More broadly, though, I don't really believe in the long-term health of
a system that relies on primitives for speed, because such a system
necessarily restricts the expressive power of the extension language.
There are many things you just can't do in Emacs these days -- and
sometimes it's things as basic as "display all of the messages in my
archived folder". Making the extension language more capable allows for
more programs to be written inside Emacs. Eventually we will even
migrate many of the primitives out of C, and back into Elisp or Scheme.

> But I'm still concerned about [loading init code] at startup time
> rather than using the "unexec" mechanism Emacs currently uses to
> pre-initialize all the C and Lisp stuff and dump out an image that can
> be launched more quickly.

Yeah, understood. We need to figure out what the status is with this.

> On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
> load my .emacs file.

How long does emacs -Q take?

> During the build, pre-loading the Lisp code takes it about another 3s,
> that would get added to the startup time without unexec. If loading
> native compiled files (or .go files on platforms where we don't have
> native compilation yet) isn't so amazingly fast as to cut that down to
> 2-3s, do you have any ideas how we might be able to load and save an
> initialized Lisp environment?

I think we'll just have to see, unfortunately. Currently our mess of .go
files everywhere means that you get significantly different numbers
depending on whether you're in the disk cache or not... Perhaps we can
make a quick estimate just based on KSLOC? How many KSLOC get loaded in
a base emacs -Q ?

> One thing that might speed up the loading of .go files is making them
> more compact; there seems to be a lot of string duplication in the
> current format. (Try running "strings module/ice-9/boot-9.go | sort |
> uniq -c | sort -n" to get a list of strings and the numbers of times
> they appear, sorted by count.)

Interesting. You're probably right here. I think our bytecode files need
separate RO and RW data sections.

> I'm also pondering loading different Lisp files in two or three
> threads in parallel, when dependencies allow, but any manipulation of
> global variables has to be handled carefully, as do any load-time
> errors. (One thread blocks reading, while another executes
> already-loaded code... maybe more, to keep multiple cores busy at
> once.)

This is a little crazy ;-)

> ... Sorry, that's a lot of tangents to be going off onto. :-)

Heh, no prob. If we figure out how things should work, in public on the
list, it will be easier for other people to help us make it there.

Cheers,

Andy
-- 
http://wingolog.org/




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: a plan for native compilation
  2010-04-18  2:19 ` Ken Raeburn
  2010-04-18 11:41   ` Andy Wingo
@ 2010-04-18 20:40   ` Ludovic Courtès
  1 sibling, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2010-04-18 20:40 UTC (permalink / raw)
  To: guile-devel

Hi,

Ken Raeburn <raeburn@raeburn.org> writes:

> It would be awesome if GDB could display this information when
> debugging a process, *and* when looking at a core file.

Actually, GDB has some Guile support (‘set language scheme’), although
it works only with 1.6 and partly with 1.8 (because tags have changed,
stack management as well, etc.).

Thanks,
Ludo’.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: a plan for native compilation
  2010-04-18 11:41   ` Andy Wingo
@ 2010-04-21 17:02     ` Ken Raeburn
  2010-04-22 11:28       ` Andy Wingo
  0 siblings, 1 reply; 11+ messages in thread
From: Ken Raeburn @ 2010-04-21 17:02 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

On Apr 18, 2010, at 07:41, Andy Wingo wrote:
> Specifically, we should make it so that there is nothing you would want
> to go to a core file for. Compiling Scheme code to native code should
> never produce code that segfaults at runtime. All errors would still be
> handled by the catch/throw mechanism.

Including a segfault in compiled Scheme code, caused by an application-supplied C procedure returning something that looks like one of the pointer-using SCM objects but is in reality just garbage?  There *will* be core files.

>> * Debug info in native representations, handled by GDB and other
>> debuggers. Okay, this is hard if we don't go via C code as an
>> intermediate language, and probably even if we do. But we can probably
>> at least map PC address ranges to function names and line numbers,
>> stuff like that. Maybe we could do the more advanced stuff one format
>> at a time, starting with DWARF.
> 
> We should be able to do this already; given that we map bytecode address
> ranges to line numbers, and the function is still on the stack, you
> can query it for whatever you like. Adding a map when generating native
> code should be easy.

I think for best results with GDB and other debuggers, it should be converted into whatever the native format is, DWARF or otherwise.

> I would actually like to switch our compiled-code on-disk format to be a
> subset of ELF, so we can have e.g. a bytecode section, a native code
> section, sections for RO and RW data, etc. But that would take a fair
> amount of thinking.

And if it's actually compatible with ELF, that would make special handling of compiled Scheme + compiled C possible on ELF platforms but not others, leading to two different ways of building things -- or to people supporting only ELF platforms in their packages, whether intentionally or not, or to people not bothering with the non-portable special handling at all.  Which is why I was suggesting native formats rather than ELF specifically: more work up front, but more uniform treatment of platforms in the build process.

>> * With some special compile-time hooks, perhaps FFI symbol references
>> could turn into (weak?) direct symbol references, processed with
>> native relocation handling, etc.
> 
> This might improve startup times (marginally?), but it wouldn't affect
> runtimes, would it?

Depending on how it's done, it might improve the first reference to a symbol very slightly.  You could (again, depending on how it's done) perhaps trigger link-time errors if a developer forgets to supply libraries defining symbols the Scheme code is known to require, instead of a delayed run-time error.

>> * Even for JIT compilation, but especially for AOT compilation,
>> optimizations should only be enabled with careful consideration of
>> concurrent execution. E.g., if "(while (not done) ....)" is supposed
>> to work with a second thread altering "done", you may not be able to
>> combine multiple cases of reading the value of any variable even when
>> you can prove that the current thread doesn't alter the value in
>> between.
> 
> Fortunately, Scheme programming style discourages global variables ;)
> Reminds me of "spooky action at a distance". And when they are read, it
> is always through an indirection, so we should be good.

Who said global?  It could be two procedures accessing a value in a shared outer scope, with one of them launched in a second thread, perhaps indirectly via a third procedure which the compiler couldn't examine at the time to know that it would create a thread.

I'm not sure indirection helps -- unless you mean it disables that sort of optimization.

> Of course. Sandboxed code of course should not have access to mutexes or
> the FFI or many other things. Though it is an interesting point, that
> resources that you provide to sandboxed code should be threadsafe, if
> the sandbox itself has threads.

Actually, I'm not sure that mutexes should be forbidden, especially if you let the sandbox create threads.  But they should be well-protected, bullet-proof mutexes; none of this "undefined behavior" stuff.  :-)

>> * Link compiled C and Scheme parts of a package together into a single
>> shared library object, [....]
> 
> This is all very hard stuff!

Maybe somewhat.  The "big char array" transformation wouldn't be that hard, I think, though we'd clearly be going outside the bounds of what a C99 compiler is *required* to support in terms of array size.  Slap a C struct wrapper on it (or C++, which would give you an encoding system for multiple names in a hierarchy, though with different character set limitations), and you've basically got an object file ready to be created.  Then you just have to teach libguile how not to read files for some modules.

>> * Can anything remotely reasonable happen when C++ code calls Scheme
>> code which calls C++ code ... with stack-unwinding cleanup code
>> specified in both languages, and an exception is raised? [....]
> 
> I have no earthly idea :)

It only just occurred to me.  It may be worth looking at the C++ plus Java case to see if something reasonable happens there, especially with the GNU tools in particular.

My hunch is that we might be able to do it, but would need to compile at least a little C++ code into the library to do it portably.  That wouldn't be hard, as I doubt there are many platforms where you get a C but not C++ compiler these days, but I don't know if the C++ ABI work has progressed far enough along that it wouldn't tie us to a specific C++ implementation (on platforms with more than one), or how much of an issue that would be.  Worst case, we might have to run all the libguile code through the C++ compiler, to get stack-unwinding data recorded for EH processing; while there's a fairly large common subset of C and C++, it would still be annoying.

It might have to be crude, too -- for example, maybe on the C++ side we'd define a "Scheme exception" type that normally would not be caught specially by application code (though it could be), and perhaps Scheme wouldn't be able to catch C++ exceptions at all, just do the unwinding.

Just an idea to keep in mind....


> 
>> Looking forward to Emacs work:
>> 
>> Tom Tromey recently pointed out some JIT compilation work done on
>> Emacs byte code back in 2004, with the conclusion that while some
>> improvement is possible, the time spent in existing primitives
>> dominates the execution time. Playing devil's advocate for a minute:
>> Why do you think we can do better? Or was this modest improvement --
>> maybe a bit more for AOT compilation -- all you were expecting when
>> you said we could run elisp faster than Emacs?
> 
> Better for emacs? Well I don't think we should over-sell speed, if
> that's what you're getting at.

Hey, you're the one who said, "Guile can implement Emacs Lisp better than Emacs can."  :-) And specifically said that Emacs using Guile would be faster.

> Bytecode-wise, the performance will
> probably be the same. I suspect the same code in Scheme will run faster
> than Elisp, due to lexical scoping, and a richer set of bytecode
> primitives. But I think the goal for phase 1 should be "no one will
> notice" ;-)

The initial work, at least, wouldn't involve a rewrite of Lisp into Scheme.  So we still need to support dynamic scoping of, well, just about anything.

> 
> Native-code compilation will make both Scheme and Elisp significantly
> faster -- I think 4x would be a typical improvement, though one would
> find 2x and 20x as well.

For raw Scheme data processing, perhaps.  Like I said, I'm concerned about how much of the performance of Emacs is tied to that of the Emacs C code (redisplay, buffer manipulation, etc) and that part probably wouldn't improve much if at all.  So a 4x speedup of actual Emacs Lisp code becomes ... well, a much smaller speedup of Emacs overall.

> More broadly, though, I don't really believe in the long-term health of
> a system that relies on primitives for speed, because such a system
> necessarily restricts the expressive power of the extension language.
> There are many things you just can't do in Emacs these days -- and
> sometimes it's things as basic as "display all of the messages in my
> archived folder". Making the extension language more capable allows for
> more programs to be written inside Emacs. Eventually we will even
> migrate many of the primitives out of C, and back into Elisp or Scheme.

I'd like to see that, and I think many Emacs developers would as well.

>> On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
>> load my .emacs file.
> 
> How long does emacs -Q take?

Maybe about 1s less?

>> During the build, pre-loading the Lisp code takes it about another 3s,
>> that would get added to the startup time without unexec. If loading
>> native compiled files (or .go files on platforms where we don't have
>> native compilation yet) isn't so amazingly fast as to cut that down to
>> 2-3s, do you have any ideas how we might be able to load and save an
>> initialized Lisp environment?
> 
> I think we'll just have to see, unfortunately. Currently our mess of .go
> files everywhere means that you get significantly different numbers
> depending on whether you're in the disk cache or not... Perhaps we can
> make a quick estimate just based on KSLOC? How many KSLOC get loaded in
> a base emacs -Q ?

Not sure, I'll take a look.

>> I'm also pondering loading different Lisp files in two or three
>> threads in parallel, when dependencies allow, but any manipulation of
>> global variables has to be handled carefully, as do any load-time
>> errors. (One thread blocks reading, while another executes
>> already-loaded code... maybe more, to keep multiple cores busy at
>> once.)
> 
> This is a little crazy ;-)

Only a little?

Modern machines are going more and more in the multicore direction.  Even without that, often a thread blocks waiting to read stuff off disk, while another could continue doing work.  Why should my Emacs startup stall waiting on the disk any more than it absolutely needs to?  (Also, POSIX has async file i/o routines now, so "prefetching" the file contents is also an option; conceptually in the same thread, though it could be implemented with extra threads under the covers.)

Ken



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: a plan for native compilation
  2010-04-21 17:02     ` Ken Raeburn
@ 2010-04-22 11:28       ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2010-04-22 11:28 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

On Wed 21 Apr 2010 19:02, Ken Raeburn <raeburn@raeburn.org> writes:

> On Apr 18, 2010, at 07:41, Andy Wingo wrote:
>> Specifically, we should make it so that there is nothing you would
>> want to go to a core file for. Compiling Scheme code to native code
>> should never produce code that segfaults at runtime. All errors would
>> still be handled by the catch/throw mechanism.
>
> Including a segfault in compiled Scheme code, caused by an
> application-supplied C procedure returning something that looks like one
> of the pointer-using SCM objects but is in reality just garbage? There
> *will* be core files.

Good point.

>>> * Debug info in native representations, handled by GDB and other
>>> debuggers. Okay, this is hard if we don't go via C code as an
>>> intermediate language, and probably even if we do. But we can
>>> probably at least map PC address ranges to function names and line
>>> numbers, stuff like that. Maybe we could do the more advanced stuff
>>> one format at a time, starting with DWARF.
>> 
>> We should be able to do this already; given that we map bytecode
>> address ranges to line numbers, and the function is still on the
>> stack, you can query it for whatever you like. Adding a map when
>> generating native code should be easy.
>
> I think for best results with GDB and other debuggers, it should be
> converted into whatever the native format is, DWARF or otherwise.

I agree that this would be nice, eventually. However, Guile's debugging
information is currently for Guile, not for GDB; it needs to be readable
by Guile. Emitting DWARF would imply DWARF readers for Guile, which
would be nice, but also a pain.

>>> * Even for JIT compilation, but especially for AOT compilation,
>>> optimizations should only be enabled with careful consideration of
>>> concurrent execution. E.g., if "(while (not done) ....)" is supposed
>>> to work with a second thread altering "done", you may not be able to
>>> combine multiple cases of reading the value of any variable even when
>>> you can prove that the current thread doesn't alter the value in
>>> between.
>> 
>> Fortunately, Scheme programming style discourages global variables ;)
>> Reminds me of "spooky action at a distance". And when they are read,
>> it is always through an indirection, so we should be good.
>
> Who said global? It could be two procedures accessing a value in a
> shared outer scope, with one of them launched in a second thread,
> perhaps indirectly via a third procedure which the compiler couldn't
> examine at the time to know that it would create a thread.
>
> I'm not sure indirection helps -- unless you mean it disables that sort
> of optimization.

Variables which are never set may be copied when closures are made.
Variables which are set! need to be boxed due to continuations, and so
that closures can just copy the box instead of values. There is still an
indirection. The compiler handles this.

>> Better for emacs? Well I don't think we should over-sell speed, if
>> that's what you're getting at.
>
> Hey, you're the one who said, "Guile can implement Emacs Lisp better
> than Emacs can." :-) And specifically said that Emacs using Guile would
> be faster.

You caught me! ;)

Emacs using Guile will certainly be faster, when we get native
compilation. I think while we're just bytecode-based though it will be
the same.

> The initial work, at least, wouldn't involve a rewrite of Lisp into
> Scheme. So we still need to support dynamic scoping of, well, just about
> anything.

Indeed.

>> Native-code compilation will make both Scheme and Elisp significantly
>> faster -- I think 4x would be a typical improvement, though one would
>> find 2x and 20x as well.
>
> For raw Scheme data processing, perhaps. Like I said, I'm concerned
> about how much of the performance of Emacs is tied to that of the Emacs
> C code (redisplay, buffer manipulation, etc) and that part probably
> wouldn't improve much if at all. So a 4x speedup of actual Emacs Lisp
> code becomes ... well, a much smaller speedup of Emacs overall.

Ah, a speedup to emacs itself! I was just talking about elisp ;-) It
certainly depends on what you're doing, I guess is the answer here. I
would like my Gnus to be faster, but I'm quite fine with just editing
source code and mail ;-)

>>> On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
>>> load my .emacs file.
>> 
>> How long does emacs -Q take?
>
> Maybe about 1s less?

Good to know, thanks.

>>> I'm also pondering loading different Lisp files in two or three
>>> threads in parallel, when dependencies allow, but any manipulation of
>>> global variables has to be handled carefully, as do any load-time
>>> errors. (One thread blocks reading, while another executes
>>> already-loaded code... maybe more, to keep multiple cores busy at
>>> once.)
>> 
>> This is a little crazy ;-)
>
> Only a little?

:)


Well, I've spent the whole morning poking mail, which doesn't do much to
help Guile or Emacs. I'm going to see if I can focus on code in the next
two or three weeks, besides GHM organization obligations.

Happy hacking,

Andy
-- 
http://wingolog.org/




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-04-22 11:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2010-04-16 11:09 a plan for native compilation Andy Wingo
2010-04-16 20:47 ` No Itisnt
2010-04-17 10:21   ` Andy Wingo
2010-04-17 21:20     ` Ludovic Courtès
2010-04-16 23:15 ` Ludovic Courtès
2010-04-17 11:19   ` Andy Wingo
2010-04-18  2:19 ` Ken Raeburn
2010-04-18 11:41   ` Andy Wingo
2010-04-21 17:02     ` Ken Raeburn
2010-04-22 11:28       ` Andy Wingo
2010-04-18 20:40   ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).