Hi,

The byte-code -> native-code compiler now serves my needs pretty well.
It should soon be possible to add code that automatically compiles
bytecode versions to native versions. The compiler is not perfect and
some instructions are missing, but execution can go VM->NATIVE->VM and so
on, so whenever an instruction is missing the compiler can bail out to VM
code. What's left is to be able to go from VM to native code when
returning multiple values, and in all call positions.
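
To make the VM->NATIVE->VM idea concrete, here is a toy sketch in Python. It
is not the actual compiler: the opcode names, the SUPPORTED set and every
function name are made up for illustration, and "native code" is only
modelled as a pre-built Python function. Runs of supported instructions are
compiled; anything else is left for the interpreter.

```python
# Toy model only: opcode names and helpers are hypothetical, not Guile's.
SUPPORTED = {"push", "add"}   # opcodes the toy compiler can translate


def vm_step(stack, op, arg):
    """Interpret a single instruction (the slow, always-available path)."""
    if op == "push":
        stack.append(arg)
    elif op == "add":
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif op == "mul":
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)
    else:
        raise ValueError(f"unknown opcode {op!r}")


def compile_native(instrs):
    """'Compile' a run of supported instructions into one Python function."""
    def native(stack):
        for op, arg in instrs:
            if op == "push":
                stack.append(arg)
            else:  # "add" is the only other supported opcode
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
    return native


def compile_program(program):
    """Group instructions into alternating native and VM segments."""
    segments = []  # list of (mode, [instructions])
    for instr in program:
        mode = "native" if instr[0] in SUPPORTED else "vm"
        if segments and segments[-1][0] == mode:
            segments[-1][1].append(instr)
        else:
            segments.append((mode, [instr]))
    return [
        (mode, compile_native(instrs) if mode == "native" else instrs)
        for mode, instrs in segments
    ]


def run(program):
    """Execute the program, recording which mode ran each segment."""
    stack, trace = [], []
    for mode, code in compile_program(program):
        trace.append(mode)
        if mode == "native":
            code(stack)                  # compiled segment
        else:
            for op, arg in code:         # bail out to the interpreter
                vm_step(stack, op, arg)
    return stack, trace


program = [("push", 2), ("push", 3), ("add", None), ("push", 4),
           ("mul", None), ("push", 1), ("add", None)]
result, trace = run(program)
```

Here "mul" plays the role of a missing instruction: the trace shows a native
segment, a bail-out to the VM for "mul", and a return to native code.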

To note

* The code is for x86-64 Linux.

* Windows has another calling convention => the assembler has to be
  recoded => we need compilers for all interesting combinations of
  operating systems and native targets.

* Using the C stack is nice because the native push and pop instructions
  can be used, and brk-ing makes for automatic stack growth? Calling out
  to C functions can also be fast. On the other hand, stack traces are
  broken with this code, and I'm uncertain how prompts will cope with
  this feature. It's probably better to use a separate stack for the
  native code and model it like the wip-rtl stack. Then again, it has
  been convenient to use the C stack to save variables before calling out
  to helper C functions. But these helper functions are usually on the
  slow path, and the saving can instead be done using helper registers
  that are local to the VM: a little bit slower, but doable. Not sure
  which path to take here.

* Writing assembler is really tough, because debugging is really difficult.

* To ease things I compiled C code and examined the assembler => the code
  is fragile and difficult to port. The final version needs to put more
  effort into probing for constants used in the generated assembler.

* x86 code is pretty different because of the low number of registers,
  and in particular of registers that are invariant across C calls.

* prompt and abort are tricky instructions!
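
To illustrate the calling-convention point above, here is a small Python
sketch of how a code generator might be parameterised by ABI. The register
lists are the standard integer-argument and callee-saved sets of the System V
AMD64 and Microsoft x64 conventions; the CONVENTIONS table, emit_call and
survives_call are hypothetical names, not part of the actual compiler. Stack
arguments, stack alignment and the Win64 shadow space are deliberately left
out.

```python
# Hypothetical table-driven sketch; only register conventions are modelled.
CONVENTIONS = {
    "sysv": {   # Linux and other System V AMD64 platforms
        "int_args": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        "callee_saved": ["rbx", "rbp", "r12", "r13", "r14", "r15"],
    },
    "win64": {  # Microsoft x64
        "int_args": ["rcx", "rdx", "r8", "r9"],
        "callee_saved": ["rbx", "rbp", "rdi", "rsi",
                         "r12", "r13", "r14", "r15"],
    },
}


def emit_call(convention, helper, operands):
    """Return assembler lines (Intel syntax) that call a C helper.

    Assumes the operands do not themselves live in argument registers,
    so the moves cannot clobber one another.
    """
    regs = CONVENTIONS[convention]["int_args"]
    if len(operands) > len(regs):
        raise NotImplementedError("stack-passed arguments are not modelled")
    lines = [f"mov {reg}, {src}" for reg, src in zip(regs, operands)]
    lines.append(f"call {helper}")
    return lines


def survives_call(convention, reg):
    """True if a value kept in reg is preserved across a C call."""
    return reg in CONVENTIONS[convention]["callee_saved"]
```

For example, emit_call("sysv", "helper", ["r12", "r13"]) moves the operands
into rdi and rsi, while the "win64" variant uses rcx and rdx. A value kept in
rdi survives a C call on Windows but not on Linux, which is one reason the
generated code cannot simply be shared between the two.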

Example:
Reducing a list of 1000 elements with a function that is basically +
showed a 4x increase in performance when compiled to native code. These
are typical figures for the speed improvement one can expect. A smarter
usage of registers and less popping and pushing (RTL) could mean that we
can increase the speedup over stable-2.0 even further.


Next week I will start working on the RTL branch, porting the current
setup but using the RTL stack instead of the native C stack.

Regards
/Stefan