A vm for native code in guile

unofficial mirror of guile-devel@gnu.org 

* A vm for native code in guile
@ 2012-07-02  7:53 Stefan Israelsson Tampe
  2012-07-02 22:16 ` Andy Wingo
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Israelsson Tampe @ 2012-07-02  7:53 UTC (permalink / raw)
  To: guile-devel


Hi,

As Noah said, it's not that difficult to hook native code in, but keeping
it maintainable and bug-free will be a challenge. To mitigate this, having
all the code in Scheme could be wise. Anyway, I can now compile simple
functions to native sequences of machine code, with some tools around it,
so let me explain the setup.

The idea is to have a set of registers that is not touched. The
registers are

vm   -   points to a table with VM-specific data structures like free
variables etc.
jmp  -   a pointer to a jump table
rbx  -   needs to be maintained because of C conventions.

The selected registers will not be changed during a C call and therefore
do not need to be stored when calling out to C (for x86-64; for x86 you
do not have this possibility).

The vm contains at the moment,

vm-rbx       |
vm-r12       |
vm-r14       | Stored registers according to the C convention

vm-clo       | free variable table

vm-obj       | object table

vm-c         | C function table

ret          | A return address

The generated instructions could feature
1. Pure machine instructions for fast paths.
2. The option to jump to a VM instruction like a named goto. Then ret needs
to contain a return address.
3. The option to call a C function using the function table.

So the idea here is to keep some of the VM-ness of the setup in order to
keep code complexity low.
Also rsp and rbp are used to match sp and fp in the guile-2.0 VM.
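
The table described above can be pictured as a C struct. The following is
only a sketch to make the layout concrete: every name in it (native_vm,
c_helper, call_c_helper and the two example helpers) is hypothetical and
not taken from the aschm code, and a plain integer stands in for a tagged
Scheme value. It shows option 3, calling a C function through the function
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; they only mirror the table in the mail. */
typedef intptr_t value;             /* stand-in for a tagged Scheme value */
typedef value (*c_helper)(value);   /* entry type of the C function table */

struct native_vm {
    uintptr_t saved_rbx;  /* vm-rbx: registers stored on entry,        */
    uintptr_t saved_r12;  /* vm-r12: according to the C convention     */
    uintptr_t saved_r14;  /* vm-r14                                    */
    value *clo;           /* vm-clo: free variable table               */
    value *obj;           /* vm-obj: object table                      */
    const c_helper *c;    /* vm-c:   C function table                  */
    void *ret;            /* ret:    return address for named gotos    */
};

/* Option 3 from the list: call a C function through the function table. */
static value call_c_helper(const struct native_vm *vm, size_t index, value arg)
{
    assert(vm != NULL && vm->c != NULL);
    return vm->c[index](arg);
}

/* Two example helpers standing in for slow-path runtime functions. */
static value helper_double(value v) { return v * 2; }
static value helper_negate(value v) { return -v; }

static const c_helper example_c_table[] = { helper_double, helper_negate };
```

In generated code the index would be a constant baked into the instruction
stream, so a call costs one load from the table plus an indirect call.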

The general impression after looking at about 142 instructions is that the
bytevectors will typically be 10x larger than the VM bytevectors, i.e.
there is a good compression ratio from using the VM ops. On the other hand
we do save space by not inlining everything.

Anyway, this is indeed a fun exercise and I see it as a testbed for
implementation strategies and as a fun testbed for assemblers in Guile.
The examples that can be run do not include looping (yet), so it's really
not that useful, but within a week I would expect that some loop
benchmarks can be done.

An example currently looks like,

(use-modules (native vm init))
(use-modules (native vm jit))
(define (f) (vector 1 2 3 4 5 6))
(jit f)
(f)
execute native
execute native/finish
$2 = #(1 2 3 4 5 6)

/stefan


* Re: A vm for native code in guile
  2012-07-02  7:53 A vm for native code in guile Stefan Israelsson Tampe
@ 2012-07-02 22:16 ` Andy Wingo
  2012-07-03 14:24   ` Stefan Israelsson Tampe
  2012-08-01 20:59   ` Stefan Israelsson Tampe
  0 siblings, 2 replies; 7+ messages in thread
From: Andy Wingo @ 2012-07-02 22:16 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

On Mon 02 Jul 2012 09:53, Stefan Israelsson Tampe <stefan.itampe@gmail.com> writes:

> Anyway I can now compile simple functions to native sequences of machine code but with some
> tools around it so let me explain the setup.

Where is this code?  Sorry for not following the context.

I agree with you that maintenance is the challenge.  Have you looked at
wip-rtl, Stefan?  It should be easier to compile.  However it's not
fully baked yet, unfortunately.

To speak honestly I am very impressed with your work.

I leave that as its own paragraph because that's what I think.  However,
with my conservative Guile co-maintainer hat on I still have hesitations
as to its applicability to mainline Guile.  You are very good at getting
Guile to work for you, but I would like to see more incremental
patches.

I know that we have gone back and forth over the past couple years and
this is probably frustrating to you.  At the same time I think that your
code, commits, and communication have gotten a _lot_ better over that
time.  If you have a complaint, please voice it.  Otherwise we keep with
this code review thing.

What do you think about this situation?

Peace,

Andy
-- 
http://wingolog.org/


* Re: A vm for native code in guile
  2012-07-02 22:16 ` Andy Wingo
@ 2012-07-03 14:24   ` Stefan Israelsson Tampe
  2012-07-07 20:12     ` Stefan Israelsson Tampe
  2012-08-01 20:59   ` Stefan Israelsson Tampe
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Israelsson Tampe @ 2012-07-03 14:24 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel


On Tue, Jul 3, 2012 at 12:16 AM, Andy Wingo <wingo@pobox.com> wrote:

> On Mon 02 Jul 2012 09:53, Stefan Israelsson Tampe <stefan.itampe@gmail.com>
> writes:
>
> > Anyway I can now compile simple functions to native sequences of machine
> > code but with some tools around it so let me explain the setup.
>
> Where is this code?  Sorry for not following the context.
>

https://gitorious.org/aschm

> I agree with you that maintenance is the challenge.  Have you looked at
> wip-rtl, Stefan?  It should be easier to compile.  However it's not
> fully baked yet, unfortunately.
>

The problem with rtl is that it is not fully baked and that I expect that
the hard part is to get
the framework correct so that you do not shoot yourself in the foot all the
time with the
assembler - it's really hard to use!

> To speak honestly I am very impressed with your work.
>

Thx, ditto for many other Guile hackers, including you. I really like this
community!

> I leave that as its own paragraph because that's what I think.  However,
> with my conservative Guile co-maintainer hat on I still have hesitations
> as to its applicability to mainline Guile.  You are very good at getting
> Guile to work for you, but I would like to see more incremental
> patches.
>
>

No problemo. The worst thing that can happen is that I get a well-tested
assembler framework running that people can download and use at will,
which is not too bad. I do hope to start discussions about how to set up a
native environment. If you look at the vm.scm file in the repo you will
get an impression of a proposed design. I also propose to keep some of the
old VM's philosophy of using named gotos when we can amortize the cost of
doing so, for the sake of more compact compiled procedures. And many more
things. However, the assembler is a standalone application independent of
Guile, and there is a connection that hooks into Guile, but that's just an
increment away from plain Guile. So you see, there is this admittedly
bloated chunk of 10000 lines of Scheme code that is standalone, and there
is a patch of <100 lines that makes it possible to execute a procedure as
native. I'm not sure how to make the 10000-line beast incremental, but
surely the other part should be fine to do incrementally. It's basically

1. An extra flag for marking a procedure as native executable.
2. Code to set a native bytevector on the objcode.
3. Hooks in vm-i-system.c to check for the native flag and, if it is set,
execute the native code (this is a half-hearted implementation).
4. Some support functions at the C level to avoid writing all code in
assembler, but this is orthogonal to plain Guile. I expect this to grow.

Look in the guile directory to see the files I've touched in the Guile
repo.
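
Steps 1 to 3 of the list above amount to a flag test on the call path. A
minimal sketch in C follows, under loud assumptions: the struct and
function names are hypothetical, Guile's real objcode and vm-i-system.c
look different, and a C function pointer stands in for the native
bytevector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; Guile's real objcode and VM structures differ. */
typedef long (*native_entry)(long arg);

struct procedure {
    int is_native;                /* step 1: flag marking native code   */
    native_entry native;          /* step 2: the attached native code   */
    long (*interpret)(long arg);  /* stand-in for running the bytecode  */
};

/* Step 2: attach native code to the procedure and mark it. */
static void set_native(struct procedure *p, native_entry code)
{
    assert(p != NULL && code != NULL);
    p->native = code;
    p->is_native = 1;
}

/* Step 3: the check a call instruction would perform. */
static long call_procedure(const struct procedure *p, long arg)
{
    if (p->is_native)
        return p->native(arg);
    return p->interpret(arg);
}

/* Example bodies; the counter shows which path was taken. */
static int interpreted_calls = 0;
static long square_interpreted(long x) { interpreted_calls++; return x * x; }
static long square_native(long x) { return x * x; }
```

The point of the design is that a procedure without native code pays only
for one extra flag test, so the patch to the interpreter can stay small.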

Also the assembler needs to go at some point because it's not GNU-owned
code. Ludo suggested that we try to port the MIT/GNU Scheme assembler over
to Guile, which is not a bad idea. Also I do know enough (after porting
SBCL's assembler) to implement an assembler from scratch, but of course
that will take a couple of months. I'll keep the aschm assembler for now
though, because I really want to play with the native execution issues.
(Sassy is also a possibility, but that is only an x86 assembler and I
wanted x86-64 to play with.)

> I know that we have gone back and forth over the past couple years and
> this is probably frustrating to you.  At the same time I think that your
> code, commits, and communication have gotten a _lot_ better over that
> time.  If you have a complaint, please voice it.  Otherwise we keep with
> this code review thing.
>

Actually I'm having a really fun time hacking, so I really do not care
much about thumbs down or nitpicking; I can cope with that. Also I'm
realistic about how much time people can spare, and I do care that you
continue with your own work for the benefit of the project. Also my own
record of commenting on other people's ideas and questions is not good, so
I can't demand that all should gather around and cheer the ideas I try to
bring. But I do try to read some of what you do, and sometimes take a
detour in the wake of that information.

> What do you think about this situation?
>

I enjoy it currently so why change!

> Peace,
>
> Andy
> --
> http://wingolog.org/
>
/Stefan


* Re: A vm for native code in guile
  2012-07-03 14:24   ` Stefan Israelsson Tampe
@ 2012-07-07 20:12     ` Stefan Israelsson Tampe
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Israelsson Tampe @ 2012-07-07 20:12 UTC (permalink / raw)
  To: guile-devel


Hi,

Branching now works and a big enough subset of the VM is translatable for
some interesting benchmarks to be done.

So by skipping the goto structure the win is maybe 3-4x for simple
numerical loops. I do expect these loops to gain another factor of 2 when
wip-rtl is translated in the same way. The reason is that the overhead
mainly consists of the instructions that move things to and from the
cache, and rtl seems to decrease the number of such operations. I've been
incrementing fixnums and walking through lists of size 10000 to measure
these numbers.
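
To make "incrementing fixnums" concrete, here is a sketch in C of what a
native fast path for adding 1 could look like. It is not the generated
code from the repo. The encoding is an assumption chosen to match how
Guile 2.0 tags small integers (n stored as 4*n + 2), and add1_slow is a
hypothetical placeholder for an out-of-line C helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: a small integer n is stored as 4*n + 2, so the low
   two bits of a fixnum are always 10. */
typedef intptr_t value;
#define FIXNUM_TAG 2
#define TAG_MASK   3

static int is_fixnum(value v)         { return (v & TAG_MASK) == FIXNUM_TAG; }
static value make_fixnum(intptr_t n)  { return n * 4 + FIXNUM_TAG; }
static intptr_t fixnum_value(value v) { return (v - FIXNUM_TAG) / 4; }

static int slow_path_calls = 0;

/* Hypothetical out-of-line helper. A real runtime would handle bignums,
   flonums and type errors here; the sketch only records the call. */
static value add1_slow(value v)
{
    slow_path_calls++;
    return v;
}

/* Fast path: one tag test, one overflow test, one add of the tagged 1. */
static value add1(value v)
{
    if (is_fixnum(v) && v <= INTPTR_MAX - 4)
        return v + 4;
    return add1_slow(v);
}
```

Because the tag survives the addition, the fast path never untags the
value; only the rare cases leave the inline code for the C helper.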

One thing to note with that code is that it piggybacks onto the C stack
and does not work with its own. I bet that is not optimal, but that's what
I did, and it should mean that it's fast to switch to C code from the
native-compiled or JIT-compiled ones.

Have fun!
/Stefan


* Re: A vm for native code in guile
  2012-07-02 22:16 ` Andy Wingo
  2012-07-03 14:24   ` Stefan Israelsson Tampe
@ 2012-08-01 20:59   ` Stefan Israelsson Tampe
  2013-01-14  2:51     ` Nala Ginrut
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Israelsson Tampe @ 2012-08-01 20:59 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel


Hi,

The byte-code -> native-code compiler now serves my needs pretty well. It
should soon be possible to add code that will auto-compile bytecode
versions to native versions. The compiler is not perfect and some
instructions are missing. But it can go from VM->NATIVE->VM and so on, so
whenever there is a missing instruction the compiler can bail out to VM
code. What's left is to be able to go from VM to native returning multiple
values and in all call positions.
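
The VM->NATIVE->VM bail-out can be illustrated with a toy stack machine in
C. This is a model only: no machine code is generated, the opcode set and
all names are hypothetical, and the split "PUSH and ADD are translated,
MUL is not" is an assumption made for the example. The sketch just counts
which side would handle each instruction.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };
struct insn { enum opcode op; long operand; };

struct run_stats { int native_steps; int vm_steps; };

/* Assumption for the sketch: only PUSH and ADD have a native translation. */
static int has_native_translation(enum opcode op)
{
    return op == OP_PUSH || op == OP_ADD;
}

/* Executes one instruction on the shared stack. Both modes share these
   semantics, which is what makes switching back and forth safe. */
static void step(const struct insn *i, long *stack, size_t *sp)
{
    switch (i->op) {
    case OP_PUSH: stack[(*sp)++] = i->operand; break;
    case OP_ADD:  (*sp)--; stack[*sp - 1] += stack[*sp]; break;
    case OP_MUL:  (*sp)--; stack[*sp - 1] *= stack[*sp]; break;
    case OP_HALT: break;
    }
}

/* Runs until OP_HALT, recording for each instruction whether the native
   code or the VM fallback would have executed it. */
static long run(const struct insn *code, struct run_stats *stats)
{
    long stack[64];
    size_t sp = 0;
    for (size_t pc = 0; code[pc].op != OP_HALT; pc++) {
        assert(sp < 64);
        if (has_native_translation(code[pc].op))
            stats->native_steps++;
        else
            stats->vm_steps++;   /* missing instruction: bail out to the VM */
        step(&code[pc], stack, &sp);
    }
    assert(sp > 0);
    return stack[sp - 1];
}
```

For the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT the model runs four
instructions on the native side and one in the VM.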

To note

* The code is for x86-64, Linux.

* Windows has another calling convention => the assembler has to be recoded
  => we need compilers for all interesting combinations of operating
  systems and native targets.

* Using the C stack is nice because the native push and pop instructions
  can be used, and brk-ing makes for automatic stack growth? Also calling
  out to C functions can be fast. On the other hand stack traces are
  defunct with this code and I'm uncertain how prompts will cope with this
  feature. It's probably better to use a separate stack for the native
  code and model it like the wip-rtl stack. On the other hand it has been
  convenient to use it as a stack to save variables before calling out to
  helper C functions, but these helper functions are usually on the slow
  path, and the saving can be done using helper registers that are local
  to the vm, a little bit slower but doable. Not sure what path to take
  here.

* Writing assembler is really tough, because debugging is really difficult.

* To ease things I compiled C code and examined the assembler => fragile
  and difficult to port the code. The final version needs to put more
  effort into probing for constants used in the generated assembler.

* x86 code is pretty different because of the low number of registers and
  of registers that are invariant over C calls.

* prompt and abort are tricky instructions!

Example:
As an example, when reducing a list of 1000 elements with a function that
is basically +, a 4x increase in performance could be seen when compiling
to native code. These are typical figures for what one can expect to
improve in speed. A smarter usage of registers and less popping and
pushing (RTL) could mean that we can increase the speedup from stable-2.0
even further.

I will next week start working on the RTL branch, porting the current
setup but using the rtl stack instead of the native C stack.

Regards
/Stefan


* Re: A vm for native code in guile
  2012-08-01 20:59   ` Stefan Israelsson Tampe
@ 2013-01-14  2:51     ` Nala Ginrut
       [not found]       ` <CAGua6m0x-W4kdWCSoonjn-+xY=RvSjvniN=wQH53h+MzEQPpbw@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Nala Ginrut @ 2013-01-14  2:51 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: Andy Wingo, guile-devel

On Wed, 2012-08-01 at 22:59 +0200, Stefan Israelsson Tampe wrote:
> * Writing assembler is really tough. Because debugging is really difficult.
> 

IMO, we don't have to write an assembler again, since GNU Binutils has one.
The only necessary work is to map bytecode->asm, and add an AOT option
with a script in 'guild' for calling Binutils.
We may borrow some work from GCC. I don't know if it's easy, but GCC
uses a Lisp-like language to handle machine descriptions. Though it could
be interesting, it's a lot of work to do. Then it could support many
platforms rather than just x86.

* Re: A vm for native code in guile
       [not found]       ` <CAGua6m0x-W4kdWCSoonjn-+xY=RvSjvniN=wQH53h+MzEQPpbw@mail.gmail.com>
@ 2013-01-14  9:10         ` Nala Ginrut
  0 siblings, 0 replies; 7+ messages in thread
From: Nala Ginrut @ 2013-01-14  9:10 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

On Mon, 2013-01-14 at 09:43 +0100, Stefan Israelsson Tampe wrote:
> > IMO, we don't have to write assembler again, since GNU Binutils does.
> > The only necessary work is to map bytecode->asm, and add a AOT option
> > with a script into 'guild' for calling Binutils.
>
> This is my approach: translate each opcode to either a call to a C
> function or to a few assembler instructions. The translation to the C
> code is easy. But jumping and moving data is best served by native
> instructions. Also we would like to use machine registers effectively
> to gain perhaps an extra 2x in speed. This means quite a lot of work in
> assembler to move data around. To note, we can gain some use of gcc by
> compiling for a restricted set of registers in order to have a set of
> registers free for temporary storage. All this also means that we can
> perhaps restrict the assembler to just include features for a very
> small set of instructions, meaning that writing the assembler will not
> be as huge an effort.
>
Though writing a simple x86-specific assembler wouldn't be difficult,
I recommend reusing GNU code as much as possible.
Converting Scheme code to C code could be one way. It's nice to
give it a try. ;-)


> I'm not really sure how to use binutils as you suggest for the whole
> setup though. Perhaps you can explain in more detail.
> 
Binutils contains an assembler, a linker, and other binary tools to
handle native code generation.
I mention it here since we don't have to write an assembler; the way I
suggest is to generate asm code only, then use Binutils to turn it into
an ELF file.

I think it's easier to implement a Lisp dialect compatible with
GCC's machine descriptions, like this:
https://android.googlesource.com/toolchain/gcc/+/57cfd7a03820d1241333e059feda0ab04829f575/gcc-4.6/gcc/config/i386/i386.md
And use this dialect to map register-VM bytecode->asm. Then use 'as' to
assemble it. This way we could avoid re-implementing an assembler, and
reuse a large amount of GCC code.
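
The "generate asm text only, let 'as' do the rest" idea can be sketched in
C as follows. Everything here is an assumption made for illustration: the
three-opcode stack bytecode, the names, and the choice to keep operands on
the C stack. It emits AT&T-syntax x86-64 text, handles only immediates
that fit in 32 bits, and ignores tagging entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy bytecode; not Guile's instruction set. */
enum opcode { OP_PUSH_CONST, OP_ADD, OP_RETURN };
struct insn { enum opcode op; long operand; };

/* Writes assembly text for `code` into buf, starting with a global label
   `name`. Returns the number of characters written, or -1 if buf is too
   small or an opcode is unknown. */
static int emit_asm(const char *name, const struct insn *code, size_t n,
                    char *buf, size_t cap)
{
    size_t used = 0;
    int w = snprintf(buf, cap, "\t.text\n\t.globl %s\n%s:\n", name, name);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        char *out = buf + used;
        size_t room = cap - used;
        switch (code[i].op) {
        case OP_PUSH_CONST:   /* push an immediate operand */
            w = snprintf(out, room, "\tpushq $%ld\n", code[i].operand);
            break;
        case OP_ADD:          /* pop one operand, add it into the new top */
            w = snprintf(out, room, "\tpopq %%rax\n\taddq %%rax, (%%rsp)\n");
            break;
        case OP_RETURN:       /* result goes back in rax */
            w = snprintf(out, room, "\tpopq %%rax\n\tret\n");
            break;
        default:
            return -1;
        }
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

The resulting text could be written to a file such as f.s and assembled
with `as -o f.o f.s`; loading the object code into a running Guile is a
separate problem that this sketch does not address.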

> /Stefan
