Hi,

This looks very good. I like the whole approach, and it has the
potential to address most of the issues I have seen when disassembling
guile-2.0 output. A few notes.

1. What about growing stacks? Any comments on whether they will be easier
to manage with this setup? Can one copy the C stack logic?

2. Is there an instruction that does what call does but can be used for
tail calls when needed? E.g. the code
 for (n = 0; n < nargs; n++)
        LOCAL_SET (n, old_fp[ip[4 + n]]);
is missing from the tail-call code.
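
To make the question concrete, here is a minimal sketch of what such a
tail-call shuffle might look like. It is not code from wip-rtl: the names
tail_shuffle and MAX_SHUFFLE and the SCM typedef are made up for
illustration, and src stands in for the operand words ip[4 + n] above. The
assumption is that a tail call reuses the current frame, so sources and
destinations can overlap and the values are read into a temporary first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long SCM;              /* stand-in for Guile's SCM word (assumption) */

#define MAX_SHUFFLE 256        /* hypothetical cap on shuffled arguments */

/* Move fp[src[0]], fp[src[1]], ... into fp[0], fp[1], ... in place.
   All sources are read into tmp before any slot is written, so an
   early store cannot clobber a value that a later argument needs.
   Returns 0 on success, -1 if nargs exceeds the temporary buffer.  */
static int
tail_shuffle (SCM *fp, const unsigned *src, size_t nargs)
{
  SCM tmp[MAX_SHUFFLE];
  size_t n;

  assert (fp != NULL && src != NULL);
  if (nargs > MAX_SHUFFLE)
    return -1;

  for (n = 0; n < nargs; n++)
    tmp[n] = fp[src[n]];                  /* gather from the old layout */
  memcpy (fp, tmp, nargs * sizeof (SCM)); /* scatter into slots 0..nargs-1 */
  return 0;
}
```

With a frame holding {10, 20, 30, 40} and src = {2, 0, 1}, the first three
slots end up as {30, 10, 20}. A direct in-place loop like the one for call
would instead overwrite slot 0 before it is read as a source.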

3. I would appreciate it if the frame always stayed at least, say, 256 SCMs
below the fp stack limit. That way, when preparing a tail call, one doesn't
usually need to check whether the arguments fit. If you compile a function
that tail calls with more than 254 (?) arguments, then you may as well
check, because the check is then essentially free relative to the argument
handling.
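
As a rough sketch of what I mean (all names here are hypothetical, not
taken from vm-engine.c, and the headroom of 256 is just the number
suggested above): the frame check done once on function entry also
reserves the headroom, so a tail call with few arguments can skip its own
check, and only the rare large call pays for one.

```c
#include <assert.h>
#include <stddef.h>

typedef long SCM;                 /* stand-in for Guile's SCM word (assumption) */

enum { TAIL_HEADROOM = 256 };     /* slots kept free above every frame */

struct vm_stack
{
  SCM *base;                      /* first usable slot */
  SCM *limit;                     /* one past the last usable slot */
};

/* Entry-time check: the frame of nlocals slots starting at fp must fit
   and still leave TAIL_HEADROOM free slots below the limit.  */
static int
frame_fits (const struct vm_stack *s, const SCM *fp, size_t nlocals)
{
  size_t avail;

  assert (fp >= s->base && fp <= s->limit);
  avail = (size_t) (s->limit - fp);
  return nlocals <= avail && avail - nlocals >= TAIL_HEADROOM;
}

/* Tail-call site: nargs is known when the call is compiled, so the
   first branch can be decided statically and emitted with no check.  */
static int
tail_call_fits (const struct vm_stack *s, const SCM *fp, size_t nargs)
{
  assert (fp >= s->base && fp <= s->limit);
  if (nargs <= TAIL_HEADROOM)
    return 1;                     /* covered by the entry-time guarantee */
  return nargs <= (size_t) (s->limit - fp);   /* rare explicit check */
}
```

The fast path is only valid for frames that passed frame_fits on entry.
The exact threshold (256 versus 254) would depend on how many slots the
frame header itself takes, which I have not accounted for here.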

4. I think the logic code hook I recently investigated could easily fit
into this VM engine, using techniques similar to the ones I described in
previous mails.

Thanks for your work on this,
Stefan


On Fri, May 11, 2012 at 6:19 PM, Andy Wingo <wingo@pobox.com> wrote:

> Hi all,
>
> This mail announces some very early work on a register VM.  The code is
> in wip-rtl ("work in progress, register transfer language".  The latter
> bit is something of a misnomer.).  There is not much there yet:
> basically just the VM, an assembler, and a disassembler.  Still, it's
> interesting, and I thought people might want to hear more about it.
>
> So, the deal: why is it interesting to switch from a stack VM, which is
> what we have, to a register VM?  There are three overriding
> disadvantages to the current VM.
>
>  1) With our stack VM, instructions are smaller.  They do less, so you
>     need more of them.  This increases dispatch cost, which is the
>     largest cost of a VM.
>
>  2) On a stack VM, there is a penalty to naming values.  Since the only
>     values that are accessible to an instruction are the ones on the
>     top of the stack, whenever you want to use more names, you end up
>     doing a lot of local-ref / local-set operations.  In contrast an
>     instruction for a register VM can address many more operands, so
>     there is much less penalty to storing something on the stack.  (The
>     penalty is not so much in the storage, but in the pushing and
>     popping needed to access it.)
>
>  3) Our stack VM has variable-sized stack frames, so we need to check
>     for overflow every time we push a value on the stack.  This is
>     quite costly.
>
> The WIP register VM fixes all of these issues.
>
> The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
> variable-length instructions, maximum of 24-bit register addressing, and
> static, relocatable allocation of constants.
>
> Also, with the wip-rtl VM there is no stack pointer: locals are
> addressed directly via the frame pointer, and the call frame for a
> function is of a fixed size.  Indeed the IP and FP are the only state
> variables of the VM, which makes it much easier to think about native
> compilation, given the scarcity of CPU registers on some architectures.
>
> See vm-engine.c from around line 1000 for a commented set of
> instructions.  It's messy in many ways now, but hey.
>
> As far as performance goes, we won't know yet.  But at least for a
> simple loop, counting down from a billion, the register VM is a few
> times faster than the stack VM.  Still, I would be happy if the general
> speedup were on the order of 40%.  We'll see.
>
> Here's that loop in rtl VM:
>
>   (use-modules (system vm rtl))
>
>   (assemble-rtl-program
>     0
>     '((assert-nargs-ee/locals 1 2)
>       (br fix-body)
>       loop-head
>       (make-short-immediate 2 0)
>       (br-if-= 1 2 out)
>       (sub1 1 1)
>       (br loop-head)
>       fix-body
>       (mov 1 0)
>       (br loop-head)
>       out
>       (make-short-immediate 0 #t)
>       (return 0)))
>
> There are various ways to improve this, but its structure is like what
> the stack VM produces.
>
> Compare to the current opcode:
>
>    scheme@(guile-user)> (define (countdown n) (let lp ((n n)) (or (zero? n) (lp (1- n)))))
>    scheme@(guile-user)> ,x countdown
>    Disassembly of #<procedure countdown (n)>:
>
>       0    (assert-nargs-ee/locals 17)     ;; 1 arg, 2 locals
>       2    (br :L186)                      ;; -> 30
>       6    (local-ref 1)                   ;; `n'
>       8    (make-int8:0)                   ;; 0
>       9    (ee?)
>      10    (local-set 2)                   ;; `t'
>      12    (local-ref 2)                   ;; `t'
>      14    (br-if-not :L187)               ;; -> 21
>      18    (local-ref 2)                   ;; `t'
>      20    (return)
>      21    (local-ref 1)                   ;; `n'
>      23    (sub1)
>      24    (local-set 1)                   ;; `n'
>      26    (br :L188)                      ;; -> 6
>      30    (local-ref 0)                   ;; `n'
>      32    (local-set 1)
>      34    (br :L188)                      ;; -> 6
>
> OK, time to set down the keyboard; been working far too much on this in
> recent days.  I still need to adapt the compiler to produce RTL
> bytecode.  I am going to let it sit for a week or two before touching it
> again.  Comments welcome.
>
> Regards,
>
> Andy
> --
> http://wingolog.org/
>
>