Hi,

This looks very good. I like the whole approach, and it has the
potential to address most of the issues I have seen when disassembling
guile-2.0 output.

A few notes.

1. What about growing stacks? Any comments on whether they will be
   easier to manage with this setup? Can one copy the C stack logic?

2. Is there an instruction that does what call does but can be used for
   tail calls when needed? E.g. the code

     for (n = 0; n < nargs; n++)
       LOCAL_SET (n, old_fp[ip[4 + n]]);

   is missing from the tail-call code.

3. I would appreciate it if the frame were always kept at least, say,
   256 SCMs below the fp stack limit. That way, preparing a tail call
   doesn't usually need to check whether the arguments fit. If you
   compile a function that tail-calls with more than 254 (?) arguments,
   you can just as well do the check, because it is then essentially
   free relative to the argument handling.

4. I think the logic code hook I recently investigated could easily fit
   into this VM engine, using techniques similar to those I described
   in previous mails.

Thanks for your work on this,
Stefan

On Fri, May 11, 2012 at 6:19 PM, Andy Wingo wrote:
> Hi all,
>
> This mail announces some very early work on a register VM. The code is
> in wip-rtl ("work in progress, register transfer language". The latter
> bit is something of a misnomer.). There is not much there yet:
> basically just the VM, an assembler, and a disassembler. Still, it's
> interesting, and I thought people might want to hear more about it.
>
> So, the deal: why is it interesting to switch from a stack VM, which is
> what we have, to a register VM? There are three overriding
> disadvantages to the current VM.
>
>   1) With our stack VM, instructions are smaller. They do less, so you
>      need more of them. This increases dispatch cost, which is the
>      largest cost of a VM.
>
>   2) On a stack VM, there is a penalty to naming values.
>      Since the only values that are accessible to an instruction are
>      the ones on the top of the stack, whenever you want to use more
>      names, you end up doing a lot of local-ref / local-set operations.
>      In contrast an instruction for a register VM can address many more
>      operands, so there is much less penalty to storing something on
>      the stack. (The penalty is not so much in the storage, but in the
>      pushing and popping needed to access it.)
>
>   3) Our stack VM has variable-sized stack frames, so we need to check
>      for overflow every time we push a value on the stack. This is
>      quite costly.
>
> The WIP register VM fixes all of these issues.
>
> The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
> variable-length instructions, maximum of 24-bit register addressing, and
> static, relocatable allocation of constants.
>
> Also, with the wip-rtl VM there is no stack pointer: locals are
> addressed directly via the frame pointer, and the call frame for a
> function is of a fixed size. Indeed the IP and FP are the only state
> variables of the VM, which makes it much easier to think about native
> compilation, given the scarcity of CPU registers on some architectures.
>
> See vm-engine.c from around line 1000 for a commented set of
> instructions. It's messy in many ways now, but hey.
>
> As far as performance goes, we won't know yet. But at least for a
> simple loop, counting down from a billion, the register VM is a few
> times faster than the stack VM. Still, I would be happy if the general
> speedup were on the order of 40%. We'll see.
>
> Here's that loop in rtl VM:
>
>   (use-modules (system vm rtl))
>
>   (assemble-rtl-program
>    0
>    '((assert-nargs-ee/locals 1 2)
>      (br fix-body)
>      loop-head
>      (make-short-immediate 2 0)
>      (br-if-= 1 2 out)
>      (sub1 1 1)
>      (br loop-head)
>      fix-body
>      (mov 1 0)
>      (br loop-head)
>      out
>      (make-short-immediate 0 #t)
>      (return 0)))
>
> There are various ways to improve this, but its structure is like what
> the stack VM produces.
>
> Compare to the current opcode:
>
>   scheme@(guile-user)> (define (countdown n) (let lp ((n n)) (or (zero? n) (lp (1- n)))))
>   scheme@(guile-user)> ,x countdown
>   Disassembly of #:
>
>      0    (assert-nargs-ee/locals 17)     ;; 1 arg, 2 locals
>      2    (br :L186)                      ;; -> 30
>      6    (local-ref 1)                   ;; `n'
>      8    (make-int8:0)                   ;; 0
>      9    (ee?)
>     10    (local-set 2)                   ;; `t'
>     12    (local-ref 2)                   ;; `t'
>     14    (br-if-not :L187)               ;; -> 21
>     18    (local-ref 2)                   ;; `t'
>     20    (return)
>     21    (local-ref 1)                   ;; `n'
>     23    (sub1)
>     24    (local-set 1)                   ;; `n'
>     26    (br :L188)                      ;; -> 6
>     30    (local-ref 0)                   ;; `n'
>     32    (local-set 1)
>     34    (br :L188)                      ;; -> 6
>
> OK, time to set down the keyboard; been working far too much on this in
> recent days. I still need to adapt the compiler to produce RTL
> bytecode. I am going to let it sit for a week or two before touching it
> again. Comments welcome.
>
> Regards,
>
> Andy
> --
> http://wingolog.org/
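
For readers who want to see the encoding and dispatch idea from the
quoted mail in concrete form, here is a minimal C sketch of a register
interpreter running the same countdown loop. It is not taken from
wip-rtl. The opcode numbers, the position of the opcode in the low
byte, the absolute branch targets, the use of 1 in place of #t, and the
names pack and run_countdown are all assumptions made for illustration.
The assert-nargs-ee/locals instruction is left out; the fixed three-slot
frame stands in for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbers; the mail does not give the real ones. */
enum { OP_MOV, OP_IMM, OP_SUB1, OP_BR, OP_BR_IF_EQ, OP_RETURN };

/* One 32-bit word: an 8-bit opcode plus three 8-bit operands.
   Putting the opcode in the low byte is an assumption. */
static uint32_t
pack (unsigned op, unsigned a, unsigned b, unsigned c)
{
  return (uint32_t) op | ((uint32_t) a << 8)
         | ((uint32_t) b << 16) | ((uint32_t) c << 24);
}

/* Run the countdown loop with n in local 0.  Returns the value of the
   returned local and stores the number of dispatched instructions in
   *dispatches.  Branch operands are absolute word indexes, which is a
   simplification. */
static uint64_t
run_countdown (uint64_t n, uint64_t *dispatches)
{
  const uint32_t code[] = {
    pack (OP_BR, 5, 0, 0),        /* 0: (br fix-body) */
    pack (OP_IMM, 2, 0, 0),       /* 1: loop-head: (make-short-immediate 2 0) */
    pack (OP_BR_IF_EQ, 1, 2, 7),  /* 2: (br-if-= 1 2 out) */
    pack (OP_SUB1, 1, 1, 0),      /* 3: (sub1 1 1) */
    pack (OP_BR, 1, 0, 0),        /* 4: (br loop-head) */
    pack (OP_MOV, 1, 0, 0),       /* 5: fix-body: (mov 1 0) */
    pack (OP_BR, 1, 0, 0),        /* 6: (br loop-head) */
    pack (OP_IMM, 0, 1, 0),       /* 7: out: local 0 = 1, standing in for #t */
    pack (OP_RETURN, 0, 0, 0),    /* 8: (return 0) */
  };
  uint64_t fp[3] = { n, 0, 0 };   /* fixed-size frame: 1 argument + 2 locals */
  const uint32_t *ip = code;      /* ip and fp are the only VM state */

  *dispatches = 0;
  for (;;)
    {
      uint32_t w = *ip;
      unsigned a = (w >> 8) & 0xff, b = (w >> 16) & 0xff, c = w >> 24;

      ++*dispatches;
      switch (w & 0xff)
        {
        case OP_MOV:      fp[a] = fp[b]; ip++; break;
        case OP_IMM:      fp[a] = b; ip++; break;
        case OP_SUB1:     fp[a] = fp[b] - 1; ip++; break;
        case OP_BR:       ip = code + a; break;
        case OP_BR_IF_EQ: ip = (fp[a] == fp[b]) ? code + c : ip + 1; break;
        case OP_RETURN:   return fp[a];
        default:          assert (0); return 0;
        }
    }
}
```

In this sketch each loop iteration costs four dispatches, against ten
by my count for the loop body in the stack VM disassembly quoted above,
and no instruction pushes or pops, so there is nothing to check for
overflow inside the loop.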