From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: Re: Register VM WIP Date: Fri, 11 May 2012 22:29:11 +0200 Message-ID: References: <871umqr8q0.fsf@pobox.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=14dae9340c45ce6d6904bfc8968b X-Trace: dough.gmane.org 1336768163 13639 80.91.229.3 (11 May 2012 20:29:23 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Fri, 11 May 2012 20:29:23 +0000 (UTC) Cc: guile-devel To: Andy Wingo Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Fri May 11 22:29:23 2012 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SSwSn-0002Px-9C for guile-devel@m.gmane.org; Fri, 11 May 2012 22:29:21 +0200 Original-Received: from localhost ([::1]:42765 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SSwSm-0006MT-Dh for guile-devel@m.gmane.org; Fri, 11 May 2012 16:29:20 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:34977) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SSwSi-0006MD-R9 for guile-devel@gnu.org; Fri, 11 May 2012 16:29:19 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SSwSg-0004Io-AK for guile-devel@gnu.org; Fri, 11 May 2012 16:29:16 -0400 Original-Received: from mail-yx0-f169.google.com ([209.85.213.169]:58500) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SSwSg-0004IY-2B for guile-devel@gnu.org; Fri, 11 May 2012 16:29:14 -0400 Original-Received: by yenm7 with SMTP id m7so3797811yen.0 for ; Fri, 11 May 2012 13:29:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=EbJyu7xwW4Z+0pAB3TBnMaXAkDWgBsSql8yj0JSRby8=; b=VCkvbX+HUemWekosnGg77NSnJ02Bt7omhh8TOCd3pCFrPVwLUHwuiXyJmUNK8hHz5O oY0EvG631jUF/y3ugB3YgGm8bLLwRE2vZuvT+RVPJMoLSLsZgSePWmPDyPkttUwmZT/p GrkeDt27fE2VuWYOqnytYr8eLoyCsDyTz+dUYl+oV+KmsEO9wFPnymj8j24JMN9VSv87 WS9Nf9I+6idlqur0NRGgQUCRwXYfz/pod3uio7j0qM3xK4SFVcyOt8TBnEH2bpVq1foI sB56A46aAoBZIowN3b4quKg/X8ewVIzNyXwXgCQZUOpjZ4K9Em3Upv9b4x9NaYu+g++n oSIw== Original-Received: by 10.50.15.137 with SMTP id x9mr2446631igc.8.1336768151317; Fri, 11 May 2012 13:29:11 -0700 (PDT) Original-Received: by 10.50.242.102 with HTTP; Fri, 11 May 2012 13:29:11 -0700 (PDT) In-Reply-To: <871umqr8q0.fsf@pobox.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 209.85.213.169 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:14396 Archived-At: --14dae9340c45ce6d6904bfc8968b Content-Type: text/plain; charset=ISO-8859-1 Hi, This looks very good. i like the hole approach and this approach has the potential to address most of the issues I have seen when disassembling guile-2.0 output. A few notes. 1. What about growing stacks any coments if they will be easier to manage for this setup. Can one copy the C stack logic? 2. Is there an instruction that does what call does but can be used for tail call's when it needs it e.g. the code for (n = 0; n < nargs; n++) LOCAL_SET (n, old_fp[ip[4 + n]]); that is missing for the tail code 3. I would appriciate if the frame is always below say 256 SCM:s of the fp stack limit that way when preparing tail calling one doesn't usally need to check if the argument fit's when issuing a tail call. If you compile a function that tail call more then 254 (?) arguments then you can as well check because then be free relative the argument handling. 4. I think the logic code hook I recently investigated could easily fit into this VM engine with using similar techniques as I described in previous mails. Thanks for your work on this Stefan On Fri, May 11, 2012 at 6:19 PM, Andy Wingo wrote: > Hi all, > > This mail announces some very early work on a register VM. The code is > in wip-rtl ("work in progress, register transfer language". The latter > bit is something of a misnomer.). There is not much there yet: > basically just the VM, an assembler, and a disassembler. Still, it's > interesting, and I thought people might want to hear more about it. > > So, the deal: why is it interesting to switch from a stack VM, which is > what we have, to a register VM? There are three overriding > disadvantages to the current VM. > > 1) With our stack VM, instructions are smaller. They do less, so you > need more of them. This increases dispatch cost, which is the > largest cost of a VM. > > 2) On a stack VM, there is a penalty to naming values. Since the only > values that are accessible to an instruction are the ones on the > top of the stack, whenever you want to use more names, you end up > doing a lot of local-ref / local-set operations. In contrast an > instruction for a register VM can address many more operands, so > there is much less penalty to storing something on the stack. (The > penalty is not so much in the storage, but in the pushing and > popping needed to access it.) > > 3) Our stack VM has variable-sized stack frames, so we need to check > for overflow every time we push a value on the stack. This is > quite costly. > > The WIP register VM fixes all of these issues. > > The basic design of the VM is: 32-bit instruction words, 8-bit opcodes, > variable-length instructions, maximum of 24-bit register addressing, and > static, relocatable allocation of constants. > > Also, with the wip-rtl VM there is no stack pointer: locals are > addressed directly via the frame pointer, and the call frame for a > function is of a fixed size. Indeed the IP and FP are the only state > variables of the VM, which makes it much easier to think about native > compilation, given the scarcity of CPU registers on some architectures. > > See vm-engine.c from around line 1000 for a commented set of > instructions. It's messy in many ways now, but hey. > > As far as performance goes, we won't know yet. But at least for a > simple loop, counting down from a billion, the register VM is a few > times faster than the stack VM. Still, I would be happy if the general > speedup were on the order of 40%. We'll see. > > Here's that loop in rtl VM: > > (use-modules (system vm rtl)) > > (assemble-rtl-program > 0 > '((assert-nargs-ee/locals 1 2) > (br fix-body) > loop-head > (make-short-immediate 2 0) > (br-if-= 1 2 out) > (sub1 1 1) > (br loop-head) > fix-body > (mov 1 0) > (br loop-head) > out > (make-short-immediate 0 #t) > (return 0))) > > There are various ways to improve this, but its structure is like what > the stack VM produces. > > Compare to the current opcode: > > scheme@(guile-user)> (define (countdown n) (let lp ((n n)) (or (zero? > n) (lp (1- n))))) > scheme@(guile-user)> ,x countdown > Disassembly of #: > > 0 (assert-nargs-ee/locals 17) ;; 1 arg, 2 locals > 2 (br :L186) ;; -> 30 > 6 (local-ref 1) ;; `n' > 8 (make-int8:0) ;; 0 > 9 (ee?) > 10 (local-set 2) ;; `t' > 12 (local-ref 2) ;; `t' > 14 (br-if-not :L187) ;; -> 21 > 18 (local-ref 2) ;; `t' > 20 (return) > 21 (local-ref 1) ;; `n' > 23 (sub1) > 24 (local-set 1) ;; `n' > 26 (br :L188) ;; -> 6 > 30 (local-ref 0) ;; `n' > 32 (local-set 1) > 34 (br :L188) ;; -> 6 > > OK, time to set down the keyboard; been working far too much on this in > recent days. I still need to adapt the compiler to produce RTL > bytecode. I am going to let it sit for a week or two before touching it > again. Comments welcome. > > Regards, > > Andy > -- > http://wingolog.org/ > > --14dae9340c45ce6d6904bfc8968b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi,

This looks very good. i like the hole approach and this approach= has the
potential to address most of the issues I have seen when disas= sembling
guile-2.0 output. A few notes.

1. What about growing sta= cks any coments if they will be easier to manage
for this setup. Can one copy the C stack logic?

2. Is there an instr= uction that does what call does but can be used for tail call's
when= it needs it e.g. the code
=A0for (n =3D 0; n < nargs; n++)
=A0=A0= =A0=A0=A0=A0=A0 LOCAL_SET (n, old_fp[ip[4 + n]]);
that is missing for the tail code

3. I would appriciate if the frame= is always below say 256 SCM:s of the fp stack limit
that way when prepa= ring tail calling one doesn't usally need to check if the argument fit&= #39;s
when issuing a tail call. If you compile a function that tail call more the= n 254 (?) arguments
then you can as well check because then be free rela= tive the argument handling.

4. I think the logic code hook I recentl= y investigated could easily fit into this VM engine with
using similar techniques as I described in previous mails.

Thanks fo= r your work on this
Stefan


On Fri,= May 11, 2012 at 6:19 PM, Andy Wingo <wingo@pobox.com> wrote:<= br>
Hi all,

This mail announces some very early work on a register VM. =A0The code is in wip-rtl ("work in progress, register transfer language". =A0Th= e latter
bit is something of a misnomer.). =A0There is not much there yet:
basically just the VM, an assembler, and a disassembler. =A0Still, it's=
interesting, and I thought people might want to hear more about it.

So, the deal: why is it interesting to switch from a stack VM, which is
what we have, to a register VM? =A0There are three overriding
disadvantages to the current VM.

=A01) With our stack VM, instructions are smaller. =A0They do less, so you=
=A0 =A0 need more of them. =A0This increases dispatch cost, which is the =A0 =A0 largest cost of a VM.

=A02) On a stack VM, there is a penalty to naming values. =A0Since the onl= y
=A0 =A0 values that are accessible to an instruction are the ones on the =A0 =A0 top of the stack, whenever you want to use more names, you end up<= br> =A0 =A0 doing a lot of local-ref / local-set operations. =A0In contrast an=
=A0 =A0 instruction for a register VM can address many more operands, so =A0 =A0 there is much less penalty to storing something on the stack. =A0(= The
=A0 =A0 penalty is not so much in the storage, but in the pushing and
=A0 =A0 popping needed to access it.)

=A03) Our stack VM has variable-sized stack frames, so we need to check =A0 =A0 for overflow every time we push a value on the stack. =A0This is =A0 =A0 quite costly.

The WIP register VM fixes all of these issues.

The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
variable-length instructions, maximum of 24-bit register addressing, and static, relocatable allocation of constants.

Also, with the wip-rtl VM there is no stack pointer: locals are
addressed directly via the frame pointer, and the call frame for a
function is of a fixed size. =A0Indeed the IP and FP are the only state
variables of the VM, which makes it much easier to think about native
compilation, given the scarcity of CPU registers on some architectures.

See vm-engine.c from around line 1000 for a commented set of
instructions. =A0It's messy in many ways now, but hey.

As far as performance goes, we won't know yet. =A0But at least for a simple loop, counting down from a billion, the register VM is a few
times faster than the stack VM. =A0Still, I would be happy if the general speedup were on the order of 40%. =A0We'll see.

Here's that loop in rtl VM:

=A0 (use-modules (system vm rtl))

=A0 (assemble-rtl-program
=A0 =A0 0
=A0 =A0 '((assert-nargs-ee/locals 1 2)
=A0 =A0 =A0 (br fix-body)
=A0 =A0 =A0 loop-head
=A0 =A0 =A0 (make-short-immediate 2 0)
=A0 =A0 =A0 (br-if-=3D 1 2 out)
=A0 =A0 =A0 (sub1 1 1)
=A0 =A0 =A0 (br loop-head)
=A0 =A0 =A0 fix-body
=A0 =A0 =A0 (mov 1 0)
=A0 =A0 =A0 (br loop-head)
=A0 =A0 =A0 out
=A0 =A0 =A0 (make-short-immediate 0 #t)
=A0 =A0 =A0 (return 0)))

There are various ways to improve this, but its structure is like what
the stack VM produces.

Compare to the current opcode:

=A0 =A0scheme@(guile-user)> (define (countdown n) (let lp ((n n)) (or (= zero? n) (lp (1- n)))))
=A0 =A0scheme@(guile-user)> ,x countdown
=A0 =A0Disassembly of #<procedure countdown (n)>:

=A0 =A0 =A0 0 =A0 =A0(assert-nargs-ee/locals 17) =A0 =A0 ;; 1 arg, 2 local= s
=A0 =A0 =A0 2 =A0 =A0(br :L186) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0;; -> 30
=A0 =A0 =A0 6 =A0 =A0(local-ref 1) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `n'
=A0 =A0 =A0 8 =A0 =A0(make-int8:0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = 0
=A0 =A0 =A0 9 =A0 =A0(ee?)
=A0 =A0 =A010 =A0 =A0(local-set 2) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `t'
=A0 =A0 =A012 =A0 =A0(local-ref 2) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `t'
=A0 =A0 =A014 =A0 =A0(br-if-not :L187) =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; ->= ; 21
=A0 =A0 =A018 =A0 =A0(local-ref 2) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `t'
=A0 =A0 =A020 =A0 =A0(return)
=A0 =A0 =A021 =A0 =A0(local-ref 1) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `n'
=A0 =A0 =A023 =A0 =A0(sub1)
=A0 =A0 =A024 =A0 =A0(local-set 1) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `n'
=A0 =A0 =A026 =A0 =A0(br :L188) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0;; -> 6
=A0 =A0 =A030 =A0 =A0(local-ref 0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ;; = `n'
=A0 =A0 =A032 =A0 =A0(local-set 1)
=A0 =A0 =A034 =A0 =A0(br :L188) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0;; -> 6

OK, time to set down the keyboard; been working far too much on this in
recent days. =A0I still need to adapt the compiler to produce RTL
bytecode. =A0I am going to let it sit for a week or two before touching it<= br> again. =A0Comments welcome.

Regards,

Andy
--
http://wingolog.org/=


--14dae9340c45ce6d6904bfc8968b--