From: Stefan Israelsson Tampe
Newsgroups: gmane.lisp.guile.devel
Subject: Re: ballpark rtl speeds
Date: Thu, 7 Jun 2012 21:59:33 +0200
References: <87ipf3jyof.fsf@pobox.com>
To: Andy Wingo
Cc: guile-devel

Great!

So if you code the new VM interpreter, you get a 2x improvement.
If you generate code and compile it with no optimization, about another 3x.
If you are able to generate code that compiles with optimization, basically
using a register, what do you get?

Using a register as storage on my machine yields 0.4 s, while the above C
code took about 2.6 s: about a further 6x in performance.
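The ratios behind these estimates work out as follows. The first two use Andy's timings quoted below for 10^9 iterations. The third uses the two figures Stefan measured on his own machine, so it is not directly comparable with the first two:

```latex
\begin{aligned}
\frac{14.05\ \mathrm{s}}{6.02\ \mathrm{s}} &\approx 2.3 && \text{old VM} \to \text{new RTL VM} \\
\frac{6.02\ \mathrm{s}}{2.06\ \mathrm{s}} &\approx 2.9 && \text{new RTL VM} \to \text{C at } \texttt{-O0} \\
\frac{2.6\ \mathrm{s}}{0.4\ \mathrm{s}} &= 6.5 && \text{C at } \texttt{-O0} \to \text{counter in a register}
\end{aligned}
```

Dividing Andy's timings by 10^9 iterations gives roughly 14 ns, 6 ns and 2 ns per loop iteration.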

Great work!

/Stefan

On Thu, Jun 7, 2012 at 10:47 AM, Andy Wingo <wingo@pobox.com> wrote:
Hi,

Some ballpark measurements of the overhead of the old VM, the new VM,
and C (compiled with gcc -g -O0).

Old interpreter:

 $ guile --no-debug
 > (define (countdown* n)
     (let lp ((n n))
       (if (zero? n)
           #t
           (lp (1- n)))))
 > ,time (countdown* 1000000000)
 ;; 14.054572s real time, 14.033213s run time.  0.000000s spent in GC.

New interpreter:

 > (use-modules (system vm rtl))
 > (define countdown
     (assemble-program
       '((begin-program countdown 1)
         (assert-nargs-ee/locals 1 2)
         (br fix-body)
         (label loop-head)
         (load-constant 2 0)
         (br-if-= 1 2 out)
         (sub1 1 1)
         (br loop-head)
         (label fix-body)
         (mov 1 0)
         (br loop-head)
         (label out)
         (load-constant 0 #t)
         (return 0))))
 > ,time (countdown 1000000000)
 ;; 6.023658s real time, 6.014166s run time.  0.000000s spent in GC.

Note that this is not the ideal bytecode -- there are two branches per
loop iteration when there could just be one.  But it's what the existing
tree-il compiler would produce.
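The following toy register-VM interpreter in C makes the branch and dispatch cost concrete. It is a hypothetical model, not Guile's actual RTL VM. Its opcodes are named after the listing above, with labels resolved to instruction indices. Several details are assumptions:

- The operand conventions, such as `(sub1 dst src)` and `(br-if-= a b label)`, are inferred from the listing.
- `#t` is encoded as 1.
- The `BR_IF_NE` opcode is invented for illustration.

The model runs two programs and counts the instructions each one dispatches. The first is Andy's program. The second is a rotated version of the loop: it tests once on entry, hoists the constant load out of the loop, and closes the loop with a single conditional branch.

```c
#include <stddef.h>

/* Toy opcodes named after the RTL listing. BR_IF_NE is hypothetical:
   it does not appear in Andy's program. */
enum op { MOV, LOAD_CONST, BR, BR_IF_EQ, BR_IF_NE, SUB1, RETURN };

/* Operand meaning depends on the opcode; see the switch in run(). */
struct insn { enum op op; long a, b, c; };

/* Interpret prog with the argument in register 0. Returns the register
   named by RETURN and stores the number of dispatched instructions. */
static long run(const struct insn *prog, long arg, long *dispatched)
{
    long regs[3] = { arg, 0, 0 };
    size_t pc = 0;

    *dispatched = 0;
    for (;;) {
        const struct insn *i = &prog[pc];
        ++*dispatched;
        switch (i->op) {
        case MOV:        regs[i->a] = regs[i->b]; pc++; break;
        case LOAD_CONST: regs[i->a] = i->b;       pc++; break;
        case BR:         pc = (size_t)i->a;             break;
        case BR_IF_EQ:   pc = regs[i->a] == regs[i->b] ? (size_t)i->c : pc + 1; break;
        case BR_IF_NE:   pc = regs[i->a] != regs[i->b] ? (size_t)i->c : pc + 1; break;
        case SUB1:       regs[i->a] = regs[i->b] - 1; pc++; break;
        case RETURN:     return regs[i->a];
        }
    }
}

/* Andy's program: four dispatches per iteration, two of them branches. */
static const struct insn two_branch[] = {
    { BR, 5, 0, 0 },          /* (br fix-body) */
    { LOAD_CONST, 2, 0, 0 },  /* loop-head: (load-constant 2 0) */
    { BR_IF_EQ, 1, 2, 7 },    /* (br-if-= 1 2 out) */
    { SUB1, 1, 1, 0 },        /* (sub1 1 1) */
    { BR, 1, 0, 0 },          /* (br loop-head) */
    { MOV, 1, 0, 0 },         /* fix-body: (mov 1 0) */
    { BR, 1, 0, 0 },          /* (br loop-head) */
    { LOAD_CONST, 0, 1, 0 },  /* out: (load-constant 0 #t) */
    { RETURN, 0, 0, 0 },      /* (return 0) */
};

/* Rotated loop: two dispatches per iteration, one of them a branch. */
static const struct insn one_branch[] = {
    { MOV, 1, 0, 0 },
    { LOAD_CONST, 2, 0, 0 },  /* hoisted out of the loop */
    { BR_IF_EQ, 1, 2, 5 },    /* skip the loop entirely when n == 0 */
    { SUB1, 1, 1, 0 },        /* loop body */
    { BR_IF_NE, 1, 2, 3 },    /* single conditional back-edge */
    { LOAD_CONST, 0, 1, 0 },
    { RETURN, 0, 0, 0 },
};

/* Both assume n >= 0, as the Scheme version effectively does. */
long countdown_two_branch(long n, long *dispatched) { return run(two_branch, n, dispatched); }
long countdown_one_branch(long n, long *dispatched) { return run(one_branch, n, dispatched); }
```

For n = 3, the first program dispatches 19 instructions and the rotated one dispatches 11. In general the counts are 4n + 7 versus 2n + 5, so in this model the single-branch form halves the per-iteration dispatch count.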

C, with gcc -O0, disassembled:

 #include <stdlib.h>

 int
 main (int argc, char *argv[])
 {
   400514:     55                      push   %rbp
   400515:     48 89 e5                mov    %rsp,%rbp
   400518:     48 83 ec 20             sub    $0x20,%rsp
   40051c:     89 7d ec                mov    %edi,-0x14(%rbp)
   40051f:     48 89 75 e0             mov    %rsi,-0x20(%rbp)
   if (argc != 2)
   400523:     83 7d ec 02             cmpl   $0x2,-0x14(%rbp)
   400527:     74 07                   je     400530 <main+0x1c>
     return 1;
   400529:     b8 01 00 00 00          mov    $0x1,%eax
   40052e:     eb 2e                   jmp    40055e <main+0x4a>
   long l = atol (argv[1]);
   400530:     48 8b 45 e0             mov    -0x20(%rbp),%rax
   400534:     48 83 c0 08             add    $0x8,%rax
   400538:     48 8b 00                mov    (%rax),%rax
   40053b:     48 89 c7                mov    %rax,%rdi
   40053e:     e8 dd fe ff ff          callq  400420 <atol@plt>
   400543:     48 89 45 f8             mov    %rax,-0x8(%rbp)
   while (l--);
   400547:     90                      nop
   400548:     48 83 7d f8 00          cmpq   $0x0,-0x8(%rbp)
   40054d:     0f 95 c0                setne  %al
   400550:     48 83 6d f8 01          subq   $0x1,-0x8(%rbp)
   400555:     84 c0                   test   %al,%al
   400557:     75 ef                   jne    400548 <main+0x34>
   return 0;
   400559:     b8 00 00 00 00          mov    $0x0,%eax
 }
   40055e:     c9                      leaveq
   40055f:     c3                      retq

 $ time ./a.out 1000000000

 real  0m2.061s
 user  0m2.056s
 sys   0m0.000s

Of course with -O2 the loop goes away entirely ;)  But it's an
interesting exercise.
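Stefan does not show the code behind his 0.4 s register figure. The sketch below shows one way to keep the loop alive at -O2 while holding the counter in a register. It relies on an empty `__asm__ volatile` statement, which is a GCC/Clang extension. The function name `countdown_in_register` is hypothetical, and the sketch is not claimed to reproduce Stefan's timing.

```c
#include <stdbool.h>

/* Hypothetical helper: count n (assumed >= 0) down to zero, like Andy's
   countdown, and return true.  The empty volatile asm statement (GCC/Clang
   extension) must run on every iteration and claims to read and modify n
   in a register ("+r"), so an optimizing build can neither delete the loop
   nor precompute its result, and n is kept in a register rather than in
   the stack slot -0x8(%rbp) that the -O0 code updates each time round. */
bool countdown_in_register(long n)
{
    while (n != 0) {
        n--;
        __asm__ volatile ("" : "+r" (n));
    }
    return true;
}
```

Suppose this is called from a `main` that reads `n` with `atol`, as Andy's program does, and built with `gcc -O2`. The loop body should then compile to a few register-only instructions, instead of the memory-operand `cmpq`/`subq` pair seen in the -O0 disassembly.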

Andy
--
http://wingolog.org/

