* ballpark rtl speeds
@ 2012-06-07 8:47 Andy Wingo
2012-06-07 19:59 ` Stefan Israelsson Tampe
2012-06-17 22:00 ` Ludovic Courtès
0 siblings, 2 replies; 3+ messages in thread
From: Andy Wingo @ 2012-06-07 8:47 UTC (permalink / raw)
To: guile-devel
Hi,
Some ballpark measurements of the overhead of the old VM, the new VM,
and C (compiled with gcc -g -O0).
Old interpreter:
  $ guile --no-debug
  > (define (countdown* n)
      (let lp ((n n))
        (if (zero? n)
            #t
            (lp (1- n)))))
  > ,time (countdown* 1000000000)
  ;; 14.054572s real time, 14.033213s run time.  0.000000s spent in GC.
New interpreter:
  > (use-modules (system vm rtl))
  > (define countdown
      (assemble-program
       '((begin-program countdown 1)
         (assert-nargs-ee/locals 1 2)
         (br fix-body)
         (label loop-head)
         (load-constant 2 0)
         (br-if-= 1 2 out)
         (sub1 1 1)
         (br loop-head)
         (label fix-body)
         (mov 1 0)
         (br loop-head)
         (label out)
         (load-constant 0 #t)
         (return 0))))
  > ,time (countdown 1000000000)
  ;; 6.023658s real time, 6.014166s run time.  0.000000s spent in GC.
Note that this is not the ideal bytecode -- there are two branches per
loop iteration when there could just be one. But it's what the existing
tree-il compiler would produce.
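To make the "two branches per iteration" point concrete, here is a minimal sketch in C rather than RTL bytecode. The function names and the `branches` counter are hypothetical and exist only for illustration; the first function copies the label structure of the bytecode above, and the second shows the rotated form with a single conditional backward branch. Both assume n is non-negative.

```c
#include <stdbool.h>

/* Same shape as the bytecode above: a conditional exit test at
   loop_head plus an unconditional jump back to it.
   Assumes n >= 0. *branches counts executed branch instructions. */
bool countdown_two_branches(long n, long *branches)
{
    long i = n;
    *branches = 0;
loop_head:
    ++*branches;            /* conditional branch, like br-if-= */
    if (i == 0)
        goto out;
    i -= 1;                 /* like sub1 */
    ++*branches;            /* unconditional branch, like br loop-head */
    goto loop_head;
out:
    return true;
}

/* Rotated loop: one test on entry, then a single conditional
   backward branch per iteration. Assumes n >= 0. */
bool countdown_one_branch(long n, long *branches)
{
    long i = n;
    *branches = 0;
    ++*branches;            /* entry test, executed once */
    if (i == 0)
        goto out;
loop_head:
    i -= 1;
    ++*branches;            /* the only branch inside the loop */
    if (i != 0)
        goto loop_head;
out:
    return true;
}
```

For n iterations the first form executes 2n + 1 branches and the second n + 1, which is the difference the note above refers to.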
C, with gcc -O0, disassembled:
  #include <stdlib.h>

  int
  main (int argc, char *argv[])
  {
    400514:  55                    push   %rbp
    400515:  48 89 e5              mov    %rsp,%rbp
    400518:  48 83 ec 20           sub    $0x20,%rsp
    40051c:  89 7d ec              mov    %edi,-0x14(%rbp)
    40051f:  48 89 75 e0           mov    %rsi,-0x20(%rbp)
    if (argc != 2)
    400523:  83 7d ec 02           cmpl   $0x2,-0x14(%rbp)
    400527:  74 07                 je     400530 <main+0x1c>
      return 1;
    400529:  b8 01 00 00 00        mov    $0x1,%eax
    40052e:  eb 2e                 jmp    40055e <main+0x4a>
    long l = atol (argv[1]);
    400530:  48 8b 45 e0           mov    -0x20(%rbp),%rax
    400534:  48 83 c0 08           add    $0x8,%rax
    400538:  48 8b 00              mov    (%rax),%rax
    40053b:  48 89 c7              mov    %rax,%rdi
    40053e:  e8 dd fe ff ff        callq  400420 <atol@plt>
    400543:  48 89 45 f8           mov    %rax,-0x8(%rbp)
    while (l--);
    400547:  90                    nop
    400548:  48 83 7d f8 00        cmpq   $0x0,-0x8(%rbp)
    40054d:  0f 95 c0              setne  %al
    400550:  48 83 6d f8 01        subq   $0x1,-0x8(%rbp)
    400555:  84 c0                 test   %al,%al
    400557:  75 ef                 jne    400548 <main+0x34>
    return 0;
    400559:  b8 00 00 00 00        mov    $0x0,%eax
  }
    40055e:  c9                    leaveq
    40055f:  c3                    retq

  $ time ./a.out 1000000000

  real    0m2.061s
  user    0m2.056s
  sys     0m0.000s
Of course with -O2 the loop goes away entirely ;) But it's an
interesting exercise.
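As a side note on the -O2 remark, the sketch below shows one common way to keep such a loop alive under optimization. It is an assumption for illustration, not something measured in this thread, and the function names are hypothetical. Declaring the counter `volatile` obliges the compiler to perform every read and write, so the counter typically stays in memory much as it does in the -O0 listing above.

```c
/* Plain counter, as in the listing above. The loop has no observable
   effect, so an optimizing compiler is free to delete it. */
long spin_plain(long n)
{
    long l = n;
    while (l--)
        ;
    return l;               /* -1 for any n >= 0 */
}

/* Assumption: volatile is used here only to stop the optimizer from
   removing the loop. Each access to l must actually be performed. */
long spin_volatile(long n)
{
    volatile long l = n;
    while (l--)
        ;
    return l;               /* also -1 for any n >= 0 */
}
```

Both functions return the same value; only the second is guaranteed to do the per-iteration work, so it is the one to time when comparing optimization levels.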
Andy
--
http://wingolog.org/
* Re: ballpark rtl speeds
From: Stefan Israelsson Tampe @ 2012-06-07 19:59 UTC (permalink / raw)
To: Andy Wingo; +Cc: guile-devel
Great!
So if you code the new VM interpreter, you get about a 2x improvement.
If you generate code and compile it with no optimization, you get about
another 3x.
And if you are able to generate code that compiles with optimization,
basically keeping the counter in a register, you will get ...?
Using a register as storage yields 0.4s on my machine, where the above C
code took about 2.6s: about a further 6x in performance.
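The ratios quoted in this thread can be checked with a little arithmetic. The sketch below is only illustrative; `speedup` and `ns_per_iteration` are hypothetical helper names, and the inputs are the timings reported in the messages above.

```c
/* How many times faster the fast run is than the slow one. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Average cost of one loop iteration, in nanoseconds. */
double ns_per_iteration(double seconds, double iterations)
{
    return seconds * 1e9 / iterations;
}
```

With the reported numbers, speedup(14.054572, 6.023658) is about 2.33 (old VM to new VM), speedup(6.023658, 2.061) is about 2.92 (new VM to gcc -O0), and speedup(2.6, 0.4) is 6.5 (memory counter to register counter on the second machine). For 1000000000 iterations, that works out to roughly 14.05 ns, 6.02 ns and 2.06 ns per iteration for the old VM, the new VM and the -O0 binary respectively.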
Great work!
/Stefan
* Re: ballpark rtl speeds
From: Ludovic Courtès @ 2012-06-17 22:00 UTC (permalink / raw)
To: guile-devel
Hi!
Andy Wingo <wingo@pobox.com> skribis:
> Some ballpark measurements of the overhead of the old VM, the new VM,
> and C (compiled with gcc -g -O0).
Wow, that looks promising. Thanks for the news.
Ludo’.