unofficial mirror of guile-devel@gnu.org 
* ballpark rtl speeds
@ 2012-06-07  8:47 Andy Wingo
  2012-06-07 19:59 ` Stefan Israelsson Tampe
  2012-06-17 22:00 ` Ludovic Courtès
  0 siblings, 2 replies; 3+ messages in thread
From: Andy Wingo @ 2012-06-07  8:47 UTC
  To: guile-devel

Hi,

Some ballpark measurements of the overhead of the old VM, the new VM,
and C (compiled with gcc -g -O0).

Old interpreter:

  $ guile --no-debug
  > (define (countdown* n)
      (let lp ((n n))
        (if (zero? n)
            #t
            (lp (1- n)))))
  > ,time (countdown* 1000000000)
  ;; 14.054572s real time, 14.033213s run time.  0.000000s spent in GC.

New interpreter:

  > (use-modules (system vm rtl))
  > (define countdown
      (assemble-program
        '((begin-program countdown 1)
          (assert-nargs-ee/locals 1 2)
          (br fix-body)
          (label loop-head)
          (load-constant 2 0)
          (br-if-= 1 2 out)
          (sub1 1 1)
          (br loop-head)
          (label fix-body)
          (mov 1 0)
          (br loop-head)
          (label out)
          (load-constant 0 #t)
          (return 0))))
  > ,time (countdown 1000000000)
  ;; 6.023658s real time, 6.014166s run time.  0.000000s spent in GC.

Note that this is not the ideal bytecode -- there are two branches per
loop iteration when there could just be one.  But it's what the existing
tree-il compiler would produce.
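To make the "two branches per iteration" remark concrete, here is a toy register-VM interpreter in C. It is a minimal sketch, not Guile's RTL VM: the opcode names, the instruction encoding, the omission of `assert-nargs-ee/locals`, and the `OP_BR_IF_NE` instruction are all assumptions for illustration, and the constant 1 stands in for `#t`. It runs two bytecode shapes and counts dispatched instructions and branches: the shape from the listing above, and a hypothetical rotated loop that hoists the `load-constant` out of the loop and tests at the bottom, so that each iteration executes one branch instead of two.

```c
/* Hypothetical opcodes, loosely modeled on the RTL listing above. */
enum op { OP_BR, OP_BR_IF_EQ, OP_BR_IF_NE, OP_LOAD_CONST, OP_SUB1, OP_MOV, OP_RETURN };

/* a, b: register operands; k: constant or branch target (index into code). */
struct insn { enum op op; int a, b; long k; };

/* Dispatch counters collected by run(). */
struct stats { long insns, branches; };

/* Interpret code with the argument in r0; r1 and r2 are locals. */
static long run(const struct insn *code, long arg, struct stats *st)
{
    long r[3] = { arg, 0, 0 };
    long pc = 0;

    st->insns = st->branches = 0;
    for (;;) {
        const struct insn *i = &code[pc];
        st->insns++;
        switch (i->op) {
        case OP_BR:
            st->branches++;
            pc = i->k;
            break;
        case OP_BR_IF_EQ:
            st->branches++;
            pc = (r[i->a] == r[i->b]) ? i->k : pc + 1;
            break;
        case OP_BR_IF_NE:
            st->branches++;
            pc = (r[i->a] != r[i->b]) ? i->k : pc + 1;
            break;
        case OP_LOAD_CONST:
            r[i->a] = i->k;
            pc++;
            break;
        case OP_SUB1:
            r[i->a] = r[i->b] - 1;
            pc++;
            break;
        case OP_MOV:
            r[i->a] = r[i->b];
            pc++;
            break;
        case OP_RETURN:
            return r[i->a];
        }
    }
}

/* Shape from the listing: two branches per loop iteration. */
static const struct insn countdown_two[] = {
    { OP_BR,         0, 0, 5 },  /* br fix-body                    */
    { OP_LOAD_CONST, 2, 0, 0 },  /* loop-head: r2 <- 0             */
    { OP_BR_IF_EQ,   1, 2, 7 },  /* if r1 == r2 goto out           */
    { OP_SUB1,       1, 1, 0 },  /* r1 <- r1 - 1                   */
    { OP_BR,         0, 0, 1 },  /* br loop-head                   */
    { OP_MOV,        1, 0, 0 },  /* fix-body: r1 <- r0             */
    { OP_BR,         0, 0, 1 },  /* br loop-head                   */
    { OP_LOAD_CONST, 0, 0, 1 },  /* out: r0 <- 1 (stands for #t)   */
    { OP_RETURN,     0, 0, 0 },
};

/* Hypothetical rotated loop: one branch per iteration. */
static const struct insn countdown_one[] = {
    { OP_MOV,        1, 0, 0 },  /* r1 <- r0                       */
    { OP_LOAD_CONST, 2, 0, 0 },  /* r2 <- 0, hoisted out of loop   */
    { OP_BR_IF_EQ,   1, 2, 5 },  /* n == 0: skip the loop          */
    { OP_SUB1,       1, 1, 0 },  /* loop: r1 <- r1 - 1             */
    { OP_BR_IF_NE,   1, 2, 3 },  /* if r1 != 0 goto loop           */
    { OP_LOAD_CONST, 0, 0, 1 },  /* out: r0 <- 1 (stands for #t)   */
    { OP_RETURN,     0, 0, 0 },
};
```

For n = 10 the listing's shape dispatches 47 instructions (23 of them branches) and the rotated shape 25 (11 branches), i.e. 4n + 7 versus 2n + 5 instructions. How much of that shows up as wall-clock time depends on per-instruction dispatch cost, which this sketch does not measure.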

C, with gcc -O0, disassembled:

  #include <stdlib.h>

  int
  main (int argc, char *argv[])
  {
    400514:	55                   	push   %rbp
    400515:	48 89 e5             	mov    %rsp,%rbp
    400518:	48 83 ec 20          	sub    $0x20,%rsp
    40051c:	89 7d ec             	mov    %edi,-0x14(%rbp)
    40051f:	48 89 75 e0          	mov    %rsi,-0x20(%rbp)
    if (argc != 2)
    400523:	83 7d ec 02          	cmpl   $0x2,-0x14(%rbp)
    400527:	74 07                	je     400530 <main+0x1c>
      return 1;
    400529:	b8 01 00 00 00       	mov    $0x1,%eax
    40052e:	eb 2e                	jmp    40055e <main+0x4a>
    long l = atol (argv[1]);
    400530:	48 8b 45 e0          	mov    -0x20(%rbp),%rax
    400534:	48 83 c0 08          	add    $0x8,%rax
    400538:	48 8b 00             	mov    (%rax),%rax
    40053b:	48 89 c7             	mov    %rax,%rdi
    40053e:	e8 dd fe ff ff       	callq  400420 <atol@plt>
    400543:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    while (l--);
    400547:	90                   	nop
    400548:	48 83 7d f8 00       	cmpq   $0x0,-0x8(%rbp)
    40054d:	0f 95 c0             	setne  %al
    400550:	48 83 6d f8 01       	subq   $0x1,-0x8(%rbp)
    400555:	84 c0                	test   %al,%al
    400557:	75 ef                	jne    400548 <main+0x34>
    return 0;
    400559:	b8 00 00 00 00       	mov    $0x0,%eax
  }
    40055e:	c9                   	leaveq 
    40055f:	c3                   	retq   

  $ time ./a.out 1000000000

  real	0m2.061s
  user	0m2.056s
  sys	0m0.000s

Of course with -O2 the loop goes away entirely ;)  But it's an
interesting exercise.
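One way to keep the loop alive at -O2, so the -O0 shape can be compared with a register-resident counter, is sketched below. It assumes GCC or Clang: the empty extended-`asm` statement with a `"+r"` operand is a GNU C extension telling the compiler that the asm reads and modifies `l` in a register, so the loop cannot be deleted but `l` need not touch memory. The `volatile` variant instead forces a load and a store of the counter on every iteration, close in spirit to the -O0 listing above, where `l` lives at `-0x8(%rbp)`. The function names are hypothetical and no timings are claimed.

```c
/* Counter forced into memory: each l-- is a load plus a store,
   much like the -O0 code above, where l lives at -0x8(%rbp). */
static long countdown_memory(long n)
{
    volatile long l = n;
    while (l--)
        ;
    return l;              /* the post-decrement leaves l at -1 */
}

/* Counter kept in a register (GCC/Clang extension). The empty asm
   claims to read and modify l, so the compiler cannot delete the loop
   at -O2, but l need not be spilled to the stack. */
static long countdown_register(long n)
{
    long l = n;
    while (l--)
        __asm__ volatile ("" : "+r" (l));
    return l;              /* the post-decrement leaves l at -1 */
}
```

Both functions return -1 for any non-negative n. Disassembling the -O2 build with `objdump -d` is one way to check that only the `volatile` version keeps a memory operand in its loop.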

Andy
-- 
http://wingolog.org/

* Re: ballpark rtl speeds
  2012-06-07  8:47 ballpark rtl speeds Andy Wingo
@ 2012-06-07 19:59 ` Stefan Israelsson Tampe
  2012-06-17 22:00 ` Ludovic Courtès
  1 sibling, 0 replies; 3+ messages in thread
From: Stefan Israelsson Tampe @ 2012-06-07 19:59 UTC
  To: Andy Wingo; +Cc: guile-devel

Great!

So if you code the new VM interpreter, you get a roughly 2x improvement.
If you generate code and compile it with no optimization, you get about
another 3x. If you are able to generate code that compiles with
optimization, basically keeping the value in a register, you will get...?

Using a register as storage on my machine yields 0.4s, while the above C
code takes about 2.6s there. That is roughly a further 6.5x in performance.
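The ratios can be checked against the timings quoted in the thread. Dividing each run time by the 10^9 loop iterations gives a rough per-iteration cost; the last ratio uses Stefan's own numbers (2.6 s and 0.4 s on his machine), so it is not directly comparable with Andy's 2.06 s.

```latex
\begin{aligned}
\text{old VM:}\quad & 14.05\,\text{s} / 10^{9} \approx 14.1\,\text{ns per iteration}\\
\text{new VM:}\quad & 6.02\,\text{s} / 10^{9} \approx 6.0\,\text{ns per iteration}\\
\text{C, -O0:}\quad & 2.06\,\text{s} / 10^{9} \approx 2.1\,\text{ns per iteration}\\[4pt]
\text{speedups:}\quad & 14.05 / 6.02 \approx 2.3, \qquad 6.02 / 2.06 \approx 2.9, \qquad 2.6 / 0.4 = 6.5
\end{aligned}
```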

Great work!

/Stefan

On Thu, Jun 7, 2012 at 10:47 AM, Andy Wingo <wingo@pobox.com> wrote:

> Some ballpark measurements of the overhead of the old VM, the new VM,
> and C (compiled with gcc -g -O0).
> [...]

* Re: ballpark rtl speeds
  2012-06-07  8:47 ballpark rtl speeds Andy Wingo
  2012-06-07 19:59 ` Stefan Israelsson Tampe
@ 2012-06-17 22:00 ` Ludovic Courtès
  1 sibling, 0 replies; 3+ messages in thread
From: Ludovic Courtès @ 2012-06-17 22:00 UTC
  To: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> writes:

> Some ballpark measurements of the overhead of the old VM, the new VM,
> and C (compiled with gcc -g -O0).

Woow, that looks promising.  Thanks for the news.

Ludo’.