guile-devel

Re: ballpark rtl speeds


From: Stefan Israelsson Tampe
Subject: Re: ballpark rtl speeds
Date: Thu, 7 Jun 2012 21:59:33 +0200

Great!

So if you write the new VM interpreter you get about a 2x improvement.
If you generate code and compile it with no optimization, you get about another 3x.
If you are able to generate code that compiles with optimization, basically
keeping the counter in a register, what will you get?

Using a register for storage, the loop takes 0.4s on my machine, while the above C code
took about 2.6s. That is about a further 6x in performance.

Great work!

/Stefan

On Thu, Jun 7, 2012 at 10:47 AM, Andy Wingo <address@hidden> wrote:
Hi,

Some ballpark measurements of the overhead of the old VM, the new VM,
and C (compiled with gcc -g -O0).

Old interpreter:

 $ guile --no-debug
 > (define (countdown* n)
     (let lp ((n n))
       (if (zero? n)
           #t
           (lp (1- n)))))
 > ,time (countdown* 1000000000)
 ;; 14.054572s real time, 14.033213s run time.  0.000000s spent in GC.

New interpreter:

 > (use-modules (system vm rtl))
 > (define countdown
     (assemble-program
       '((begin-program countdown 1)
         (assert-nargs-ee/locals 1 2)
         (br fix-body)
         (label loop-head)
         (load-constant 2 0)
         (br-if-= 1 2 out)
         (sub1 1 1)
         (br loop-head)
         (label fix-body)
         (mov 1 0)
         (br loop-head)
         (label out)
         (load-constant 0 #t)
         (return 0))))
 > ,time (countdown 1000000000)
 ;; 6.023658s real time, 6.014166s run time.  0.000000s spent in GC.

Note that this is not the ideal bytecode -- there are two branches per
loop iteration when there could just be one.  But it's what the existing
tree-il compiler would produce.

C, with gcc -O0, disassembled:

 #include <stdlib.h>

 int
 main (int argc, char *argv[])
 {
   400514:     55                      push   %rbp
   400515:     48 89 e5                mov    %rsp,%rbp
   400518:     48 83 ec 20             sub    $0x20,%rsp
   40051c:     89 7d ec                mov    %edi,-0x14(%rbp)
   40051f:     48 89 75 e0             mov    %rsi,-0x20(%rbp)
   if (argc != 2)
   400523:     83 7d ec 02             cmpl   $0x2,-0x14(%rbp)
   400527:     74 07                   je     400530 <main+0x1c>
     return 1;
   400529:     b8 01 00 00 00          mov    $0x1,%eax
   40052e:     eb 2e                   jmp    40055e <main+0x4a>
   long l = atol (argv[1]);
   400530:     48 8b 45 e0             mov    -0x20(%rbp),%rax
   400534:     48 83 c0 08             add    $0x8,%rax
   400538:     48 8b 00                mov    (%rax),%rax
   40053b:     48 89 c7                mov    %rax,%rdi
   40053e:     e8 dd fe ff ff          callq  400420 <atol@plt>
   400543:     48 89 45 f8             mov    %rax,-0x8(%rbp)
   while (l--);
   400547:     90                      nop
   400548:     48 83 7d f8 00          cmpq   $0x0,-0x8(%rbp)
   40054d:     0f 95 c0                setne  %al
   400550:     48 83 6d f8 01          subq   $0x1,-0x8(%rbp)
   400555:     84 c0                   test   %al,%al
   400557:     75 ef                   jne    400548 <main+0x34>
   return 0;
   400559:     b8 00 00 00 00          mov    $0x0,%eax
 }
   40055e:     c9                      leaveq
   40055f:     c3                      retq

 $ time ./a.out 1000000000

 real  0m2.061s
 user  0m2.056s
 sys   0m0.000s

Of course with -O2 the loop goes away entirely ;)  But it's an
interesting exercise.

Andy
--
http://wingolog.org/


