Re: ballpark rtl speeds

On Thu, Jun 7, 2012 at 10:47 AM, Andy Wingo <address@hidden> wrote:

Hi,

Some ballpark measurements of the overhead of the old VM, the new VM,
and C (compiled with gcc -g -O0).

Old interpreter:

$ guile --no-debug
> (define (countdown* n)
    (let lp ((n n))
      (if (zero? n)
          #t
          (lp (1- n)))))
> ,time (countdown* 1000000000)
;; 14.054572s real time, 14.033213s run time. 0.000000s spent in GC.

New interpreter:

> (use-modules (system vm rtl))
> (define countdown
    (assemble-program
      '((begin-program countdown 1)
        (assert-nargs-ee/locals 1 2)
        (br fix-body)
        (label loop-head)
        (load-constant 2 0)
        (br-if-= 1 2 out)
        (sub1 1 1)
        (br loop-head)
        (label fix-body)
        (mov 1 0)
        (br loop-head)
        (label out)
        (load-constant 0 #t)
        (return 0))))
> ,time (countdown 1000000000)
;; 6.023658s real time, 6.014166s run time. 0.000000s spent in GC.

Note that this is not the ideal bytecode -- there are two branches per
loop iteration when there could just be one. But it's what the existing
tree-il compiler would produce.
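
To make the "two branches versus one" point concrete, here is a rough C analogy of the two loop shapes, written with gotos so the branches are visible. This is only a sketch: the function names and the branch counters are made up for illustration, and it is not RTL bytecode or anything the assembler accepts. The first function follows the shape of the listing above (a conditional exit test at the loop head plus an unconditional jump back); the second is the rotated form, with one guard before the loop and a single conditional backward branch per iteration.

```c
/* Shape of the listing above: each trip executes the conditional exit
   test at the loop head and then an unconditional jump back to it.
   *branches counts every branch instruction executed. */
long two_branches(long n, long *branches)
{
    long iterations = 0;
loop_head:
    ++*branches;              /* corresponds to (br-if-= 1 2 out) */
    if (n == 0)
        goto out;
    n--;                      /* corresponds to (sub1 1 1) */
    iterations++;
    ++*branches;              /* corresponds to (br loop-head) */
    goto loop_head;
out:
    return iterations;
}

/* Rotated loop: test once before entering, then use a single
   conditional backward branch at the bottom of the body. */
long one_branch(long n, long *branches)
{
    long iterations = 0;
    ++*branches;              /* one-time guard before the loop */
    if (n == 0)
        goto out;
body:
    n--;
    iterations++;
    ++*branches;              /* the only branch inside the loop */
    if (n != 0)
        goto body;
out:
    return iterations;
}
```

For a count of n, the first shape executes 2n + 1 branches and the second n + 1, while both run the body n times.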

C, with gcc -O0, disassembled:

#include <stdlib.h>

int
main (int argc, char *argv[])
{
  400514:  55                    push   %rbp
  400515:  48 89 e5              mov    %rsp,%rbp
  400518:  48 83 ec 20           sub    $0x20,%rsp
  40051c:  89 7d ec              mov    %edi,-0x14(%rbp)
  40051f:  48 89 75 e0           mov    %rsi,-0x20(%rbp)
  if (argc != 2)
  400523:  83 7d ec 02           cmpl   $0x2,-0x14(%rbp)
  400527:  74 07                 je     400530 <main+0x1c>
    return 1;
  400529:  b8 01 00 00 00        mov    $0x1,%eax
  40052e:  eb 2e                 jmp    40055e <main+0x4a>
  long l = atol (argv[1]);
  400530:  48 8b 45 e0           mov    -0x20(%rbp),%rax
  400534:  48 83 c0 08           add    $0x8,%rax
  400538:  48 8b 00              mov    (%rax),%rax
  40053b:  48 89 c7              mov    %rax,%rdi
  40053e:  e8 dd fe ff ff        callq  400420 <address@hidden>
  400543:  48 89 45 f8           mov    %rax,-0x8(%rbp)
  while (l--);
  400547:  90                    nop
  400548:  48 83 7d f8 00        cmpq   $0x0,-0x8(%rbp)
  40054d:  0f 95 c0              setne  %al
  400550:  48 83 6d f8 01        subq   $0x1,-0x8(%rbp)
  400555:  84 c0                 test   %al,%al
  400557:  75 ef                 jne    400548 <main+0x34>
  return 0;
  400559:  b8 00 00 00 00        mov    $0x0,%eax
}
  40055e:  c9                    leaveq
  40055f:  c3                    retq

$ time ./a.out 1000000000

real 0m2.061s
user 0m2.056s
sys 0m0.000s

Of course with -O2 the loop goes away entirely ;) But it's an
interesting exercise.
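
Since all three runs use the same count of 1000000000, the timings above can be put on a per-iteration basis with a line of arithmetic. A small sketch in C follows; the helper names are invented here, and the inputs are just the real/wall-clock figures quoted above.

```c
/* Convert a total time in seconds for `iterations` loop trips into
   nanoseconds per trip. */
double ns_per_iteration(double seconds, double iterations)
{
    return seconds * 1e9 / iterations;
}

/* How many times slower `slow_seconds` is than `fast_seconds`. */
double slowdown(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

With 14.054572s, 6.023658s and 2.061s as inputs, that works out to roughly 14, 6 and 2 nanoseconds per iteration, so the new VM is a bit over 2x faster than the old one on this loop and a bit under 3x slower than the unoptimized C.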

Andy
--
http://wingolog.org/

From:	Stefan Israelsson Tampe
Subject:	Re: ballpark rtl speeds
Date:	Thu, 7 Jun 2012 21:59:33 +0200