guile-devel

Re: Take some lowhanging fruit to speed up R6RS fixnum operations


From: Andreas Rottmann
Subject: Re: Take some lowhanging fruit to speed up R6RS fixnum operations
Date: Wed, 30 Mar 2011 12:58:13 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

[ Sorry for the incomplete mail I just sent; hit the wrong key combo ]

Andy Wingo <address@hidden> writes:

> On Wed 23 Mar 2011 00:20, Andreas Rottmann <address@hidden> writes:
>
>> In porting dorodango[0], I have noticed that Guile's R6RS fixnum
>> operations are quite slow; here's a patch to remedy that a bit.
>
> What is the state of things here?  I'm a bit lost in the discussion :)
>
> You said that decompressing the zip file takes 74 seconds with Guile as
> it is now.  What if you change to use Guile's arithmetic instead of
> fxops?  That would give us a good baseline so that we can know how much
> overhead the fxops have.
>
As promised, here are a bunch of numbers that will hopefully help to
determine the desired course of action:

| Codebase           | fixnum? | fxxor | unzip |
|--------------------+---------+-------+-------|
| vanilla            |    5.66 | 24.69 |  65.9 |
| generic            |    5.62 |  1.92 |   7.7 |
| smart              |    3.42 | 21.29 |  56.3 |
| smart + opt        |    3.42 |  6.99 |  21.9 |
| smart + opt-macros |     3.4 |  5.56 |  21.0 |
| vmop               |    1.78 | 18.83 |  47.9 |
| vmop + opt         |    1.77 |  3.94 |  14.1 |
| vmop + opt-macros  |    1.78 |   3.7 |  13.3 |

As you can see, in the "vanilla" case (i.e. current stable-2.0), it's
65.9 seconds -- I do not know where I got the 74 second number from,
perhaps it was from a profiling run.

The "generic" case is the one where all fixnum checks have been
eliminated (i.e. using the generic procedures, such as logxor), thus
allowing the compiler to emit VM primitives for most of the operations.
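To illustrate the difference: the R6RS fx operations must validate that every argument is a fixnum before doing the actual work, whereas the generic operations go straight to a VM primitive. A rough sketch of what that validation overhead looks like (the error-signalling details here are illustrative, not Guile's exact implementation):

```scheme
;; What an R6RS fxxor conceptually has to do: check both arguments,
;; then defer to the generic operation.
(define (checked-fxxor a b)
  (unless (and (fixnum? a) (fixnum? b))
    (assertion-violation 'fxxor "expected fixnums" a b))
  (logxor a b))

;; The "generic" variant uses logxor directly, which the compiler
;; emits as a single VM operation -- no per-call type checks.
(logxor #b1010 #b0110)  ; => 12
```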

Now, with these baselines established, I implemented the following
optimizations, and benchmarked combinations of them as can be seen from
the above table:

- "smart" refers to having `fixnum?' implemented via `define-inline' and
  using `object-address'.

- "opt" refers to having many fixnum procedures (e.g. fx+, fxxor, ...)
  going through a case-lambda, thus optimizing the common, two-argument
  case not to cons.

- "opt-macros" refers to making the same procedures as in "opt" actually
  macros, expanding to a `(let ((x x-expr) (y y-expr)) (OP x y))' in the
  binary case.

- "vmop" means that the newly-introduced VM primitive `fixnum?' is used.
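For concreteness, the three Scheme-level optimizations might look roughly like the following (a sketch under assumptions: `binary-fxxor` is a hypothetical checked two-argument helper, and the exact tag mask used by `fixnum?` is illustrative, not a guarantee about Guile's object representation):

```scheme
;; "smart": inline the tag-bit check on the object's address instead
;; of calling out to a C predicate. The mask is an assumption here.
(define-inline (fixnum? obj)
  (not (zero? (logand (object-address obj) 2))))

;; "opt": route through a case-lambda so the common binary call
;; does not cons a rest-argument list.
(define fxxor
  (case-lambda
    ((x y) (binary-fxxor x y))
    ((x y . rest)
     (fold binary-fxxor (binary-fxxor x y) rest))))

;; "opt-macros": make fxxor a macro that expands the binary case
;; directly at the call site, avoiding the procedure call entirely.
(define-syntax fxxor
  (syntax-rules ()
    ((_ x-expr y-expr)
     (let ((x x-expr) (y y-expr))
       (binary-fxxor x y)))))
```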

Regarding the columns, "fixnum?" is the time needed for 10 million
`fixnum?' checks, "fxxor" for 10 million binary `fxxor' operations, and
"unzip" times the unzipping of a not-so-large (675KiB) ZIP file using
Göran Weinholt's ZIP implementation.
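The micro-benchmark columns were measured with a loop of roughly this shape (my reconstruction of the harness, not the exact code used; `bench` is a hypothetical helper):

```scheme
;; Time 10 million invocations of a thunk, returning elapsed seconds.
(define (bench thunk)
  (let ((start (get-internal-real-time)))
    (let loop ((i 0))
      (when (< i 10000000)
        (thunk)
        (loop (+ i 1))))
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

(bench (lambda () (fixnum? 42)))     ; the "fixnum?" column
(bench (lambda () (fxxor 42 17)))    ; the "fxxor" column
```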

Now there are several questions:

- Can we put the `fixnum?' VM primitive into stable-2.0?  AFAIK, yes we
  can, but we will break things for users mixing 2.0.0 and 2.0.1 on the
  same machine (or sharing $HOME).  So do we want to?

- Should we complicate the implementation and take a potential code size
  hit by using the "opt-macros" hack, for a 5-6% gain in speed?  Of
  course, that's just for the ZIP benchmark, but I think it gives a good
  estimate of the ballpark we are in here.

  My preference would be to go with the simpler implementation ("opt"
  instead of "opt-macros").

- Should further improvements be considered?  I guess if we put the
  fixnum operations into the VM, we could get as fast as the "generic"
  case, and perhaps even a bit faster (i.e. about a 2x speedup on the
  ZIP benchmark compared to vmop + opt).  I'm not sure if that's worth
  it.

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.yi.org/>
