[Gcl-devel] Performance

From: Camm Maguire
Subject: [Gcl-devel] Performance
Date: Fri, 19 Jun 2009 16:18:10 -0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

Greetings!  I'm in the middle of a major modification to the cvs head
tree, and am reviewing the performance implications.  Non-vanilla
function calls are about to get much faster.  But there are many
questions remaining.

I'm doing a quick comparison of the standard benchmark results
vs. sbcl.  We win on quite a few, but some long-standing clunker
areas remain.

1) catch/throw (e.g. the ctak benchmark).  GCL handles this via
setjmp and longjmp.  Right now there are far too many function calls
in the implementation, but in my tree, using the new dladdress
external function pointer mechanism, I can get this down to one
apiece for throw and catch.  This appears commensurate with the sbcl
disassembly, yet we remain significantly slower even with these
changes.  A big piece is likely that we return the values on the lisp
stack instead of the C stack.  On 32bit machines it would be possible
to use setjmp's int return to carry the primary value -- is there no
analog on 64bit?  Are there any other secrets of the sbcl
implementation which might carry over to C?

For this one benchmark, it is possible for the compiler to optimize
away the catch/throw and simply emulate with a return (see below).  I
don't think this is what sbcl is doing.

(defun ctak (x y z)
  (catch 'ctak (ctak-aux x y z)))

(defun ctak-aux (x y z)
  (declare (fixnum x y z))
  (cond ((not (< y x))
         (throw 'ctak z))
        (t (ctak-aux
             (catch 'ctak
               (ctak-aux (the fixnum (1- x)) y z))
             (catch 'ctak
               (ctak-aux (the fixnum (1- y)) z x))
             (catch 'ctak
               (ctak-aux (the fixnum (1- z)) x y))))))

2) list cache prefetching -- sbcl shows no explicit prefetch
instructions in its disassembly of #'length, yet it traverses
contiguous lists much faster than randomly located ones, whereas our
timing in both cases matches the random-list case.  I tried putting
in explicit prefetch calls some time ago without much benefit -- this
is awfully fiddly stuff which varies significantly from machine to
machine.  I suspect the culprit is non-optimal data compaction in the
GC algorithm.  I just ran a test in which sorting the list by cons
address makes traversal very fast without any prefetch, confirming
this hunch.  Does anyone know of a good succinct exposition of
copying and/or compacting gc algorithms?  Some time ago we discussed
a 'shrink-wrap' gc idea for gcl -- right now our memory pages are
grow-only.

Take care,

Camm Maguire                                        address@hidden
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
