gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gcl-devel] efficiency of package symbol tables


From: Peter Wood
Subject: Re: [Gcl-devel] efficiency of package symbol tables
Date: Tue, 9 Mar 2004 05:42:58 +0100
User-agent: Mutt/1.4i

Hi

On Thu, Mar 04, 2004 at 10:05:04PM +0100, Bruno Haible wrote:
> Hi,
> 
> Sam Steingold and I found a performance problem in clisp, and it appears
> that GCL has the same problem: When tens of thousands of symbols are added
> to a package, further package lookups and symbol additions become slower
> and slower. This is of course not the proper behaviour for good hash tables.
> 
> Here is the test code:
> 
> (defun random-symbol ()
>   (intern (write-to-string (random 1000000000000000000) :base 36)
>           *package*))
> (compile 'random-symbol)
> 
> (defun fill-symtab (n)
>   (dotimes (i n) (random-symbol)))
> (compile 'fill-symtab)

...

> What helped us in order to understand the problem, was to look at the
> internal distribution, how many symbols in each bucket of the hash table
> on average...

GCL computes the hash index based on the length of the
symbol-name... Collecting some statistics on your #'random-symbol:

(defun symlen-stat (n)
  (let ((statarr (make-array 13 :element-type 'fixnum :initial-element 0)))
    (dotimes (i n)
      (incf (aref statarr (length (write-to-string (random 1000000000000000000) 
:base 36)))))
    statarr))

>(symlen-stat 10)
#(0 0 0 0 0 0 0 0 0 0 0 1 9)

>(symlen-stat 100)
#(0 0 0 0 0 0 0 0 0 0 0 15 85)

>(symlen-stat 1000)
#(0 0 0 0 0 0 0 0 0 0 1 130 869)

>(symlen-stat 10000)
#(0 0 0 0 0 0 0 0 0 3 41 1299 8657)

>(symlen-stat 100000)
#(0 0 0 0 0 0 0 0 0 14 379 12797 86810)

So although the symbols generated by #'random-symbol differ randomly
in their symbol-name they don't do so in their length, but are heavily
weighted towards the longest length possible.  Presumably, one could
work around this bug (if it is a bug) by generating symbols which
differed randomly in their length over a wide range.

In the best case, with a variation of 1 to 12 chars per symbol, you
will still only be storing the symbols in 12 different buckets.

Eg, if it doesn't matter if some symbols will be ||, you could do:
        
(defun random-symbol ()
  (let ((symlen (random 50)))
    (intern (map 'string #'(lambda (c) (setf c (code-char (+ 48 (random (- 122 
48)))))) 
                 (make-string symlen)))))


Now I get the following results (with gc notification turned on so we
know where the time is really being spent).


>(time (fill-symtab 10000))
[GC for 0 RELOCATABLE-BLOCKS pages..(T=1).GC finished]
real time : 1.000 secs
run time  : 0.390 secs
NIL

>(time (fill-symtab 10000))
real time : 0.000 secs
run time  : 0.420 secs
NIL

>(time (fill-symtab 10000))
[GC for 0 RELOCATABLE-BLOCKS pages..(T=10).GC finished]
real time : 0.000 secs
run time  : 0.510 secs
NIL

>(time (fill-symtab 10000))
real time : 0.000 secs
run time  : 0.400 secs
NIL

>(time (fill-symtab 10000))
real time : 1.000 secs
run time  : 0.410 secs
NIL

>(time (fill-symtab 10000))
[GC for 1000 SYMBOL pages..(T=12).GC finished]
real time : 0.000 secs
run time  : 0.530 secs
NIL

>(time (fill-symtab 10000))
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=18).GC finished]
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=16).GC finished]
real time : 1.000 secs
run time  : 0.790 secs
NIL

>(time (fill-symtab 10000))
real time : 1.000 secs
run time  : 0.400 secs
NIL

>(time (fill-symtab 10000))
[GC for 1500 SYMBOL pages..(T=14).GC finished]
real time : 1.000 secs
run time  : 0.550 secs
NIL

>(time (fill-symtab 10000))
real time : 0.000 secs
run time  : 0.410 secs
NIL

>>(time (fill-symtab 10000))
real time : 0.000 secs
run time  : 0.410 secs
NIL

>(time (fill-symtab 10000))
real time : 0.000 secs
run time  : 0.410 secs
NIL

>(time (fill-symtab 10000))
[GC for 1000 STRING pages..(T=22).GC finished]
[GC for 2250 SYMBOL pages..(T=20).GC finished]
real time : 0.000 secs
run time  : 0.830 secs
NIL

>(time (fill-symtab 10000))
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=33).GC finished]
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=29).GC finished]
real time : 1.000 secs
run time  : 1.170 secs
NIL

>(time (fill-symtab 10000))
real time : 1.000 secs
run time  : 0.400 secs
NIL


Regards,

Peter




reply via email to

[Prev in Thread] Current Thread [Next in Thread]