guile-devel

hashtables in scheme, are they slow? using goops, is this madness?


From: Stefan Israelsson Tampe
Subject: hashtables in scheme, are they slow? using goops, is this madness?
Date: Tue, 22 Feb 2022 00:05:43 +0100

Hi,

I did some research on Guile's hashtables and tried a few versions of them in Guile Scheme, ranging from enormous macro expansions to GOOPS object-oriented layers. What I found is:

1. For small hashtables, Scheme-based solutions are superior to the C-based one,
2-3x faster for a hashtable of size 40:

(let lp ((i 0))
  (if (< i 2000000)
      (let lp2 ((j 0))
        (if (< j 40)
            (begin
              (hashq-set! h j (+ 1 (hashq-ref h j 0)))
              (lp2 (+ j 2)))
            (lp (+ i 1))))))


2. For very large hashtables, the C-based solution is about 1.5-2.0x faster:
(for-each (lambda (i) (hashq-set! h i i)) (iota 20000000))

3. Iterating over the hashtable is 15x faster with a Scheme-based solution, and it also preserves the order of insertion. We are not restricted to fold/for-each; all proper loops are this speedy. This means that the overhead is about as much as using spaghetti-stack one-shot generators. It is actually 5x faster than Python 3 (but Python is better on large examples due to the fact that (hash n) = n for numbers in Python):

(hash-fold (lambda (k v s) (+ v s)) 0 h)

4. I experimented with GOOPS, a more functional approach, and a pure crazy macro-everything approach. Some functions need to be done as macros, but compartmentalizing and organizing the hash environment with GOOPS is about 10% slower than an expand-everything strategy. We cheat a little with struct-ref, and although the accessors and remove functions are GOOPS methods, we can grovel out the pure functions inside the GOOPS architecture like so:

(with-hash-accessors (H : href hset hremove)
     ... 
     (hset k (+ 12 (href k 0)))
     ...)
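
For context, here is a minimal sketch of what such a macro could expand into, assuming the object stores its accessor closures in ordinary slots (the slot names below are invented; the real definition is in hashish.scm, linked further down):

(use-modules (oop goops))

;; Sketch only: bind the plain closures held in the object's slots to
;; local names, so the body calls ordinary procedures instead of
;; dispatching a GOOPS generic on every access.  The slot names are
;; invented here; the real code cheats further with struct-ref and
;; fixed slot indices.
(define-syntax with-hash-accessors
  (syntax-rules (:)
    ((_ (h : href hset hremove) body ...)
     (let ((href    (slot-ref h 'href-proc))
           (hset    (slot-ref h 'hset-proc))
           (hremove (slot-ref h 'hremove-proc)))
       body ...))))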

So with the GOOPS database we easily construct all the normal accessors and have a nice extensible system, in case one would like to use more general hash and equality predicates.

Code:
https://gitlab.com/python-on-guile/python-on-guile/-/blob/master/modules/ice-9/hashish.scm

For small hashes it is better to use a vector or an assoc, and the system dispatches between those two regimes. Here a C-based helper function could speed things up, but that is the only application of C that would be valuable, and it is pretty limited in scope.
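
A minimal sketch of that kind of size dispatch (not the actual hashish.scm code; the threshold and the use of a plain Guile hashtable for the large case are assumptions made just for illustration):

(define small-limit 16)  ; assumed threshold, purely for illustration

(define (dispatch-ref handles n bins k default)
  (if (< n small-limit)
      ;; small case: linear scan of the insertion-order handle vector
      (let lp ((i 0))
        (cond ((= i n) default)
              ((eq? k (car (vector-ref handles i)))
               (cdr (vector-ref handles i)))
              (else (lp (+ i 1)))))
      ;; large case: hashed lookup (here simply a built-in hashtable
      ;; standing in for the real bins vector)
      (hashq-ref bins k default)))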

Removal is done by setting the cdr of the handle to a special tag and incrementing a counter. When iterating, those slots are skipped over, and when enough dead slots have accumulated, the hashtable is copied over to a fresh one. This means that removing and then re-inserting the same key does not allocate if done close in time, and it also simplifies managing the insertion-order property.
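
A minimal sketch of the idea, with invented helper names:

;; Sketch only: a removed handle keeps its place in the insertion-order
;; vector but its cdr is set to a unique tombstone tag.
(define %deleted (list 'deleted))   ; unique tag, eq? to nothing else

(define (remove-handle! handle)   (set-cdr! handle %deleted))
(define (restore-handle! handle v)
  ;; re-adding the same key soon after reuses the very same pair,
  ;; so nothing is allocated and the insertion order is preserved
  (set-cdr! handle v))

(define (fold-live handles n f init)
  ;; iteration skips tombstones; when the deleted counter m grows large,
  ;; the real code copies everything into a fresh table instead
  (let lp ((i 0) (s init))
    (if (= i n)
        s
        (let ((h (vector-ref handles i)))
          (lp (+ i 1)
              (if (eq? (cdr h) %deleted) s (f (car h) (cdr h) s)))))))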

The class is mainly (a GOOPS sketch follows below):
assoc = vector which contains the handles in insertion order
bins  = vector indexed by the hash, pointing into the assoc, which is then searched
n     = number of elements in the assoc insertion-order vector
m     = number of deleted elements
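
Something along these lines, where the slot names and accessors are my guesses rather than the actual hashish.scm definitions:

(use-modules (oop goops))

(define-class <hashish> ()
  (assoc #:init-form (make-vector 16 #f)  #:accessor hashish-assoc) ; handles, insertion order
  (bins  #:init-form (make-vector 16 '()) #:accessor hashish-bins)  ; hash -> positions in assoc
  (n     #:init-value 0 #:accessor hashish-n)    ; entries used in the assoc vector
  (m     #:init-value 0 #:accessor hashish-m))   ; deleted entries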

Fun fact: we can also store a hash in the class if we like. Because the order is not semantically important in many applications, one can just take the xor of all the elements in the data structure. There are different variants here, like xoring only the keys to get more of a set hash, or both keys and values to get the total hash. This means it is usually very fast to discover whether two hashmaps or sets are different.
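
A sketch of that xor trick, assuming hashq over the fixnum range as the per-element hash (only the idea, not the hashish.scm code):

(define (key-hash k)    (hashq k most-positive-fixnum))
(define (elem-hash k v) (logxor (key-hash k) (hashq v most-positive-fixnum)))

;; adding and removing an element both xor its hash into the accumulator,
;; so the accumulated value is independent of insertion order and cheap
;; to keep up to date incrementally
(define (toggle-set-hash   acc k)   (logxor acc (key-hash k)))    ; keys only: "set" hash
(define (toggle-total-hash acc k v) (logxor acc (elem-hash k v))) ; keys and values

;; different accumulated hashes => the tables are certainly different;
;; equal hashes still need a full comparison
(define (certainly-different? acc1 acc2) (not (= acc1 acc2)))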

Any thoughts or comments are welcome.

LICENSE
LGPL v2 or v3
