gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gcl-devel] hashing


From: Camm Maguire
Subject: [Gcl-devel] hashing
Date: Thu, 14 Nov 2013 11:25:46 -0500

Greetings!  This is just a note for posterity on some of the recent
learnings regarding hashing in GCL.  Except for symbol lookups in
packages, gcl uses 'open addressing' hash tables, as is customary for a
one word stored 'value', i.e. a lisp object pointer.  With this
strategy, collisions result in a linear probe of the table until either
an empty slot or the desired key is found.  The negative performance
effects of the collision therefore occur not only for keys hashing to
the same index, but for those hashing to *adjacent* positions.  Using a
cons address directly as an index will thus suffer, as conses are
frequently allocated in bulk in a contiguous region of memory.

The current solution relies on our 'universal' hash function, i.e. a 256
wide table of random 64bit numbers indirected by bytes of the key and
xored together, to not only distribute the hashed items uniformly
throughout the table, but also to minimize the possibility of
consecutive indices.  I think these criteria should be fulfilled at
present, though there are explicit algorithmic solutions to the
consecutive index problem which do not impinge on the hash function.
These require more memory and complexity, and seem unwarranted at
present.

What we were seeing with the recent performance degradation was a strong
dependence on the data layout as represented by the addresses in use.
As an historical aside we should note that GCL did randomize these
addresses in the past, but it was found to be 'too good' in the sense
that there were apparently ideal memory layouts which could be indexed
faster if the addresses were used directly and the randomization
skipped.  I believe this discussion was in an ACL2 context some years
ago.  This is just to note that we are now 'reverting that reversion',
and opting for a good average performance which (should be) independent
of data layout.

Take care,
-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah



reply via email to

[Prev in Thread] Current Thread [Next in Thread]