
Re: weak hash tables


From: Andy Wingo
Subject: Re: weak hash tables
Date: Tue, 15 Nov 2011 22:33:41 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)

Hi Ludo,

On Fri 11 Nov 2011 21:30, address@hidden (Ludovic Courtès) writes:

> Andy Wingo <address@hidden> skribis:
>
>> While doing the recent retagging work I realized that I really wanted to
>> kill weak pairs.  They look like pairs but you can't actually access
>> their fields using car and cdr -- terrible things.
>
> Agreed.  The reason was to keep the weak hash table API unchanged.

Understood.  However, given that we have now prohibited the get-handle
API from being used on weak tables, we do have a bit more freedom of
implementation.

> Could you provide an executive summary ;-) of:

Hehe, OK :)

>   - how good ol’ weak hash tables relate to the new weak sets and weak
>     tables?

All of the uses of weak hash tables in Guile were in a multithreaded
context.  Weak sets and weak tables now include locking, so that user
code doesn't have to lock.
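To make that concrete: the lock lives inside the table, so a bare ref
is already thread-safe.  A rough sketch in C, with invented names and a
strong-key linear scan standing in for the real lookup:

  #include <pthread.h>
  #include <stddef.h>

  /* Sketch only: the table carries its own lock, so callers no longer
     wrap every access in their own mutex, as they had to do with the
     old weak hash tables.  */
  typedef struct {
    pthread_mutex_t lock;
    size_t size;
    struct { void *key; void *value; } *entries;
  } weak_table;

  static void *
  weak_table_ref (weak_table *t, void *key)
  {
    void *ret = NULL;
    size_t i;
    pthread_mutex_lock (&t->lock);
    for (i = 0; i < t->size; i++)
      if (t->entries[i].key == key)
        {
          ret = t->entries[i].value;
          break;
        }
    pthread_mutex_unlock (&t->lock);
    return ret;
  }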

Weak sets and tables store the hash value in the set or table entries,
which is obviously more efficient, and avoids the need to store hash /
equality functions in the table itself (as the old hash tables need to
do, for purposes of rehashing).
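Concretely, each entry caches the key's hash next to the key and value,
so growing the table just redistributes entries by the stored hash --
the hash procedure never runs again, and therefore doesn't have to be
remembered.  Entry layout sketch (field names invented):

  typedef struct {
    unsigned long hash;   /* cached hash of KEY; rehashing reuses it */
    void *key;            /* weak reference */
    void *value;          /* weak reference, for doubly-weak tables */
  } weak_entry;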

The old weak table implementation had to vacuum the ribs and buckets
after GC to avoid taking up memory on behalf of disappeared items.  The
new tables store the cells in a linear array, which can have empty
entries, meaning that adding an entry to the table doesn't allocate
memory -- besides the libgc-internal memory allocated when creating a
disappearing link.  (We could have fixed the old weak hash table with
chained finalizers, but that's tricky too.)
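The libgc mechanism behind that is the disappearing link: when a key is
written into a pre-allocated slot, we register the slot's address so
the collector nulls it when the key dies.  Roughly like this -- the
GC_general_register_disappearing_link call is real libgc, the entry
type is the sketch from above:

  #include <gc/gc.h>

  typedef struct {
    unsigned long hash;
    void *key, *value;
  } weak_entry;

  /* Writing a key into an existing array slot allocates nothing on
     our side; only libgc allocates, for its internal link record.
     When KEY is collected, the GC clears ENTRY->key to NULL.  */
  static void
  entry_set_key (weak_entry *entry, void *key)
  {
    entry->key = key;
    GC_general_register_disappearing_link (&entry->key, key);
  }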

A weak table with N available slots takes up N*3 words: one for the hash
of the key, one for the key, and one for the value.  A weak hash table
with N buckets and N/2 entries takes up N words for the buckets, and
N/2*(2+2) = N*2 words for the entries (two words for each rib pair and
two for each handle pair), yielding N*3.  So they are similar in memory
usage, though they place different loads on the GC.
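(For example, with N = 1024: the new table is 1024 * 3 = 3072 words,
while the old one is 1024 words of buckets plus 512 * 4 = 2048 words of
ribs and handles -- also 3072 words.)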

The old implementation used disappearing links incorrectly, as it
accessed the weak car and cdr without the GC alloc lock.  The new
implementation takes the alloc lock when getting the key and value.  It
doesn't take the lock when comparing hash values, though.
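In libgc terms that is GC_call_with_alloc_lock: the callback runs with
the allocator lock held, so the collector can't clear the weak fields
in between reading the key and reading the value.  The pattern looks
something like this (real libgc call, sketched entry type):

  #include <gc/gc.h>

  typedef struct {
    unsigned long hash;
    void *key, *value;
  } weak_entry;

  struct copy_data { weak_entry *entry; void *key, *value; };

  /* Runs with the GC alloc lock held: the collector can't fire
     between the two reads, so we see a consistent key/value pair.  */
  static void *
  do_copy (void *data)
  {
    struct copy_data *d = data;
    d->key = d->entry->key;
    d->value = d->entry->value;
    return NULL;
  }

  static void
  copy_weak_entry (weak_entry *entry, void **key, void **value)
  {
    struct copy_data d = { entry, NULL, NULL };
    GC_call_with_alloc_lock (do_copy, &d);
    *key = d.key;
    *value = d.value;
  }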

>   - what the deal is wrt. performance, linear probing, and rehashing?

In theory linear probing is faster, if you can keep the max search
length under control.  (Robin hood hashing provides this property.)
This is because sequential buckets are likely to be on the same cache
lines, unlike the buckets and ribs.  SBCL hacker Paul Khuong argues as
much here:

  http://www.pvk.ca/Blog/more_numerical_experiments_in_hashing.html

However in our case, adding disappearing links and taking the alloc lock
likely destroys these advantages.
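For the record, the robin hood trick is simply: on insertion, whenever
the incumbent in a bucket sits closer to its ideal bucket than the
entry being inserted sits to its own, swap them and carry on with the
evicted entry.  That keeps probe lengths tightly clustered.  A generic
sketch (strong keys, invented names):

  #include <stddef.h>

  typedef struct {
    unsigned long hash;
    void *key, *value;
    int used;
  } rh_entry;

  /* Probe distance of an entry hashing to HASH but stored at POS.  */
  static size_t
  rh_distance (size_t size, unsigned long hash, size_t pos)
  {
    return (pos + size - hash % size) % size;
  }

  /* Robin hood insertion: displace any incumbent that is "richer"
     (closer to its home bucket) than the entry we are carrying.
     Assumes the table always has at least one free slot.  */
  static void
  rh_insert (rh_entry *table, size_t size, rh_entry ins)
  {
    size_t pos = ins.hash % size;
    size_t dist = 0;
    while (table[pos].used)
      {
        size_t d = rh_distance (size, table[pos].hash, pos);
        if (d < dist)
          {
            rh_entry tmp = table[pos];
            table[pos] = ins;
            ins = tmp;
            dist = d;
          }
        pos = (pos + 1) % size;
        dist++;
      }
    table[pos] = ins;
  }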

> Is there any advantage to using the new weak table API instead of the
> old weak hash table API?  The names suggest that they’re not quite
> equivalent, but maybe that’s just the names?

Not really.  It's just that the old API amounts to a sort of
pseudo-generic interface for hash tables: weak tables don't share any
implementation bits at all with ordinary hash tables, yet they answer
to the same functions -- except for the ones, like the get-handle
procedures, that they can't support.

The thing that would make the most sense would be some sort of generic
map interface, like the array implementation interface.  But that's a
different project, eh...
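If someone did pick that up, I imagine it looking like the array
implementation registry: a small vtable of ref/set/remove procedures
that strong and weak tables would both fill in.  Purely hypothetical:

  /* Hypothetical -- nothing like this exists in Guile today.  */
  typedef struct {
    void *(*ref) (void *map, void *key, void *dflt);
    void (*set_x) (void *map, void *key, void *value);
    void (*remove_x) (void *map, void *key);
  } map_interface;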

Cheers,

Andy
-- 
http://wingolog.org/


