[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[MIT-Scheme-devel] specifying hash table weakness
From: |
Taylor R Campbell |
Subject: |
[MIT-Scheme-devel] specifying hash table weakness |
Date: |
Sat, 14 Aug 2010 20:15:19 +0000 |
User-agent: |
IMAIL/1.21; Edwin/3.116; MIT-Scheme/9.0.1 |
Bike shed time!
There are many different options in hash tables for weakly referenced
keys and data. When I implement these, we could multiply the various
classes of procedures for constructing hash table constructors and
types and instances by all the possible weakness options:
x-HASH-TABLE/CONSTRUCTOR
x-EQ-HASH-TABLE-TYPE strong
x-EQV-HASH-TABLE-TYPE key-weak (formerly `weak')
x-EQUAL-HASH-TABLE-TYPE \ / datum-weak
x-STRING-HASH-TABLE-TYPE X key-or-datum-weak
MAKE-x-HASH-TABLE-TYPE / \ key-ephemeral
MAKE-x-EQ-HASH-TABLE datum-ephemeral
MAKE-x-EQV-HASH-TABLE key-and-datum-ephemeral
MAKE-x-EQUAL-HASH-TABLE
MAKE-x-STRING-HASH-TABLE
This leads to 70 different bindings. Some combinations make little
sense together (e.g., key-weak string hash tables), but even if those
combinations are pruned, the result would be a *trifle* excessive. So
what should we do instead?
Instead of a myriad of different hash table type and constructor
constructors, we could have just two procedures that take an extra
argument to specify the weakness option:
(HASH-TABLE-CONSTRUCTOR <key-hash> <key=?> <rehash-after-gc?> <weakness>)
(MAKE-HASH-TABLE-TYPE <key-hash> <key=?> <rehash-after-gc?> <weakness>)
Instead of a myriad of different hash table constructors for
particular hash functions, we could have one procedure per hash
function that takes an extra argument to specify the weakness option:
(MAKE-x-HASH-TABLE* <weakness> #!OPTIONAL <initial-size>),
x \in {EQ, EQV, EQUAL, STRING}
None of these names conflict with existing names. This leads to six
bindings, plus all the old ones, and perhaps seven bindings for the
weakness types (e.g., WEAKNESS:STRONG, WEAKNESS:KEY-WEAK, &c.).
Questions? Comments? Flames? Other ideas?
Here are all the weakness options:
- strong: Entries are never removed but with HASH-TABLE/REMOVE!.
- key-weak: The keys are referenced weakly, and entries whose keys
have been dropped are removed when the table is rehashed or cleaned.
Note that this can leave references to effectively unreachable data
in the hash table for arbitrarily long times, particularly for non-
address-hashed tables. These hash tables are currently just called
`weak' in MIT Scheme.
- datum-weak: The data are referenced weakly, and entries whose data
have been dropped are removed when the table is rehashed or cleaned.
Note that this can leave references to effectively unreachable keys
in the hash table for arbitrarily long times, particularly for non-
address-hashed tables.
- key-or-datum-weak: The keys and data are referenced weakly, and any
entry whose key or datum has been dropped is removed when the table
is rehashed or cleaned. Broken entries can stay dormant in the
table for arbitrarily long times, but not the references to their
keys and data.
- key-ephemeral: The keys and data are referenced weakly, and for any
entry whose key has been dropped, its datum is dropped too; any
entry whose key (and datum) has been dropped is removed when the
table is rehashed or cleaned. Broken entries can stay dormant in
the table for arbitrarily long times, but not the references to
their keys or data. References to the key through the datum don't
count if the only reference to the datum is through an ephemeron.
- datum-ephemeral: The keys and data are referenced weakly, and for
any entry whose datum has been dropped, its key is dropped too; any
entry whose datum (and key) has been dropped is removed when the
table is rehashed or cleaned. Broken entries can stay dormant in
the table for arbitrarily long times, but not the references to
their keys or data. References to the datum through the key don't
count if the only reference to the key is through an ephemeron.
- key-and-datum-ephemeral: The keys and data are referenced weakly,
and an entry is dropped if and only if neither its key nor its datum
has any strong reference; any entry whose key and datum have been
dropped is removed when the table is rehashed or cleaned. Broken
entries can stay dormant for arbitrarily long times, but not the
references to their keys or data.
Weak hash tables are lighter-weight than ephemeral hash tables,
requiring approximately half the storage, but there are space leaks
that ephemeral hash tables can solve which weak hash tables cannot.
Most Lisp systems do not provide both -- either they provide what I
have called weak hash tables here, or what they call weak hash tables
are what I have called ephemeral hash tables here. You should use
ephemeral hash tables unless you are sure that weak hash tables are
safe. For example, if you want to hang a small number property on
arbitrary objects, a weak hash table may be appropriate.
It may turn out that there's a nice way to make ephemeral hash tables
just as cheap as weak hash tables, but I haven't persuaded myself that
teaching the garbage collector about hash tables, rather than simple
objects such as weak pairs and ephemerons, qualifies as `nice'. The
GC is slow enough as is.
- [MIT-Scheme-devel] specifying hash table weakness,
Taylor R Campbell <=