guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scm_string_hash suboptimal?


From: Florian Weimer
Subject: Re: scm_string_hash suboptimal?
Date: 29 May 2001 23:54:11 +0200
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/20.7

Rob Browning <address@hidden> writes:

> Actually the current hash in symbols.c actually downcases all the
> characters, so it's even less discriminating. 

Downcase?  Okay, forget the portion of my previous message which talks
about the capital letter case. :-/

> (Shouldn't we have a separate string hasher that's case sensitive?)

The actual hash function is case sensitive, I think.

> After you posted I was kind of curious, knowing that a faster hash
> algorithm (and one that minimizes collisions) might help the
> performance of guile and many guile apps substantially, and so I
> started poking around on the web for hash table related information.

I've written a small test program which implements a few hash
functions (those used by Guile, Python, RScheme, Emacs, GCC, gettext,
libc).

The GCC hash function is pretty good (the multiplication can usually
be implemented as a bit shift, in contrast to the Python hash
function):

static hash_t
gcc_hash (octet_t *str, length_t len)
{
  unsigned int n = len;
  unsigned int r = 0;
#define HASHSTEP(r, c) ((r) * 67 + ((c) - 113));

  while (n--)
    r = HASHSTEP (r, *str++);

  return r + len;
#undef HASHSTEP
}

It's simple, and it doesn't pollute neither the data nor the
instruction cache, and it doesn't affect branch prediction, I suppose.
Anyway, I've got the feeling that this hash function is suitable
mostly for ASCII identifiers and truely random binary data, and not
something in between.

The RScheme hash function is fast, but there is the data table issue.

The other hash functions (Emacs, gettext) are not very interesting for
us, I think.

> I also realized that most of the hash algorithms only work on one byte
> of data at a time, and on modern CPU's, unless the compiler's *really*
> smart, that might not give you the best performance (though it'd be
> hard to say without testing).

In general, the strings aren't aligned on a word boundary, such
optimization are therefore quite difficult and processor-specific.

> Given that, it seemed like working on 32 bit chunks until you get
> down to the last few bytes might be preferable

This calls for an assembler implementation. :-/

> So is anyone here well versed in the "state of the art" hash-table
> wise?  If there is a "right thing", I'd be happy to read up on it and
> perhaps implement it.

Is there some large, non-interactive, easy-to-install Scheme
application out there?  I would like to do some real-world test with
the interesting functions (Guile, RScheme, GCC, and the one mentioned
below).

> It would also be nice if I could add dynamically resizing tables, if
> anyone knows of a good documented algorithm.

I could ask someone for a reference, this was discussed on another
list a while ago (but was off-topic there).

> Just poking around randomly with google, I came across a somewhat
> interesting algorithm, but it's a lot more thorough than what we're
> doing, and I have no idea if it's actually any good -- hasn't been
> debunked, or whatever:
> 
>   http://burtleburtle.net/bob/hash/evahash.html

Interesting.  The approach of supporting superscalar processors 

According to a quick test, the hash function is quite fast.  However,
it does produce significantly more collisions than the GCC hash
function cited above (and the )



reply via email to

[Prev in Thread] Current Thread [Next in Thread]