bug-gperf
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gperf] gperf


From: Mike Emmel
Subject: Re: [bug-gperf] gperf
Date: Sat, 14 May 2011 12:47:55 -0700

Thanks not sure if top posting is acceptable or not:

I'll start with CMPH and do some research.
I'm call by name so the fastest hash function is also preferable thus
my interest in gperf.
And of course it has to be generated fast thus my interest in CMPH.

But I'm a newbie and your post was invaluable I can get started now.
I suspect that if I can find a perfect hash rapidly that I can also
convert it rapidly to
one that is fast i.e  there is a function that converts one known set
of perfect hash constants
to another. And also works with minimal hashs. So all you have to do
is find one.
In fact the set of minimal hash functions that are not trivial is I'm
guessing small.


On Sat, May 14, 2011 at 3:42 AM, Bruno Haible <address@hidden> wrote:
> [CCing bug-gperf]
>
> Hi,
>
> Mike Emmel wrote:
>> Hello Bruno I found your name as one of the authors of gperf.
>
> It's nevertheless better to write to the mailing list.
>
>> I'm interested in creating a perfect hash library for a dynamic language.
>> Keys are of course known but only at runtime.
>>
>> I found http://cmph.sourceforge.net/  however my problem domain
>> matches gperf small sets of keys generally less than 20.
>>
>> My language can grok C structures at runtime.
>>
>> I'm not sure how much of gperf I can use for this
>
> Let me be clear about it: gperf has a scalability problem. Its expected
> running time is not polynomial. If you feed it 20 keys, you will get an
> answer quickly. If you feed it 1000 keys, you may get an answer within
> a day. If you feed it 100000 keys, you'll have to wait for an answer for
> millions of years.
>
> This makes the gperf algorithm unusable for use at runtime.
>
> For use at runtime, you need an algorithm that is not only polynomial,
> but linear in the input. There are a couple of them that I know of:
>
>  - You can use plain hash tables, without a perfect hash function,
>    that is, with collision handling.
>
>  - You can use a hash function where the perfectness is not proven
>    but where collisions are extremely unlikely. The SHA-1 or SHA-256
>    hash functions are usable for this (but they map to 160-bit or
>    256-bit integers, not to 32-bit integers).
>
>  - You can use a hash function that can be constructed in linear time. [1]
>    [2][3]
>
> Some of these are implemented in the 'cmph' library that you mention.
>
> But the difference between cmph and gperf is that the hash function used
> by cmph uses at least 36 machine instructions. This is certainly much less
> than you need for an SHA-1 sum. But gperf's hash function is much faster:
> often only 2 machine instructions per relevant byte from the input.
>
> So, gperf is optimized for a super-fast hash function, but does not
> scale. Whereas cmph scales but has a slower hash function.
>
>> I was hoping to get permission to create a LGPL library inspired by gperf
>
> It makes no sense, technically. I don't even see the point in working in that
> direction.
>
> Bruno
>
> [1] Bernhard von Stengel: Construction of a perfect hash table with linear
>    space in linear expected time. http://www.maths.lse.ac.uk/
> [2] Zbigniev Czech, George Havas, Bohdan S. Majewski: An optimal algorithm
>    for generating minimal perfect hash functions.
> [3] Fabiano C. Botelho, Rasmus Pagh, Nivio Ziviani: Perfect Hashing for Data
>    Management.
> --
> In memoriam Maurice Bavaud <http://en.wikipedia.org/wiki/Maurice_Bavaud>
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]