bug-gperf
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gperf] gperf


From: Bruno Haible
Subject: Re: [bug-gperf] gperf
Date: Sat, 14 May 2011 12:42:50 +0200
User-agent: KMail/1.9.9

[CCing bug-gperf]

Hi,

Mike Emmel wrote:
> Hello Bruno I found your name as one of the authors of gperf.

It's nevertheless better to write to the mailing list.

> I'm interested in creating a perfect hash library for a dynamic language.
> Keys are of course known but only at runtime.
> 
> I found http://cmph.sourceforge.net/  however my problem domain
> matches gperf small sets of keys generally less than 20.
> 
> My language can grok C structures at runtime.
> 
> I'm not sure how much of gperf I can use for this

Let me be clear about it: gperf has a scalability problem. Its expected
running time is not polynomial. If you feed it 20 keys, you will get an
answer quickly. If you feed it 1000 keys, you may get an answer within
a day. If you feed it 100000 keys, you'll have to wait for an answer for
millions of years.

This makes the gperf algorithm unusable for use at runtime.

For use at runtime, you need an algorithm that is not only polynomial,
but linear in the input. There are a couple of them that I know of:

  - You can use plain hash tables, without a perfect hash function,
    that is, with collision handling.

  - You can use a hash function where the perfectness is not proven
    but where collisions are extremely unlikely. The SHA-1 or SHA-256
    hash functions are usable for this (but they map to 160-bit or
    256-bit integers, not to 32-bit integers).

  - You can use a hash function that can be constructed in linear time. [1]
    [2][3]

Some of these are implemented in the 'cmph' library that you mention.

But the difference between cmph and gperf is that the hash function used
by cmph uses at least 36 machine instructions. This is certainly much less
than you need for an SHA-1 sum. But gperf's hash function is much faster:
often only 2 machine instructions per relevant byte from the input.

So, gperf is optimized for a super-fast hash function, but does not
scale. Whereas cmph scales but has a slower hash function.

> I was hoping to get permission to create a LGPL library inspired by gperf

It makes no sense, technically. I don't even see the point in working in that
direction.

Bruno

[1] Bernhard von Stengel: Construction of a perfect hash table with linear
    space in linear expected time. http://www.maths.lse.ac.uk/
[2] Zbigniev Czech, George Havas, Bohdan S. Majewski: An optimal algorithm
    for generating minimal perfect hash functions.
[3] Fabiano C. Botelho, Rasmus Pagh, Nivio Ziviani: Perfect Hashing for Data
    Management.
-- 
In memoriam Maurice Bavaud <http://en.wikipedia.org/wiki/Maurice_Bavaud>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]