[Ifile-dev] ifile + NDBM


From: Dave Marquardt
Subject: [Ifile-dev] ifile + NDBM
Date: 21 Feb 2003 16:59:24 -0600
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)

In a continuing effort to improve ifile's performance, I've tried a
couple of things beyond the recent changes that I committed to CVS,
which became ifile 1.3.0.

First, I tried using mmap() instead of read() to read the database.
This actually hurt on Solaris, my main development platform, so I
abandoned that effort.

The second thought I had was whether we really need to read the whole
database, or whether we could read only the words that appear in the
messages we're analyzing.  Using something like DBM, NDBM, gdbm or Berkeley DB,
I could just fetch the particular words as needed.  I've implemented
this with NDBM, but haven't really played much with it beyond
verifying that it *appears* to work.  It compiles, it runs without
crashing, it creates and later updates an NDBM database, and the
database appears to have the right stuff in it.

When working on this DBM idea, the issue of aging the words comes
up.  Since we currently read the whole database, it's easy to cull the
infrequently used words.  With a DBM scheme, we only access the words
that are used, so infrequently used words will never be culled, unless
we do something different.

I have a couple of ideas for this.  First, rather than add 1 to each
word's age each time, I could keep a single serial number in the
database that is incremented for each message read.  For each word, we
record the value of the serial number when the word is added to the
database the first time.  When we age words, we get the age of the
word by subtracting the word's creation serial number from the current
serial number.  This should solve the aging problem for words that are
used occasionally, since their age is correct whenever we next touch them.
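
A minimal sketch of the serial-number idea, for illustration only.  The
struct layout and the names word_entry and word_age are hypothetical and
are not taken from the ifile source; the point is only that age becomes a
subtraction done at lookup time instead of a counter bumped on every run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-word record; not ifile's actual on-disk layout. */
struct word_entry {
    uint32_t created_serial; /* database serial when the word was first added */
    uint32_t count;          /* how often the word has been seen */
};

/* Age of a word = number of messages read since it was first added. */
static uint32_t word_age(const struct word_entry *w, uint32_t current_serial)
{
    assert(w != NULL);
    return current_serial - w->created_serial;
}
```

With this scheme only the single database-wide serial number has to be
rewritten for each message; untouched words need no update at all.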

Words that are used only once, though, are never fetched again, so they
just stay stuck in the database.  One idea I had was to add another
function to ifile that walks the whole database and culls old words.
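
Roughly what such a cull pass might look like, as a sketch.  An in-memory
array stands in for the DBM file here, and the names cull_entry and
cull_old_words, as well as the max_age and min_count thresholds, are
assumptions for illustration; the actual culling rule would need to match
whatever ifile does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; an array of these stands in for the DBM database. */
struct cull_entry {
    const char *word;
    uint32_t created_serial; /* serial number when the word was first added */
    uint32_t count;          /* how often the word has been seen */
};

/*
 * Drop every word that is older than max_age messages and has been seen
 * fewer than min_count times.  Surviving entries are compacted to the
 * front of the array, in their original order; returns how many remain.
 */
static size_t cull_old_words(struct cull_entry *tab, size_t n,
                             uint32_t current_serial,
                             uint32_t max_age, uint32_t min_count)
{
    size_t kept = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t age = current_serial - tab[i].created_serial;
        int stale = age > max_age && tab[i].count < min_count;

        if (!stale)
            tab[kept++] = tab[i];
    }
    return kept;
}
```

Since this pass has to visit every key, it would presumably be run only
occasionally, not on every message.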

Anybody have any preferences of NDBM vs. gdbm vs. Berkeley DB?  I
started this effort with NDBM only because it's in Solaris' libc.  If
someone has experience with some of these or some reason to use one of
the others over NDBM, I'm all ears.
-- 
Dave Marquardt
Round Rock, TX



