


From: Jose E. Marchesi
Subject: Re: [bug-recutils] GSoC: Ideas for Recutils
Date: Tue, 27 Mar 2012 20:06:48 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

Hi.

    For complex queries there are many ways to use indices, and tree and
    hash indices have different performance benefits depending on the
    data.  Maybe the index could be built in a way optimized for
    previously executed queries, without any manual specification of what
    to store there.
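To make the tree-versus-hash trade-off concrete, here is a minimal sketch of a hash index in C, assuming the index maps the value of one field (for example the key field) to the byte offset of the record in the recfile. The names (struct hidx, hidx_insert, hidx_lookup) are hypothetical and not part of librec. It answers equality queries in a few probes, but because hashing discards key order it cannot serve a range query; that is where a tree or sorted index wins.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash index: field value -> byte offset of the record.
   Open addressing with linear probing.  CAPACITY must be a power of
   two and larger than the number of keys.  */
struct hidx_slot
{
  const char *key;   /* NULL means the slot is empty.  */
  long offset;       /* Byte offset of the record in the recfile.  */
};

struct hidx
{
  struct hidx_slot *slots;
  size_t capacity;
};

/* 64-bit FNV-1a hash of a NUL-terminated string.  */
static uint64_t
fnv1a (const char *s)
{
  uint64_t h = 14695981039346656037ULL;
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 1099511628211ULL;
    }
  return h;
}

int
hidx_init (struct hidx *idx, size_t capacity)
{
  idx->slots = calloc (capacity, sizeof *idx->slots);
  idx->capacity = capacity;
  return idx->slots != NULL;
}

/* Insert or update KEY -> OFFSET.  KEY is not copied, so it must
   outlive the index.  Returns 0 if the table is full.  */
int
hidx_insert (struct hidx *idx, const char *key, long offset)
{
  size_t mask = idx->capacity - 1;
  size_t i = fnv1a (key) & mask;
  for (size_t n = 0; n < idx->capacity; n++, i = (i + 1) & mask)
    {
      struct hidx_slot *s = &idx->slots[i];
      if (s->key == NULL || strcmp (s->key, key) == 0)
        {
          s->key = key;
          s->offset = offset;
          return 1;
        }
    }
  return 0;
}

/* Return the offset stored for KEY, or -1 if absent.  Only equality
   lookups are possible: the hash destroys key order, so a condition
   like "Year > 2000" cannot use this index.  */
long
hidx_lookup (const struct hidx *idx, const char *key)
{
  size_t mask = idx->capacity - 1;
  size_t i = fnv1a (key) & mask;
  for (size_t n = 0; n < idx->capacity; n++, i = (i + 1) & mask)
    {
      const struct hidx_slot *s = &idx->slots[i];
      if (s->key == NULL)
        return -1;
      if (strcmp (s->key, key) == 0)
        return s->offset;
    }
  return -1;
}

void
hidx_free (struct hidx *idx)
{
  free (idx->slots);
  idx->slots = NULL;
}
```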

    Since any write practically requires rewriting the database (indices are
    optional), maybe index formats that need a complete rebuild on change
    wouldn't be too slow for use with recutils, although they aren't used in
    traditional database systems.

We can assume that changing the recfile in any way will require a
complete rebuild of the corresponding index file.  The operation will be
performed by recfix, and it must be considered an "offline"
operation.  This implies that generating the index file will be a slow
operation, but recutils users will probably use indexes only in files
which are rarely updated.
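Because the index is rebuilt from scratch on every change, it can be a plain sorted array instead of a structure that supports incremental updates. A minimal sketch in C, assuming string keys compared with strcmp (numeric fields would need a numeric comparator) and a hypothetical struct sidx_entry holding a key and a record offset; none of these names exist in librec. The rebuild is a single qsort, and a range query is two binary searches.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical static index entry: a field value and the byte offset
   of the record that contains it.  */
struct sidx_entry
{
  const char *key;
  long offset;
};

static int
sidx_cmp (const void *a, const void *b)
{
  const struct sidx_entry *x = a, *y = b;
  int c = strcmp (x->key, y->key);
  if (c != 0)
    return c;
  /* Tie-break on offset so equal keys keep file order.  */
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Offline build step (e.g. run by recfix after every change): the
   whole array is simply sorted again from scratch.  */
void
sidx_build (struct sidx_entry *e, size_t n)
{
  qsort (e, n, sizeof *e, sidx_cmp);
}

/* Index of the first entry whose key is >= KEY, or N if none.  */
size_t
sidx_lower_bound (const struct sidx_entry *e, size_t n, const char *key)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (strcmp (e[mid].key, key) < 0)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}

/* Number of entries with FROM <= key < TO, found without scanning
   the recfile.  The matches are e[lower_bound(FROM) .. lower_bound(TO)).  */
size_t
sidx_count_range (const struct sidx_entry *e, size_t n,
                  const char *from, const char *to)
{
  size_t a = sidx_lower_bound (e, n, from);
  size_t b = sidx_lower_bound (e, n, to);
  return b > a ? b - a : 0;
}
```

Finding all records in a key range then costs O(log n) plus the number of matches, which is exactly what the hash index cannot offer.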
    
    Writing good performance tests, which might approximate what a real
    useful program does with a big database, is probably necessary for this
    task.  I don't know of existing uses of recutils with databases large
    enough for this task to be significant.

Yes, it would be nice to have realistic performance tests.  There are some
simple performance tests for recsel in torture/utils/p-recsel.sh, but
they cannot be considered "realistic".
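One way to get larger test inputs is to generate a synthetic recfile of arbitrary size. The sketch below is hypothetical: the Item type and the Name, Price and Category fields are made up, and a fixed linear congruential generator is used instead of rand() so that every platform produces the same file for a given seed.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit linear congruential generator (Numerical Recipes constants).
   Deterministic across platforms, unlike rand().  */
static uint32_t
lcg_next (uint32_t *state)
{
  *state = *state * 1664525u + 1013904223u;
  return *state;
}

/* Write a hypothetical benchmark recfile with N records of type Item
   to OUT.  Returns N, or -1 on a write error.  */
long
write_synthetic_recfile (FILE *out, unsigned long n, uint32_t seed)
{
  uint32_t state = seed;

  /* Record descriptor: type name and key field.  */
  if (fprintf (out, "%%rec: Item\n%%key: Id\n\n") < 0)
    return -1;

  for (unsigned long i = 0; i < n; i++)
    {
      uint32_t r = lcg_next (&state);
      if (fprintf (out, "Id: %lu\nName: item%lu\nPrice: %u\nCategory: c%u\n\n",
                   i, i, (unsigned) (r % 10000u),
                   (unsigned) ((r >> 16) % 50u)) < 0)
        return -1;
    }
  return (long) n;
}
```

Writing, say, a million records to big.rec and timing a query such as recsel -t Item -e 'Price < 100' big.rec with and without an index would give a first baseline, even if the data is still synthetic.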
    
    The only problem I have found so far is that the database is
    completely read and parsed before use; this would need to change to
    make indices useful with recsel.  I don't expect this to be more
    difficult than other parts of the task.

That will require changes in the internal design of librec, which must
be carefully studied.
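The primitive an index makes possible is reading one record without parsing the rest of the file: seek to the offset stored in the index and read up to the empty line that ends the record. A minimal sketch, assuming the index stores the byte offset where each record starts; read_record_at is a hypothetical helper, not a librec function, and the bytes it returns would still have to be handed to a record parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the record starting at byte OFFSET of the
   recfile F into BUF (SIZE bytes, including the terminating NUL).
   Reading stops at the first empty line, which separates records in a
   recfile.  Returns the number of bytes copied, or -1 if seeking fails
   or the record does not fit.  */
long
read_record_at (FILE *f, long offset, char *buf, size_t size)
{
  char chunk[1024];
  size_t used = 0;
  int at_line_start = 1;   /* Does CHUNK begin a new line?  */

  if (size == 0 || fseek (f, offset, SEEK_SET) != 0)
    return -1;
  buf[0] = '\0';

  /* fgets may split a long line into several chunks; only a chunk that
     starts a line can be the empty line that ends the record.  */
  while (fgets (chunk, sizeof chunk, f) != NULL)
    {
      size_t len = strlen (chunk);
      if (at_line_start && chunk[0] == '\n')
        break;
      if (used + len + 1 > size)
        return -1;
      memcpy (buf + used, chunk, len + 1);
      used += len;
      at_line_start = len > 0 && chunk[len - 1] == '\n';
    }
  return (long) used;
}
```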

This will basically require a change in the rec_rset_t ADT in order to 
    
    The ideas page mentions determining whether the index is up to date.  I
    don't see any practical solution other than using filesystem metadata
    of the database file (checksumming the file contents should be much
    slower than doing a simple query using a tree index).
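A minimal sketch of the metadata check in C, assuming the index file header stores the recfile's size and modification time as they were when the index was built (struct index_stamp and both functions are hypothetical). It costs one stat() call regardless of the size of the recfile.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical metadata saved in the index header at build time.  */
struct index_stamp
{
  off_t size;     /* Size of the recfile in bytes.  */
  time_t mtime;   /* Last modification time of the recfile.  */
};

/* Record the current state of RECFILE in STAMP.
   Returns 0 on success, -1 if stat fails.  */
int
index_stamp_take (const char *recfile, struct index_stamp *stamp)
{
  struct stat st;
  if (stat (recfile, &st) != 0)
    return -1;
  stamp->size = st.st_size;
  stamp->mtime = st.st_mtime;
  return 0;
}

/* Return 1 if RECFILE still matches STAMP, 0 if it changed or cannot
   be examined.  Caveat: st_mtime has one-second resolution, so an edit
   that keeps the size and happens within the same second goes
   unnoticed; that is the price of not reading the contents.  */
int
index_is_fresh (const char *recfile, const struct index_stamp *stamp)
{
  struct index_stamp now;
  if (index_stamp_take (recfile, &now) != 0)
    return 0;
  return now.size == stamp->size && now.mtime == stamp->mtime;
}
```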

We could have a "checksum" comment at the end of the rec file, which
would be generated by recfix when creating the index file.  The problem
with this approach is that the creation of the index won't be completely
decoupled from the recfile itself, but that may not really be a problem.
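A sketch of that scheme, assuming a hypothetical trailing comment of the form "# checksum: <8 hex digits>" and using FNV-1a only because it is short (CRC-32 or a cryptographic hash would work the same way). The checksum covers everything before the comment line, so recfix can append the comment without invalidating it. Verifying it still means reading the whole file, which is the cost the quoted paragraph above points out.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical comment format.  Lines starting with '#' are comments
   in a recfile, so other tools ignore it.  */
#define CHECKSUM_PREFIX "# checksum: "

/* 32-bit FNV-1a over N bytes.  */
static uint32_t
fnv1a32 (const char *p, size_t n)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < n; i++)
    {
      h ^= (unsigned char) p[i];
      h *= 16777619u;
    }
  return h;
}

/* Number of leading bytes of BUF covered by the checksum: everything
   before a final checksum comment line, or all LEN bytes if the last
   line is not such a comment.  */
static size_t
covered_length (const char *buf, size_t len)
{
  size_t plen = strlen (CHECKSUM_PREFIX);
  size_t end = (len > 0 && buf[len - 1] == '\n') ? len - 1 : len;
  size_t start = end;

  while (start > 0 && buf[start - 1] != '\n')
    start--;
  if (end - start >= plen && memcmp (buf + start, CHECKSUM_PREFIX, plen) == 0)
    return start;
  return len;
}

/* Format the comment line recfix would append to the recfile held in
   BUF.  Returns 1 on success, 0 if OUT is too small.  */
int
checksum_line (const char *buf, size_t len, char *out, size_t outsz)
{
  uint32_t h = fnv1a32 (buf, covered_length (buf, len));
  int r = snprintf (out, outsz, CHECKSUM_PREFIX "%08lx\n", (unsigned long) h);
  return r >= 0 && (size_t) r < outsz;
}

/* Return 1 if BUF ends with a checksum comment matching the data before
   it, 0 if the comment is missing, malformed or stale.  */
int
checksum_ok (const char *buf, size_t len)
{
  size_t plen = strlen (CHECKSUM_PREFIX);
  size_t cov = covered_length (buf, len);
  const char *p;
  size_t rest, k = 0;
  char hex[17], *endp;
  unsigned long stored;

  if (cov == len)
    return 0;                   /* No checksum comment at all.  */
  p = buf + cov + plen;
  rest = len - cov - plen;
  /* Copy the hex digits into a NUL-terminated buffer for strtoul.  */
  while (k < rest && k < sizeof hex - 1 && p[k] != '\n')
    {
      hex[k] = p[k];
      k++;
    }
  hex[k] = '\0';
  stored = strtoul (hex, &endp, 16);
  if (k == 0 || *endp != '\0')
    return 0;
  return stored == (unsigned long) fnv1a32 (buf, cov);
}
```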
    
    (I'm writing this as a student interested in implementing this; I don't
    have practical experience implementing databases, but I know C and I can
    implement structures useful for indices.)

More than enough.  The analysis you just did proves that you could do
the task if you wanted to :)

-- 
Jose E. Marchesi         http://www.jemarch.net
GNU Project              http://www.gnu.org


