bug-recutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-recutils] Indexing plan


From: Jose E. Marchesi
Subject: Re: [bug-recutils] Indexing plan
Date: Thu, 10 May 2012 00:48:43 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

    >     - delay reading the recfile
    >     
    >     If indices are found, do it on first operation requiring whole rsets
    >     or not being doable with the indices found.
    >
    > Sorry, I don't understand.  Do what?
    
    Instead of parsing a file and inserting rsets found, recutl_build_db
    would find that the file has a valid index and add an rset object with a
    descriptor specified in the index file and information needed to access
    its records from the index.

Interesting.  But the "lazy" reading of records can also be done even if
there is not an index file: it is much more efficient to search for
^%rec: than to parse the whole file.  So the lazy reading of records
must not depend on having an index for the recfile.
    
    Now I think it would be simpler to store all records in the index file,
    so parsing the recfile wouldn't be needed.  My other idea was to store
    only the offsets in the recfile of their beginning, it would require
    e.g. supporting seeks in parsers and keeping the file open while the
    database is used.

I much prefer the second option.  To keep the file open must not be an
issue.
    
    >     - design and implement tree-based index of value of a specific field
    >       -> record mapping
    >     
    >     Could use a memory-mapped btree with offline build and search
    >     operations.
    >     
    >     A new record descriptor field would be added to specify which fields
    >     (or tuples of fields) to index using a specific index type.
    >
    > I would go simple and use %key: for that purpose.  The support for
    > multi-field keys is another sub-project that will probably be done
    > before 2.0.
    
    I believe queries filtered by non-primary key fields are common in SQL
    databases and indices are done for them, so they might be useful here.
    Although it shouldn't be more difficult to add it later.

That makes sense, but I don't see why do we need to mark the fields for
which indexes exist in the record descriptors.  The idea is that the
user can use recfix to create indexes for fields which are often used,
like:

$ recfix add-index -f AField

In the example, AField is not required to be the key of the record.  We
can also create a "default" field for any %key defined.  Is there a
reason why we would need to mark AField in the record descriptor?
recsel must be able to determine if there is an index I on the rset S
for the field F in a given index file.

    >     - other index structures if they would be useful
    >
    > Please keep in mind that we want to use as much as stuff from gnulib as
    > possible.  This applies to that kind of data structures as well.  Since
    > you will probably find problems/improvements in the modules providing
    > data structures, it would be a good idea if you fill a copyright
    > assignment for gnulib right now.  Would that be ok for you?  As soon as
    > you confirm I will start the process with the gnulib maintainers.
    
    I haven't seen any gnulib modules specific to external data structures
    and I think most programs wouldn't use them directly (instead of
    e.g. via librec or gdbm), so I haven't considered them being useful to
    be included there.

Gnulib provides b-trees and hashes backends for the list ADT.  In any
case, we must never make gnulib staff directly visible to the user
through librec: we must provide our own abstractions instead.
    
-- 
Jose E. Marchesi         http://www.jemarch.net
GNU Project              http://www.gnu.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]