[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-recutils] Indexing plan

From: Michał Masłowski
Subject: Re: [bug-recutils] Indexing plan
Date: Wed, 09 May 2012 21:58:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

>     - delay reading the recfile
>     If indices are found, do it on first operation requiring whole rsets
>     or not being doable with the indices found.
> Sorry, I don't understand.  Do what?

Instead of parsing a file and inserting rsets found, recutl_build_db
would find that the file has a valid index and add an rset object with a
descriptor specified in the index file and information needed to access
its records from the index.

Now I think it would be simpler to store all records in the index file,
so parsing the recfile wouldn't be needed.  My other idea was to store
only the offsets in the recfile of their beginning, it would require
e.g. supporting seeks in parsers and keeping the file open while the
database is used.

>     - design and implement tree-based index of value of a specific field
>       -> record mapping
>     Could use a memory-mapped btree with offline build and search
>     operations.
>     A new record descriptor field would be added to specify which fields
>     (or tuples of fields) to index using a specific index type.
> I would go simple and use %key: for that purpose.  The support for
> multi-field keys is another sub-project that will probably be done
> before 2.0.

I believe queries filtered by non-primary key fields are common in SQL
databases and indices are done for them, so they might be useful here.
Although it shouldn't be more difficult to add it later.

>     - other index structures if they would be useful
> Please keep in mind that we want to use as much as stuff from gnulib as
> possible.  This applies to that kind of data structures as well.  Since
> you will probably find problems/improvements in the modules providing
> data structures, it would be a good idea if you fill a copyright
> assignment for gnulib right now.  Would that be ok for you?  As soon as
> you confirm I will start the process with the gnulib maintainers.

I haven't seen any gnulib modules specific to external data structures
and I think most programs wouldn't use them directly (instead of
e.g. via librec or gdbm), so I haven't considered them being useful to
be included there.

We don't need incremental modifications of these structures, while these
require additional complexity (like tree rebalancing) and would be
probably needed in other packages using them.

Attachment: pgpyS4NzSRW5P.pgp
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]