[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-recutils] Indexing plan
Jose E. Marchesi
Re: [bug-recutils] Indexing plan
Thu, 10 May 2012 00:48:43 +0200
Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)
> - delay reading the recfile
> If indices are found, do it on first operation requiring whole rsets
> or not being doable with the indices found.
> Sorry, I don't understand. Do what?
Instead of parsing a file and inserting rsets found, recutl_build_db
would find that the file has a valid index and add an rset object with a
descriptor specified in the index file and information needed to access
its records from the index.
Interesting. But the "lazy" reading of records can also be done even if
there is not an index file: it is much more efficient to search for
^%rec: than to parse the whole file. So the lazy reading of records
must not depend on having an index for the recfile.
Now I think it would be simpler to store all records in the index file,
so parsing the recfile wouldn't be needed. My other idea was to store
only the offsets in the recfile of their beginning, it would require
e.g. supporting seeks in parsers and keeping the file open while the
database is used.
I much prefer the second option. To keep the file open must not be an
> - design and implement tree-based index of value of a specific field
> -> record mapping
> Could use a memory-mapped btree with offline build and search
> A new record descriptor field would be added to specify which fields
> (or tuples of fields) to index using a specific index type.
> I would go simple and use %key: for that purpose. The support for
> multi-field keys is another sub-project that will probably be done
> before 2.0.
I believe queries filtered by non-primary key fields are common in SQL
databases and indices are done for them, so they might be useful here.
Although it shouldn't be more difficult to add it later.
That makes sense, but I don't see why do we need to mark the fields for
which indexes exist in the record descriptors. The idea is that the
user can use recfix to create indexes for fields which are often used,
$ recfix add-index -f AField
In the example, AField is not required to be the key of the record. We
can also create a "default" field for any %key defined. Is there a
reason why we would need to mark AField in the record descriptor?
recsel must be able to determine if there is an index I on the rset S
for the field F in a given index file.
> - other index structures if they would be useful
> Please keep in mind that we want to use as much as stuff from gnulib as
> possible. This applies to that kind of data structures as well. Since
> you will probably find problems/improvements in the modules providing
> data structures, it would be a good idea if you fill a copyright
> assignment for gnulib right now. Would that be ok for you? As soon as
> you confirm I will start the process with the gnulib maintainers.
I haven't seen any gnulib modules specific to external data structures
and I think most programs wouldn't use them directly (instead of
e.g. via librec or gdbm), so I haven't considered them being useful to
be included there.
Gnulib provides b-trees and hashes backends for the list ADT. In any
case, we must never make gnulib staff directly visible to the user
through librec: we must provide our own abstractions instead.
Jose E. Marchesi http://www.jemarch.net
GNU Project http://www.gnu.org