[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug-recutils] Indexing plan
From: |
Michał Masłowski |
Subject: |
[bug-recutils] Indexing plan |
Date: |
Sun, 06 May 2012 22:21:28 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux) |
Hello,
The design and implementation of indexing support could be split into
the following parts (each of them would also include tests and
documentation if it's user-visible):
- design index file structure
A single file would be used, with record descriptors in binary form
and several indices of various types.
- find the index file, using recfile metadata to check if it's up to
date
- delay reading the recfile
If indices are found, do it on first operation requiring whole rsets
or not being doable with the indices found.
- API for building indices and a recfix command to use it
- API for parsing indices
Maybe support a textual serialization of an index file for debugging
and tests.
The above parts could be done without indexing of records. There will
be other uses for indexing rset descriptors and maybe their offsets in
the file, so this can be used to test the above parts before choosing
record index data structures.
- design and implement tree-based index of value of a specific field
-> record mapping
Could use a memory-mapped btree with offline build and search
operations.
A new record descriptor field would be added to specify which fields
(or tuples of fields) to index using a specific index type.
The same structure with different data and ordering could be used for
string, number or date keys.
- find what index to use for a query
An easy solution would be to find a comparison operation on a field
that has an index. I think much more could be done, but this would be
enough to work for simple queries and to test other parts.
E.g. for selection expression '(field1 = X) && (field2 = Y) && ...'
an index could be used to find all records with a field1 equal to X
and evaluate the expression only on them.
- more performance tests
Should help make indexing improve specific operations. Won't find
which operations to improve, unlike real applications of big recfiles.
- other index structures if they would be useful
Examples:
- hash table index for string equality queries?
Might be faster in some cases than btrees, unsure if it would be
needed.
- a suffix tree for fast_string searches
- btree indices for index or random queries
I think most of these changes could be designed and implemented in the
above order, I'll write a separate mail to discuss the index file
structure next week.
pgpAQhEewKNpa.pgp
Description: PGP signature
- [bug-recutils] Indexing plan,
Michał Masłowski <=