[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug-recutils] Indexing plan

From: Michał Masłowski
Subject: [bug-recutils] Indexing plan
Date: Sun, 06 May 2012 22:21:28 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)


The design and implementation of indexing support could be split into
the following parts (each of them would also include tests and
documentation if it's user-visible):

- design index file structure

A single file would be used, with record descriptors in binary form
and several indices of various types.

- find the index file, using recfile metadata to check if it's up to

- delay reading the recfile

If indices are found, do it on first operation requiring whole rsets
or not being doable with the indices found.

- API for building indices and a recfix command to use it

- API for parsing indices

Maybe support a textual serialization of an index file for debugging
and tests.

The above parts could be done without indexing of records.  There will
be other uses for indexing rset descriptors and maybe their offsets in
the file, so this can be used to test the above parts before choosing
record index data structures.

- design and implement tree-based index of value of a specific field
  -> record mapping

Could use a memory-mapped btree with offline build and search

A new record descriptor field would be added to specify which fields
(or tuples of fields) to index using a specific index type.

The same structure with different data and ordering could be used for
string, number or date keys.

- find what index to use for a query

An easy solution would be to find a comparison operation on a field
that has an index.  I think much more could be done, but this would be
enough to work for simple queries and to test other parts.

E.g. for selection expression '(field1 = X) && (field2 = Y) && ...'
an index could be used to find all records with a field1 equal to X
and evaluate the expression only on them.

- more performance tests

Should help make indexing improve specific operations.  Won't find
which operations to improve, unlike real applications of big recfiles.

- other index structures if they would be useful


  - hash table index for string equality queries?

    Might be faster in some cases than btrees, unsure if it would be

  - a suffix tree for fast_string searches

  - btree indices for index or random queries

I think most of these changes could be designed and implemented in the
above order, I'll write a separate mail to discuss the index file
structure next week.

Attachment: pgpAQhEewKNpa.pgp
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]