Re: [bug-recutils] GSoC: Ideas for Recutils

From: Michał Masłowski
Subject: Re: [bug-recutils] GSoC: Ideas for Recutils
Date: Tue, 27 Mar 2012 23:01:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

> We can assume that changing the recfile in any way will require a
> complete rebuild of the corresponding index file.  The operation will be
> performed by recfix, and it must be considered as an "offline"
> operation.  This implies that generating index file will be a slow
> operation, but recutils users will probably use indexes only in files
> which are rarely updated.

We could avoid rebuilding the index in some cases, since a program
changing the database could update just the few index entries for
the modified records.  This would add complexity and probably wouldn't
bring big benefits (it only saves processing the whole recfile again),
so simply ignoring the outdated index until recfix is run is probably
the better solution.
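
A rough sketch of that fallback, in Python rather than C to keep it
short.  All names here (build_index, select, index_is_fresh) are made
up for illustration and are not part of librec; records are simplified
to plain dictionaries:

```python
from typing import Dict, List, Optional

Record = Dict[str, str]
Index = Dict[str, List[int]]  # field value -> positions of matching records


def build_index(records: List[Record], field: str) -> Index:
    """Full rebuild, the 'offline' step that recfix would perform."""
    index: Index = {}
    for pos, rec in enumerate(records):
        if field in rec:
            index.setdefault(rec[field], []).append(pos)
    return index


def select(records: List[Record], field: str, value: str,
           index: Optional[Index] = None,
           index_is_fresh: bool = False) -> List[Record]:
    """Return the records whose `field` equals `value`."""
    if index is not None and index_is_fresh:
        # Fast path: trust the index only when it is known to be current.
        return [records[pos] for pos in index.get(value, [])]
    # Outdated or missing index: ignore it and scan everything.
    # Slower, but the result is always correct.
    return [rec for rec in records if rec.get(field) == value]
```

The point is that a stale index only costs time, never correctness,
as long as the freshness check itself can be trusted.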

>     The only problem which I already found is that the database is
>     completely read and parsed for use, changing this would be needed to
>     make indices useful with recsel.  I don't expect this to be more
>     difficult than other parts of the task.
> That will require changes in the internal design of librec, which must
> be carefully studied.
> This will basically require a change in the rec_rset_t ADT in order to 

It needs to be done carefully, although it probably can be tested before
implementing support for indices.
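
To illustrate what reading without parsing the whole database could
look like, here is a minimal Python sketch (not librec code).  It
assumes a heavily simplified rec format: "Name: value" lines, records
separated by blank lines, "#" comments, no continuation lines and no
repeated fields.  iter_records and read_record_at are hypothetical
names:

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

Record = Dict[str, str]


def iter_records(f: BinaryIO) -> Iterator[Tuple[int, Record]]:
    """Yield (byte offset, record) pairs one at a time."""
    offset = f.tell()          # offset of the line about to be read
    start = None               # offset where the current record began
    record: Record = {}
    while True:
        line = f.readline()
        if not line:           # end of file
            break
        text = line.decode("utf-8").rstrip("\n")
        if text.strip() == "":
            # A blank line ends the current record, if there is one.
            if record:
                yield start, record
                record, start = {}, None
        elif not text.startswith("#"):
            if start is None:
                start = offset
            name, _, value = text.partition(":")
            record[name] = value.strip()
        offset += len(line)
    if record:                 # last record without a trailing blank line
        yield start, record


def read_record_at(f: BinaryIO, offset: int) -> Record:
    """Parse only the record starting at `offset` (e.g. taken from an index)."""
    f.seek(offset)
    for _, record in iter_records(f):
        return record
    raise ValueError("no record at offset %d" % offset)


if __name__ == "__main__":
    demo = io.BytesIO(b"Name: Ada\nAge: 36\n\nName: Bob\nAge: 41\n")
    offsets = [off for off, _ in iter_records(demo)]
    print(read_record_at(demo, offsets[-1]))
```

The iterator can be tested on its own against the current full-parse
behaviour, before any index exists.  Note that the caller must not
seek on the file while an iteration is still in progress.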

>     The ideas page mentions determining if the index is up to date; I don't
>     see other practical solutions than using filesystem metadata of the
>     database file (checksumming the file contents should be much slower than
>     doing a simple query using a tree index).
> We could have a "checksum" comment at the end of the rec file, which
> would be generated by recfix when creating the index file.  The problem
> with this approach is that the creation of the index won't be completely
> decoupled from the recfile itself, but that may not be really a problem.

Another problem is that users editing recfiles with a text editor might
forget to update the comment, leading to incorrect query results (whereas
other solutions would give correct results, only slowly).  Python and
Mercurial use modification timestamps to avoid reading files (modules
to compile into cached bytecode, or versioned files to diff); I haven't
observed any reliability problems with this approach.
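
A minimal sketch of such a check in Python, assuming the index file
would store the recfile's modification time and size at build time.
The Stamp type and both function names are hypothetical; only os.stat
and its st_mtime_ns and st_size fields are real:

```python
import os
from typing import NamedTuple


class Stamp(NamedTuple):
    """Filesystem metadata recorded when the index is built."""
    mtime_ns: int
    size: int


def stamp_of(path: str) -> Stamp:
    st = os.stat(path)
    return Stamp(st.st_mtime_ns, st.st_size)


def index_is_fresh(recfile_path: str, stored: Stamp) -> bool:
    # One stat() call, no read of the file contents.  Any difference in
    # modification time or size marks the index as outdated.
    return stamp_of(recfile_path) == stored
```

Comparing the size as well as the timestamp is cheap and catches some
edits made within the timestamp resolution of the filesystem.  An edit
that keeps both values unchanged would still go unnoticed.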
