[bug-recutils] Index file structure

From: Michał Masłowski
Subject: [bug-recutils] Index file structure
Date: Tue, 15 May 2012 14:41:56 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)


I was thinking about what the binary index file structure would need to
specify, leaving aside the indices of records themselves.  It could be
used for lazy loading of rsets.  I don't think implementing it would be
useful yet, before we support seekable parsing (probably of mmapped
files) and the other changes needed to lazily load rsets.

In the following description, all numbers are little-endian, probably
64-bit, and aligned to their size so they can be accessed quickly in
memory-mapped files.

All offsets into the recfile would point to the empty line before the
start of a record (unless the record is at the start of the file), so we
can be more confident that the index is up to date.
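To make the offset convention concrete, here is a minimal Python sketch
(Python is used only for illustration; `record_offsets` is a
hypothetical helper, not part of recutils).  It returns, for a recfile's
bytes, the offset of the last "\n\n" before each record, plus offset 0
when the file begins with a record.  It assumes records are separated by
one or more blank lines and does not special-case comment lines, so the
rset descriptor record is counted like any other record.

```python
def record_offsets(data: bytes) -> list[int]:
    """Offsets of the blank-line separators that precede each record.

    Hypothetical helper: a record is assumed to start at the first
    non-newline byte after a blank line; comment lines are not special-cased.
    """
    offsets = []
    # A record at the very start of the file has no blank line before it.
    if data and data[:1] != b"\n":
        offsets.append(0)
    i = data.find(b"\n\n")
    while i != -1:
        j = i + 2
        # Skip extra blank lines so each record is reported only once.
        while j < len(data) and data[j:j + 1] == b"\n":
            j += 1
        if j < len(data):
            # Point at the last "\n\n" pair before the record.
            offsets.append(j - 2)
        i = data.find(b"\n\n", j)
    return offsets


print(record_offsets(b"%rec: Book\n\nTitle: A\n\nTitle: B\n"))  # [0, 10, 20]
```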

The file would start with these fields:

- a magic number

- recfile modification time in seconds since the Unix epoch (the start
  of 1970, UTC)

- recfile size

- recfile name length

- number of rsets

- number of indices
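A minimal sketch of this header using Python's `struct` module, assuming
every field is an unsigned 64-bit little-endian integer (the "probably
64 bit" option above) and an arbitrary placeholder magic value.  Neither
the magic bytes nor the function names come from recutils.

```python
import struct

# Placeholder 8-byte magic; the proposal does not fix a value.
MAGIC = int.from_bytes(b"RECINDX\x00", "little")

# Six unsigned 64-bit little-endian fields, in the order listed above.
# "<" means little-endian with no implicit padding; since every field is
# 8 bytes, each stays 8-byte aligned when the header is at file offset 0.
HEADER = struct.Struct("<6Q")


def pack_header(mtime: int, size: int, name_len: int,
                n_rsets: int, n_indices: int) -> bytes:
    return HEADER.pack(MAGIC, mtime, size, name_len, n_rsets, n_indices)


def unpack_header(buf) -> dict:
    # unpack_from also accepts an mmap object, so the header can be read
    # in place from a memory-mapped index file.
    magic, mtime, size, name_len, n_rsets, n_indices = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not an index file")
    return {"mtime": mtime, "size": size, "name_len": name_len,
            "n_rsets": n_rsets, "n_indices": n_indices}
```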

Then for each rset:

- rset type name length

- offset in the recfile of the "\n\n%rec" sequence that starts it

The recfile name and the rset type names would follow.
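Under the same 64-bit little-endian assumption, the per-rset table
followed by the recfile name and the rset type names could be laid out
as below.  The sketch assumes the names are stored back to back with no
NUL terminators, since their lengths are already recorded;
`pack_rsets` and `unpack_rsets` are hypothetical names.

```python
import struct

# One table entry per rset: (type name length, offset of "\n\n%rec").
RSET_ENTRY = struct.Struct("<2Q")


def pack_rsets(recfile_name: bytes, rsets: list[tuple[bytes, int]]) -> bytes:
    """Build the rset table, then the recfile name, then the rset names."""
    table = b"".join(RSET_ENTRY.pack(len(name), off) for name, off in rsets)
    names = recfile_name + b"".join(name for name, _ in rsets)
    return table + names


def unpack_rsets(buf, start: int, name_len: int, n_rsets: int):
    """Return (recfile name, [(rset name, offset)], position after the names)."""
    entries = [RSET_ENTRY.unpack_from(buf, start + i * RSET_ENTRY.size)
               for i in range(n_rsets)]
    pos = start + n_rsets * RSET_ENTRY.size
    recfile_name = bytes(buf[pos:pos + name_len])
    pos += name_len
    rsets = []
    for length, off in entries:
        rsets.append((bytes(buf[pos:pos + length]), off))
        pos += length
    return recfile_name, rsets, pos
```

The returned end position is generally not 8-byte aligned, which is why
padding may be needed before whatever comes next.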

Then (possibly after padding) the binary descriptor of each index would
follow.  Each descriptor would start with its type and length, so that
descriptors of unknown types can be skipped.
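One possible encoding of the descriptors, sketched in Python under these
assumptions: a 64-bit type code and a 64-bit payload length (not
counting the two header fields), with each descriptor padded to an
8-byte boundary so its fields stay aligned in a memory-mapped file.  A
reader skips types it doesn't know by their length.  The type codes and
function names are illustrative only.

```python
import struct

# Descriptor header: (type code, payload length in bytes).
DESC_HEAD = struct.Struct("<2Q")


def align8(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def pack_descriptor(dtype: int, payload: bytes) -> bytes:
    body = DESC_HEAD.pack(dtype, len(payload)) + payload
    # Zero-pad so the next descriptor starts 8-byte aligned.
    return body + b"\0" * (align8(len(body)) - len(body))


def iter_descriptors(buf, pos: int, n_indices: int, known_types: set):
    """Yield (type, payload) for known types, skipping unknown ones.

    Assumes positions in buf equal file offsets (e.g. buf is the whole
    mmapped index file), so aligning pos aligns the file offset.
    """
    for _ in range(n_indices):
        pos = align8(pos)
        dtype, length = DESC_HEAD.unpack_from(buf, pos)
        payload_start = pos + DESC_HEAD.size
        if dtype in known_types:
            yield dtype, bytes(buf[payload_start:payload_start + length])
        pos = payload_start + length
```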

The index file would be ignored if the recorded recfile modification
time, size or name doesn't match that of the opened recfile, or when a
record offset in use doesn't point to an empty line.
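A sketch of this staleness check in Python, assuming the header fields
have already been decoded into a dict with `mtime` and `size` keys, that
the stored name is the recfile's base name (the proposal doesn't say
whether it is a base name or a full path), and that `offsets_used` holds
the record offsets the caller is about to rely on.  It reads only two
bytes at each offset rather than the whole file, in keeping with lazy
loading.  `index_is_current` is a hypothetical name.

```python
import os


def index_is_current(header: dict, stored_name: bytes, rec_path: str,
                     offsets_used: list[int]) -> bool:
    """Return False if the index must be ignored for the recfile at rec_path."""
    st = os.stat(rec_path)
    # Whole-second mtime comparison plus file size, as proposed above.
    if header["mtime"] != int(st.st_mtime) or header["size"] != st.st_size:
        return False
    # Assumption: the index stores the base name of the recfile.
    if stored_name != os.fsencode(os.path.basename(rec_path)):
        return False
    with open(rec_path, "rb") as f:
        for off in offsets_used:
            if off == 0:
                continue  # the start of the file needs no preceding blank line
            f.seek(off)
            if f.read(2) != b"\n\n":
                return False
    return True
```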

I think these issues need discussing (and they probably aren't the only
ones):

- should more precise timestamps be used?  Python uses only whole
  seconds and doesn't check the file size, and I have had no problems
  with the reliability of that check.

- should we use 64-bit or 32-bit offsets in the file?  I think most
  advantages of recutils apply only to files small enough to be edited
  in a text editor, and index preparation would be too slow for larger
  files anyway, so SQL databases or other solutions would be more
  practical for them.

- are we doing too much or too little to make index files portable to
  other systems or recutils versions?
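For the 32-bit vs 64-bit question, the arithmetic is short: an unsigned
32-bit offset can address at most 2^32 - 1 bytes (just under 4 GiB),
while each two-field rset table entry (name length and offset) would
shrink from 16 to 8 bytes.  A small Python check of those numbers, with
hypothetical helper names:

```python
import struct


def max_offset(fmt: str) -> int:
    """Largest value an unsigned field in struct format `fmt` can hold."""
    return 2 ** (8 * struct.calcsize(fmt)) - 1


def fits_32_bit_offsets(recfile_size: int) -> bool:
    # The size field is stored too, so the whole size must fit, not just
    # the largest offset (size - 1).
    return recfile_size <= max_offset("<I")


print(max_offset("<I"))                                # 4294967295
print(struct.calcsize("<2I"), struct.calcsize("<2Q"))  # 8 16
```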
