[bug-recutils] Index file structure
From: Michał Masłowski
Subject: [bug-recutils] Index file structure
Date: Tue, 15 May 2012 14:41:56 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
Hello,
I was thinking about what the binary index file structure would need to
specify, apart from the indices of records themselves. It could be used
for lazy loading of rsets. I don't think implementing it would be useful
yet, before recutils supports seekable parsing (probably of mmapped
files) and the other changes needed to load rsets lazily.
In the following description, all numbers are little-endian, probably
64-bit, and aligned to their size so that they can be accessed quickly in
memory-mapped files.
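To make the number encoding concrete, here is a minimal C sketch of decoding one such field. The function names `read_le64` and `is_aligned_to` are hypothetical helpers, not part of recutils. The decoder assembles the value byte by byte, so it gives the same result on any host:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned 64-bit little-endian number stored at p.
   Building the value byte by byte gives the same result on
   little- and big-endian hosts and does not depend on alignment. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Check that a field at byte offset `off` is aligned to its own size,
   which is what would allow direct loads from a mapped file. */
static int is_aligned_to(size_t off, size_t field_size)
{
    return off % field_size == 0;
}
```

On a little-endian host, with every field aligned as proposed, a plain `uint64_t` load from the mapping would yield the same value. The portable decoder is just the safe default for big-endian systems.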
All offsets into the recfile would point to the empty line before the
start of a record (unless the record is at the start of the file), making
it easier to verify that the index is still up to date.
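The check this rule enables could look roughly like the following sketch. `offset_looks_valid` is a hypothetical name. The sketch also assumes that an offset designates the newline ending the previous line, so the bytes at `off` and `off + 1` are both `'\n'`. That matches the "\n\n%rec" wording used for rsets, but the exact byte is not fixed by the proposal:

```c
#include <stddef.h>

/* Hypothetical validity check for a stored record offset.
   Offset 0 (start of file) is always accepted; otherwise the offset
   must point at "\n\n", i.e. the end of the previous line followed by
   the empty line separating records (an assumption of this sketch). */
static int offset_looks_valid(const char *data, size_t size, size_t off)
{
    if (off == 0)
        return 1;
    if (off + 1 >= size)
        return 0;               /* not enough bytes left for "\n\n" */
    return data[off] == '\n' && data[off + 1] == '\n';
}
```

A failed check would simply mean the index is stale and should be ignored, not that the recfile is broken.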
The file would start with these fields:
- a magic number
- recfile modification time in seconds since the start of 1970
- recfile size
- recfile name length
- number of rsets
- number of indices
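A sketch of reading that fixed header, assuming all six fields are 64-bit little-endian numbers stored back to back from offset 0. The struct and function names are hypothetical, and no actual magic value is implied:

```c
#include <stddef.h>
#include <stdint.h>

/* Six 64-bit fields, in the order listed above. */
#define RECIDX_HEADER_SIZE (6 * 8)

/* Hypothetical in-memory copy of the fixed header. */
struct recidx_header {
    uint64_t magic;     /* identifies the file as a recutils index */
    uint64_t mtime;     /* recfile modification time, seconds since 1970 */
    uint64_t size;      /* recfile size in bytes */
    uint64_t name_len;  /* length of the recfile name stored later */
    uint64_t n_rsets;   /* number of per-rset entries that follow */
    uint64_t n_indices; /* number of index descriptors */
};

static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Fill *h from the first RECIDX_HEADER_SIZE bytes of buf.
   Returns 0 on success, -1 if buf is too short. */
static int recidx_parse_header(const unsigned char *buf, size_t len,
                               struct recidx_header *h)
{
    if (len < RECIDX_HEADER_SIZE)
        return -1;
    h->magic     = read_le64(buf + 0);
    h->mtime     = read_le64(buf + 8);
    h->size      = read_le64(buf + 16);
    h->name_len  = read_le64(buf + 24);
    h->n_rsets   = read_le64(buf + 32);
    h->n_indices = read_le64(buf + 40);
    return 0;
}
```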
Then for each rset:
- rset type name length
- offset in the recfile to the "\n\n%rec" starting it
The recfile name and the rset names would follow.
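The layout so far fixes where the strings live, given only the header and the per-rset entries. Here is a sketch of that arithmetic. It assumes 64-bit fields (so 48 header bytes and 16 bytes per rset entry) and names stored back to back without terminators. It also assumes anything after the names starts at the next 8-byte boundary. The function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes implied by the layout, assuming 64-bit fields throughout. */
#define HEADER_SIZE     48  /* six header fields */
#define RSET_ENTRY_SIZE 16  /* name length + offset of "\n\n%rec" */

/* Offset in the index file where the recfile name begins. */
static uint64_t names_start(uint64_t n_rsets)
{
    return HEADER_SIZE + n_rsets * RSET_ENTRY_SIZE;
}

/* First 8-byte-aligned offset after the recfile name and all rset
   names, which are assumed to be stored back to back without
   terminators. */
static uint64_t after_names_aligned(uint64_t n_rsets, uint64_t file_name_len,
                                    const uint64_t *rset_name_lens)
{
    uint64_t off = names_start(n_rsets) + file_name_len;
    for (uint64_t i = 0; i < n_rsets; i++)
        off += rset_name_lens[i];
    return (off + 7) & ~(uint64_t)7;   /* round up to a multiple of 8 */
}
```

For example, two rsets with names of 4 and 6 bytes and a 10-byte file name give names starting at offset 80 and ending at 100, so the next aligned offset is 104.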
Then (perhaps after padding) each index's binary descriptor would be
included; each descriptor would start with its type and length, so that
descriptors of unknown types could be skipped.
The index file would be ignored if the recfile modification time, size
or name stored in it doesn't match those of the opened recfile, or if a
record offset being used doesn't point to an empty line.
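The metadata part of that check maps directly onto POSIX `stat()`. The following is a sketch with a hypothetical function name. It compares the stored name with the path string exactly as it was passed in, which is only one possible choice: a real implementation might compare a base name or a canonical path instead.

```c
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if the metadata recorded in the index matches the recfile
   at `path`, 0 otherwise (including when the file cannot be stat'ed). */
static int index_matches_recfile(const char *path,
                                 uint64_t stored_mtime, uint64_t stored_size,
                                 const char *stored_name,
                                 size_t stored_name_len)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    if ((uint64_t)st.st_mtime != stored_mtime      /* whole seconds */
        || (uint64_t)st.st_size != stored_size)
        return 0;
    return strlen(path) == stored_name_len
           && memcmp(path, stored_name, stored_name_len) == 0;
}
```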
I think these issues need discussing (and probably aren't the only
ones):
- should more precise timestamps be used? Python's bytecode cache uses
  only whole seconds and doesn't check the file size, and I have had no
  problems with the reliability of that check.
- should we use 64-bit or 32-bit offsets in the file? I think most
  advantages of recutils apply only to files small enough to be edited
  in a text editor, and preparing an index would be too slow for larger
  files anyway, so SQL databases or other solutions would be more
  practical for them.
- does this do too much or too little to make index files portable to
  other systems or recutils versions?
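On the timestamp question, one option (an assumption for illustration, not something proposed above) would be to keep a single 64-bit field holding nanoseconds since 1970. It would be filled from the POSIX.1-2008 `st_mtim` member of `struct stat`. The hypothetical helper below only performs the conversion from a `struct timespec`:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Collapse a POSIX timespec (seconds + nanoseconds) into one 64-bit
   count of nanoseconds since 1970.  Assumes a non-negative time. */
static uint64_t timespec_to_ns(struct timespec ts)
{
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

After a successful `stat()`, it would be called as `timespec_to_ns(st.st_mtim)`. An unsigned 64-bit nanosecond count lasts until roughly the year 2554, so the field size would not be a concern. The real question is whether every filesystem reports sub-second times consistently.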