[libextractor] Hachoir project and some comments about libextractor


From: Victor STINNER
Subject: [libextractor] Hachoir project and some comments about libextractor
Date: Wed, 2 Aug 2006 17:16:27 +0200 (CEST)
User-agent: SquirrelMail/1.4.3a

Hi,

I'm one of the authors of the Hachoir project:
  http://hachoir.python-hosting.com/

This project is a generic parser for binary (and only binary) files. It has
been in development for 10 months, but it is already worth testing.

I'm writing to you because I wrote a small tool based on Hachoir,
hachoir-metadata, which extracts a lot of information from known files.
"Known" means that the format needs both a Hachoir parser and a metadata
extractor. The list of supported formats is here:
http://hachoir.python-hosting.com/wiki/Metadata

It's hard to say whether it is fast, since I don't have a good benchmark,
but on supported files it gives more information than extract. I don't
know whether your goal is to extract as much information as possible or
just the information useful for finding a specific file.

We worked on optimisation over the last few weeks. The best result was
with svn revision 479: on one file, Hachoir was only 4 times slower than
extract. The test was "time extract file.png" versus "time hachoir
--metadata file.png". But this test is unfair, because Python takes a few
milliseconds just to start up (whereas extract is pure C code).
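To get a rough idea of how much of that gap is plain interpreter startup,
one could time an empty Python run and subtract it from the hachoir
figure. The Python 3 sketch below is not part of Hachoir or extract, and
`interpreter_startup` is a hypothetical helper name. It only measures the
bare interpreter; importing Hachoir's own modules adds further overhead.

```python
import statistics
import subprocess
import sys
import time


def interpreter_startup(runs=5):
    """Median wall-clock seconds to start and exit an empty Python process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" starts the interpreter, runs nothing, and exits.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one slow (cold-cache) run.
    return statistics.median(samples)


if __name__ == "__main__":
    print("Python startup: %.1f ms" % (interpreter_startup() * 1000))
```

Subtracting this value from the "time hachoir --metadata file.png" result
gives a fairer, though still approximate, comparison with extract.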

--

I think that you could use the Hachoir source code to improve your
parsers. For example, your PNG parser is poor: it extracts neither the
creation date nor the comments. You can look at "parser/image/png.py" and
"metadata/image.py".
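To illustrate what a PNG parser could pick up, here is a minimal
stdlib-only Python sketch. It is neither Hachoir's API nor libextractor's
code, and `iter_chunks` / `png_metadata` are hypothetical names. It walks
the PNG chunks and reads the tIME chunk (the image's last-modification
time) and tEXt chunks (keyword/text pairs; a creation date is
conventionally stored under the "Creation Time" keyword, and comments
under "Comment"). Compressed zTXt and international iTXt chunks are
ignored here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (chunk_type, chunk_body) pairs from the raw bytes of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated %r chunk" % ctype)
        body = data[pos + 8:end]
        (crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(ctype + body) & 0xFFFFFFFF != crc:
            raise ValueError("bad CRC in %r chunk" % ctype)
        yield ctype, body
        if ctype == b"IEND":
            break
        pos = end + 4


def png_metadata(data):
    """Return the last-modification time and tEXt pairs found in a PNG."""
    meta = {"last_modified": None, "text": []}
    for ctype, body in iter_chunks(data):
        if ctype == b"tIME":
            # 2-byte year, then month, day, hour, minute, second (1 byte each), UTC.
            y, mo, d, h, mi, s = struct.unpack(">HBBBBB", body)
            meta["last_modified"] = "%04d-%02d-%02d %02d:%02d:%02d" % (
                y, mo, d, h, mi, s)
        elif ctype == b"tEXt":
            # Latin-1 keyword, a NUL separator, then Latin-1 text (no terminator).
            keyword, _, text = body.partition(b"\0")
            meta["text"].append(
                (keyword.decode("latin-1"), text.decode("latin-1")))
    return meta
```

Usage: `with open("file.png", "rb") as f: print(png_metadata(f.read()))`.
Checking the CRC of every chunk makes the sketch reject corrupted files
instead of returning garbage metadata.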

To download Hachoir:
  svn co https://svn.hachoir.python-hosting.com/hachoir/trunk hachoir

To test Hachoir:
  cd <hachoir directory>
  export PYTHONPATH=$(cd src; pwd)
  script/hachoir-metadata file
  script/hachoir-metadata file1 file2 ...
Options:
  script/hachoir-metadata --level LEVEL file, to filter the information shown
  script/hachoir-metadata --mime LEVEL file, to display only the MIME type

You can also test the file explorer (it needs the Python "urwid" module):
  script/hachoir-urwid file

Or you can install it using "./setup.py install" ;-) (but it's currently
broken; I will fix it in the next few hours)

Haypo



