recognizing a file by scanning it
From: Thien-Thi Nguyen
Subject: recognizing a file by scanning it
Date: Sun, 27 Apr 2008 13:36:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)
Some time back (within the last half-year or so) there was discussion
about Emacs being able to recognize file types by scanning content
rather than (or in addition to) using name-based heuristics.
One model for such a capability is the external command file(1), which
takes as its data a magic(5) file containing (possibly-chained) rules
specifying where and what to look for in the target file in order to
make a match, and additionally what to display on match.
For example, here is a fragment of ~/.magic ("|"-prefixed):
|# Emacs 18 - this is always correct, but not very magical.
|0 string \012( Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough <address@hidden>
|0 string ;ELC
|>4 byte >19
|>4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
I have written a Scheme program to translate this into sexps amenable
to both Scheme and Emacs Lisp `read'. To continue the example:
|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")
(See <http://www.gnuvola.org/data/> for the complete translation.)
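As a rough illustration of the translation step (this is not the Scheme program itself, which is not shown here), the following Python sketch turns the fragment above into tuples of the same shape as the sexps. The names `parse_rule`, `parse_magic` and `unescape` are hypothetical, and only the `string` type, simple numeric tests, and octal escapes are assumed to be handled.

```python
import re

# Fragment of the magic(5) rules quoted above (raw string keeps \012 literal).
FRAGMENT = r"""
# Emacs 18 - this is always correct, but not very magical.
0 string \012( Emacs v18 byte-compiled Lisp data
0 string ;ELC
>4 byte >19
>4 byte <32 Emacs/XEmacs v%d byte-compiled Lisp data
"""

def unescape(text):
    # Decode octal escapes such as \012; other magic(5) escapes are not handled.
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), text)

def parse_rule(line):
    # Fields: [>...]offset, type, test, optional message.
    fields = line.strip().split(None, 3)
    fields += [""] * (4 - len(fields))
    where, kind, test, message = fields
    level = len(where) - len(where.lstrip(">"))   # number of leading ">"
    offset = int(where.lstrip(">"), 0)
    if kind == "string":
        return (level, offset, kind, ("=", unescape(test)), message)
    # Numeric test: optional operator (=, <, >) followed by a number.
    op = test[0] if test[0] in "=<>" else "="
    return (level, offset, kind, (op, int(test.lstrip("=<>"), 0)), message)

def parse_magic(text):
    # Skip blank lines and "#" comments; translate everything else.
    rules = []
    for line in text.splitlines():
        if line.strip() and not line.lstrip().startswith("#"):
            rules.append(parse_rule(line))
    return rules

for rule in parse_magic(FRAGMENT):
    print(rule)
```

Each tuple corresponds field by field to one sexp: continuation level, offset, type, (operator, value) pair, and message.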
The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the original
magic(5) format or the translated-to-sexps variant, displaying output
indistinguishable (for the most part) from that of "file -n -N".
|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
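To make the matching step concrete, here is a minimal Python sketch of how chained rules of this shape might be applied to a bag of bytes. It is an assumption-laden illustration, not the Scheme program: the names `check` and `identify` are hypothetical, only the `string` and `byte` types are handled, and the chaining policy (a rule at level N is tried only if its level N-1 parent matched, and the first matching top-level rule wins) is a simplification of what file(1) does.

```python
# Rules in the same shape as the sexps above:
# (level, offset, type, (operator, value), message)
RULES = [
    (0, 0, "string", ("=", "\n("), "Emacs v18 byte-compiled Lisp data"),
    (0, 0, "string", ("=", ";ELC"), ""),
    (1, 4, "byte", (">", 19), ""),
    (1, 4, "byte", ("<", 32), "Emacs/XEmacs v%d byte-compiled Lisp data"),
]

def check(data, offset, kind, test):
    # Return (matched?, value usable for the message's format directive).
    op, value = test
    if kind == "string":
        want = value.encode("latin-1")
        return data[offset:offset + len(want)] == want, value
    if kind == "byte":
        if offset >= len(data):
            return False, None
        got = data[offset]
        return {"=": got == value, "<": got < value, ">": got > value}[op], got
    raise ValueError("unsupported type in this sketch: " + kind)

def identify(data, rules):
    parts = []
    depth = -1        # level of the deepest rule currently matched
    matched = False   # has some top-level rule matched yet?
    for level, offset, kind, test, message in rules:
        if level == 0:
            if matched:
                break             # first matching top-level rule wins
            depth = -1
        elif level > depth + 1:
            continue              # parent rule did not match; skip
        ok, val = check(data, offset, kind, test)
        if ok:
            depth = level
            matched = True
            if message:
                parts.append(message % val if "%" in message else message)
        else:
            depth = level - 1     # siblings may still be tried
    return " ".join(parts) if parts else None

# A byte-compiled file begins with ";ELC" followed by a version byte.
print(identify(b";ELC\x17\x00\x00\x00", RULES))
```

A port to Emacs Lisp would presumably read bytes from a buffer rather than a byte string, but the rule-walking logic could stay much the same.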
Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc.), I
consider it complete enough to be a good starting point for a port to
Emacs Lisp. (Indeed, Emacs is a much nicer environment for implementing
features such as charset guesstimation.)
But before continuing, I would like to find out whether anyone else is
working on something similar, to avoid (more?) duplicated effort.
thi