From: madmurphy
Subject: libextractor - key-value pairs and mime types
Date: Mon, 7 Feb 2022 21:01:23 +0000

Hi again, GNUnet people.
Is this the right place to discuss libextractor? I have two points.
#1 I often see something interesting: key-value pairs are categorized as EXTRACTOR_METATYPE_UNKNOWN:
    unknown: chroma-format=4:2:0
    unknown: bit-depth-chroma=8
    unknown: colorimetry=bt709
    unknown: stream-format=avc
    unknown: stream-format=raw
    unknown: bit-depth-luma=8
    unknown: base-profile=""
    unknown: mpegversion=4
    unknown: profile=""
    unknown: alignment=au
    unknown: parsed=true
    unknown: framed=true
    unknown: variant=iso
    unknown: profile=""
    unknown: level=4.1
These entries are often numerous, and a key-value type would be a really
interesting metatype to have (the data is not really “unknown”, since the
key is self-explanatory). Would it not make sense to add an
EXTRACTOR_METATYPE_KEY_VALUE_PAIR to the list of MetaTypes?
    ...
    /* generic attributes */
    EXTRACTOR_METATYPE_UNKNOWN = 45,
    EXTRACTOR_METATYPE_DESCRIPTION = 46,
    EXTRACTOR_METATYPE_COPYRIGHT = 47,
    EXTRACTOR_METATYPE_RIGHTS = 48,
    EXTRACTOR_METATYPE_KEYWORDS = 49,
    EXTRACTOR_METATYPE_ABSTRACT = 50,
    EXTRACTOR_METATYPE_SUMMARY = 51,
    EXTRACTOR_METATYPE_SUBJECT = 52,
    EXTRACTOR_METATYPE_CREATOR = 53,
    EXTRACTOR_METATYPE_FORMAT = 54,
    EXTRACTOR_METATYPE_FORMAT_VERSION = 55,
    EXTRACTOR_METATYPE_KEY_VALUE_PAIR = XXX,
    ...
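To illustrate what applications have to do today, here is a minimal sketch of splitting such an "unknown" entry into its key and value on the consumer side. The helper split_key_value() is hypothetical and not part of libextractor; it assumes the data arrives as a NUL-terminated "key=value" string, as in the output above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of libextractor.
 * Splits a NUL-terminated "key=value" string at the first '='.
 * Copies the key into `key` (capacity `key_size`) and points
 * `*value` at the text after the '=' inside `data`.
 * Returns 1 on success, 0 if there is no '=', the key is empty,
 * or the key does not fit into the buffer. */
static int
split_key_value (const char *data,
                 char *key,
                 size_t key_size,
                 const char **value)
{
  const char *eq = strchr (data, '=');
  size_t key_len;

  if ( (NULL == eq) || (eq == data) )
    return 0;                     /* no separator or empty key */
  key_len = (size_t) (eq - data);
  if (key_len >= key_size)
    return 0;                     /* key would be truncated */
  memcpy (key, data, key_len);
  key[key_len] = '\0';
  *value = eq + 1;                /* may be an empty string */
  return 1;
}
```

With a dedicated metatype, a caller would know when this split is meaningful instead of having to try it on every EXTRACTOR_METATYPE_UNKNOWN item.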
#2 I often see libextractor tag a single file with multiple mime types:
    mimetype: video/quicktime
    mimetype: video/x-h264
    mimetype: audio/mpeg
    mimetype: video/mp4
But that never reflects reality, since a file should have only one mime
type (or at most multiple mime types that mean the same thing). Then I
look at how file names are handled: there is only one
EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME, but there can be many
EXTRACTOR_METATYPE_FILENAMEs (in the case of archives, for example):
    EXTRACTOR_METATYPE_FILENAME = 2,
    ...
    EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME = 180,
Would it not make sense to do something similar for mime types? Only one “original mime type”, and any number of secondary mime types?
    EXTRACTOR_METATYPE_MIMETYPE = 1,
    ...
    EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE = XXX,
So, two simple proposals:
#1 EXTRACTOR_METATYPE_KEY_VALUE_PAIR
#2 EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE
What do you think? Does it make sense?
--madmurphy