gnunet-developers

libextractor - key-value pairs and mime types


From: madmurphy
Subject: libextractor - key-value pairs and mime types
Date: Mon, 7 Feb 2022 21:01:23 +0000

Hi again, GNUnet people.

Is this the right place to discuss libextractor? I have two points.

#1 I often see key-value pairs categorized as EXTRACTOR_METATYPE_UNKNOWN:

unknown: chroma-format=4:2:0
unknown: bit-depth-chroma=8
unknown: colorimetry=bt709
unknown: stream-format=avc
unknown: stream-format=raw
unknown: bit-depth-luma=8
unknown: base-profile=""
unknown: mpegversion=4
unknown: profile=""
unknown: alignment=au
unknown: parsed=true
unknown: framed=true
unknown: variant=iso
unknown: profile=""
unknown: level=4.1

These pairs are often numerous, and a key-value type would be a really interesting metatype to have (such values are not really “unknown”, since the key is self-explanatory). Would it not make sense to add an EXTRACTOR_METATYPE_KEY_VALUE_PAIR to the list of MetaTypes?

...

  /* generic attributes */
  EXTRACTOR_METATYPE_UNKNOWN = 45,
  EXTRACTOR_METATYPE_DESCRIPTION = 46,
  EXTRACTOR_METATYPE_COPYRIGHT = 47,
  EXTRACTOR_METATYPE_RIGHTS = 48,
  EXTRACTOR_METATYPE_KEYWORDS = 49,
  EXTRACTOR_METATYPE_ABSTRACT = 50,
  EXTRACTOR_METATYPE_SUMMARY = 51,
  EXTRACTOR_METATYPE_SUBJECT = 52,
  EXTRACTOR_METATYPE_CREATOR = 53,
  EXTRACTOR_METATYPE_FORMAT = 54,
  EXTRACTOR_METATYPE_FORMAT_VERSION = 55,
  EXTRACTOR_METATYPE_KEY_VALUE_PAIR = XXX,

...
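
To illustrate what a client has to do today, here is a minimal sketch of splitting one of the "unknown" values above at its first '='. The helper name demo_split_key_value is hypothetical and not part of libextractor; the heuristic is only an assumption, since any free text containing '=' would also match, which is part of why a dedicated metatype would help.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of libextractor.
 * Splits "key=value" at the first '='.
 * Copies the key into `key` (NUL-terminated) and points `*value`
 * at the remainder of `item`.
 * Returns 1 on success, 0 if there is no '=', the key is empty,
 * or the key does not fit into `key_size` bytes. */
int
demo_split_key_value (const char *item,
                      char *key,
                      size_t key_size,
                      const char **value)
{
  const char *eq;
  size_t key_len;

  assert (NULL != item);
  eq = strchr (item, '=');
  if ((NULL == eq) || (eq == item))
    return 0;                   /* no separator, or empty key */
  key_len = (size_t) (eq - item);
  if (key_len >= key_size)
    return 0;                   /* key too long for the buffer */
  memcpy (key, item, key_len);
  key[key_len] = '\0';
  *value = eq + 1;              /* value may be empty or quoted */
  return 1;
}
```

A caller would apply this only to items reported as EXTRACTOR_METATYPE_UNKNOWN and fall back to treating the item as opaque text when the function returns 0.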

#2 I often see that libextractor tags a single file with multiple mime types:

mimetype: video/quicktime
mimetype: video/x-h264
mimetype: audio/mpeg
mimetype: video/mp4

That does not reflect reality, since a file should have only one mime type (or at most several mime types that mean the same thing). File names, however, are already handled differently: there is only one EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME, but there can be many EXTRACTOR_METATYPE_FILENAMEs (in the case of archives, for example):

EXTRACTOR_METATYPE_FILENAME = 2,
...
EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME = 180,

Would it not make sense to do something similar for mime types? Only one “original mime type”, and any number of secondary mime types?

EXTRACTOR_METATYPE_MIMETYPE = 1,
...
EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE = XXX,
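
As a rough sketch of how a consumer could use such a distinction, the code below picks the single "original" mime type out of a list of items. All names here (demo_meta_type, demo_item, demo_original_mime) are hypothetical stand-ins; the real enum values do not exist yet, so nothing is assumed about their numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EXTRACTOR_METATYPE_MIMETYPE and the
 * proposed EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE. */
enum demo_meta_type
{
  DEMO_MIMETYPE,
  DEMO_ORIGINAL_MIMETYPE
};

struct demo_item
{
  enum demo_meta_type type;
  const char *value;
};

/* Returns the value of the one "original" mime type item,
 * or NULL if there is none or more than one. */
const char *
demo_original_mime (const struct demo_item *items,
                    size_t count)
{
  const char *found = NULL;

  assert ((NULL != items) || (0 == count));
  for (size_t i = 0; i < count; i++)
  {
    if (DEMO_ORIGINAL_MIMETYPE != items[i].type)
      continue;                 /* secondary mime type, skip */
    if (NULL != found)
      return NULL;              /* ambiguous: two originals */
    found = items[i].value;
  }
  return found;
}
```

With the four mime types listed above, only the item carrying the "original" type would be returned, and the others would remain available as secondary information.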

So, two simple proposals:

  1. Create EXTRACTOR_METATYPE_KEY_VALUE_PAIR
  2. Create EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE

What do you think? Does it make sense?

--madmurphy

