[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [help-GIFT] Searching for similarities

From: Wolfgang Mueller
Subject: Re: [help-GIFT] Searching for similarities
Date: Wed, 17 Oct 2001 13:59:23 +0200

This is just to add a little bit more to what Henning said already...
On Wednesday 17 October 2001 09:36, Andreas Enge wrote:
> Dear developers,
> after playing around a little bit with gift, I realised that it does not
> quite serve the purpose I had intended to use it for. Maybe I am just
> overlooking something in the handling, 
As Henning said, you're not.
> maybe I am going to formulate a
> feature wishlist. Let me present two scenarioes.
> First, one might have a given image (say, of the Christmas tree of 2000)
> and look for similar ones (Christmas trees from previous years) in the
> indexed collection. Then it does not help too much to start with a
> random choice and select tree like forms until the picture one has in
> hand pops up. 
This is known as the page 0 problem in the literature (how to start the 
search if you do not have a suitable first image).

> Second, one might wish to clean the hard drive from (almost) duplicate
> images (the same image saved with different file formats, slightly
> varying brightness, added or removed text, etc.). To do so, one would
> need the list of the most resembling pairs of images in the collection,
> and I am not sure if this is a problem easily solved with the techniques
> used in gift. (Admittedly, I have no experience whatsoever with image
> retrieval.) When looking for exact duplicates, the most efficient way
> of doing so is to compute a hash value (md5sum) of each image, sorting
> the list of hash values and looking for duplicates. This makes up for
> a one-line shell script which is executed in basically linear time in
> the number of images. 

Yes, if you use hashes.

> To imitate this procedure when looking for
> similar images, one would need to associate to each file some number so
> that similar images get the same or close numbers. Is this a possibility,
> or does gift only associate a degree of similarness to pairs of images?

No, this is not a possibility, as you would need a computable, well-defined 
measure of similarity. In the context of *redundancy* of images this might 
work. I would think of something like:

resize (to a standard size) and compress (to a low quality factor) all images 
then calculate a md5sum of each. 

Would you like to try that at your place? It would be interesting to know 
what became of it.

> This would result in a rather inefficient quadratic time algorithm to
> compare everything with everything.

No, in fact not, there are efficient algorithms for kth-next-neighbour search 
on trees containing vectors. Try to look for things like k-d-trees, r-trees. 
The bottom line is, that you can get to log(n) search times for small 
dimensionalities. However, for large vector dimensionalities you get the 
"curse of dimensionality", and the efficiency of such algorithms goes down to 

Viper (the GIFT's query engine), as you maybe read already, uses inverted 
files. They have linear time complexity, but with a small constant factor 
(which matters, as we operate on finite sets ;-)  ).


Dr. Wolfgang Müller, assistant == teaching assistant
Personal page: 
Maintainer, GNU Image Finding Tool (

reply via email to

[Prev in Thread] Current Thread [Next in Thread]