Re: [help-GIFT] patch-fu

From: David Squire
Subject: Re: [help-GIFT] patch-fu
Date: Thu, 17 Aug 2006 14:45:03 +0100
User-agent: Thunderbird (Macintosh/20060719)

address@hidden wrote:
> Yes, not doing a calloc() every time through improves performance by around 5%.
>
> It is my intention to expand the image-size handling of the feature extractor as soon as I'm done with performance hacks. However, the current code won't properly process an image larger than 256x256. Therefore, I think allocating the array automatically, sized to MAX_WIDTH and MAX_HEIGHT, is appropriate.

OK... but I am concerned about a line of modification that seeks a few percentage points of speed improvement here and there at the expense of good design (i.e. low coupling and extensibility). Not that I am suggesting the feature extraction code as it was is an example of good design. We should always be aiming for good design.

The above approach tightly couples parts of the code to one another via an essentially arbitrary choice of image size, which *happens* to be constant in the current version but need not be. I would be *much* happier seeing things dynamic and parameterized, even if that makes them a little slower.

Remember the "rules" of optimization:

1. Don't optimize by hand.
2. If you think you need to optimize by hand, think again.
3. If you still really think you need to optimize by hand, optimize late.

This is particularly true of research software (which the GIFT essentially is), where the requirements are moving targets. For example, I have my own versions of the feature extraction code that also look for and index text files associated with the images. The feature extraction code is intended to be as separate as possible from the indexing and query engine, so that others can write and use their own feature extraction code, using whatever features they like.

Also, the typical IR scenario is that you extract features and index once, and query many times. Consequently, optimizing query performance is much more important than optimizing feature extraction. Users are often quite happy for indexing to take hours or days.
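The index-once, query-many argument can be written as a simple cost model. The symbols are introduced here for illustration only: $T_x$ is the one-off time to extract features and index the collection, $T_q$ the time per query, $N_q$ the number of queries served, and $f$ a fractional speed-up such as the 5% mentioned above.

```latex
\begin{align}
T_{\text{total}} &= T_x + N_q\,T_q \\
\Delta_x &= f\,T_x \qquad \text{(fraction $f$ saved on extraction, once)} \\
\Delta_q &= f\,N_q\,T_q \qquad \text{(same fraction saved on every query)} \\
\frac{\Delta_q}{\Delta_x} &= \frac{N_q\,T_q}{T_x}
\end{align}
```

Once $N_q T_q$ exceeds $T_x$, the same fractional improvement is worth more on the query side, and the ratio keeps growing with every query served.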

I am not saying that optimization of the feature code is not appreciated, just that the trade-offs need to be kept closely in mind. IMHO, extending to arbitrary image sizes and shapes first would make more sense.



Dr David McG. Squire, Senior Lecturer, on sabbatical in 2006
Caulfield School of Information Technology, Monash University, Australia
CRICOS Provider No. 00008C
