I am posting this just to the mailing list and assuming that Henning, Stephane,
David and Wolfgang are all on the list.
First off, thanks for all your comments. It seems that the best way for me to get started is to get something installed and working on my system like GIFT or the benchathlon kit and then play around with it. I'll inevitably run into a few problems since I'm primarily a Windows C programmer and it looks like you guys prefer Perl and Linux.
Now to reply to some of the comments-
Integrating music retrieval into the Benchathlon is an excellent idea. Indeed
benchmarking has become a serious goal of the music IR community. MRML is
also useful because most proposed testbeds circumvent the copyright
issues by storing audio on a secure server where high quality audio is never
allowed to leave. Thus we need a good communications protocol for sending
queries to the secure server.
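As a sketch of what such an exchange might look like, the fragment below builds a minimal MRML-style query message in Python. The element and attribute names (query-step, user-relevance-element, etc.) follow my reading of the MRML draft and may not match the final spec exactly; the collection id and audio location are placeholders.

```python
# Sketch: building a minimal MRML-style query message for a remote audio
# server. Element/attribute names follow the MRML draft as I understand
# it; the session id, collection id, and audio location are placeholders.
import xml.etree.ElementTree as ET

def build_query(session_id, collection_id, example_url, result_size=10):
    mrml = ET.Element("mrml", {"session-id": session_id})
    step = ET.SubElement(mrml, "query-step", {
        "collection-id": collection_id,
        "result-size": str(result_size),
    })
    rel_list = ET.SubElement(step, "user-relevance-element-list")
    ET.SubElement(rel_list, "user-relevance-element", {
        "element-location": example_url,  # the query example on the server
        "user-relevance": "1",            # marked as a positive example
    })
    return ET.tostring(mrml, encoding="unicode")

msg = build_query("s1", "audio-testbed", "file://query.wav")
```

The point is that only a reference to the audio plus a relevance mark crosses the wire; the high-quality audio itself stays on the secure server.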
Interfaces for music retrieval are set up differently in virtually every music retrieval package that has been built. An excellent list has been compiled at http://mirsystems.info/ . Most of the systems are experimental (a few are proprietary, commercial systems), so little attention has been paid to user interface.
Since the MIR community comprises musicologists, librarians, computer scientists and signal processing people, it is safe to assume that systems exist on a wide variety of platforms and are implemented in many different development environments. At a minimum, we should be able to work with people who use Windows, Mac and Linux, and who develop in Matlab, C, C++, Java and Perl. There are also some development environments that are very specific to audio, such as CSound and VST, and those may require plug-ins.
Whether we do query by example depends on your definition of QBE. One branch of
MIR systems is known as query by humming (or query by singing, query by
whistling, ...). Here, one hums a few seconds of music. This is then
translated into a monophonic pitch representation. Retrieved documents are
typically best matched midi files (i.e., a symbolic, not audio representation)
or else just the song title. Other systems use a small snippet of polyphonic
music (5 -15 seconds) and then use feature extraction to find a best match. One
problem with this is that the best features capture timbral information, not
rhythmic or melodic information. This implies that two very different songs
played on the same instruments usually score as a better match than the same
song played on different instruments.
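To make the query-by-humming pipeline concrete, here is a toy sketch: the hummed input is reduced to a pitch contour (up/down/repeat, i.e. a Parsons-style code) and candidates are ranked by edit distance between contours. The melodies and function names are invented for illustration; real systems use far richer representations.

```python
# Toy sketch of contour matching for query-by-humming. A hummed query is
# reduced to a U/D/R (up/down/repeat) contour and candidate melodies are
# ranked by Levenshtein distance between contour strings.
def contour(pitches):
    """Reduce a monophonic pitch sequence to a U/D/R contour string."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        out.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(out)

def edit_distance(a, b):
    """Standard Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def rank(query_pitches, melodies):
    """Rank candidate melodies (name -> MIDI pitch list) against the query."""
    q = contour(query_pitches)
    return sorted(melodies, key=lambda n: edit_distance(q, contour(melodies[n])))

# Hypothetical database of melodies as MIDI note numbers:
db = {"tune_a": [60, 62, 64, 62, 60], "tune_b": [60, 60, 67, 67, 69]}
best = rank([62, 64, 66, 64, 62], db)[0]  # same contour as tune_a
```

Note that the query is transposed relative to tune_a but still matches perfectly, since only the contour is compared; this invariance is exactly what a hummed query needs.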
The biggest difference between Music Retrieval and CBIR is that there is
typically no relevance feedback. The reasoning is simple: it takes seconds to look at
images and attach relevance judgements, but it would take minutes to listen to
several snippets of music and determine their relevance.
Retrieval quality can be judged in different ways depending on the type of system. Most queries
(such as "find me music that sounds like..." ) would best be evaluated TREC style. But a
few are binary questions (such as "what is the name of this song") and the system either
finds it or doesn't. I suppose that can be evaluated by determining how many errors
a query can contain before the question can no longer be answered.
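For the TREC-style case, ranked result lists can be scored with the usual measures. A minimal sketch, with made-up song names and relevance judgements:

```python
# Minimal sketch of TREC-style scoring for a ranked result list.
# The ranked list and relevance judgements are invented for illustration.
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, 1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

ranked = ["song3", "song1", "song7", "song2"]
relevant = {"song1", "song2"}
p2 = precision_at_k(ranked, relevant, 2)   # 0.5
ap = average_precision(ranked, relevant)   # (1/2 + 2/4) / 2 = 0.5
```

The binary "name that tune" case collapses to checking whether the correct song appears at rank 1, so it needs no ranking measure at all.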
But all those questions about the uniqueness of music retrieval systems are
interesting and fun, and they must wait until I have some implementation of MRML
working before I deal with them.
------------------------------------------------------------------------
_______________________________________________
help-GIFT mailing list
address@hidden
http://mail.gnu.org/mailman/listinfo/help-gift