[help-GIFT] Adding text features to Viper/GIFT
From: David Squire
Subject: [help-GIFT] Adding text features to Viper/GIFT
Date: Tue, 15 May 2001 11:14:57 +1000
Hi all,
I am just about to spend a few hours integrating my text indexing code with the
feature extraction code for Viper/GIFT. One of the fundamental issues here (as
has been discussed earlier) is that the number and nature of the features (word
stems) that will be encountered in indexing a collection are not known in
advance.
The currently suggested solution is to maintain a file with each collection
which maps words to feature IDs - feature IDs would not correspond directly
between collections (whereas they do now).
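
A minimal sketch of such a per-collection mapping, to make the consequence
concrete. The class name FeatureIdMap and the "next free integer" ID scheme are
assumptions for illustration only, not part of the GIFT code:

```python
# Hypothetical sketch: one term -> feature ID table per collection.
class FeatureIdMap:
    def __init__(self):
        self.term_to_id = {}

    def get_id(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.term_to_id)
        return self.term_to_id[term]


# Two collections, each with its own table.
collection_a = FeatureIdMap()
collection_b = FeatureIdMap()
ids_a = [collection_a.get_id(t) for t in ["red", "car", "red"]]
ids_b = [collection_b.get_id(t) for t in ["car", "blue"]]
```

Because IDs are handed out in order of first appearance, "car" ends up with a
different ID in each collection, which is why the IDs cannot be compared
directly across collections.
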
My current (quick and dirty) text indexing software accepts *all* the .txt
files to index as command-line arguments. Statistics are then gathered for term
frequencies in the documents (in fact these are presently treated on a
paragraph-by-paragraph basis) and in the collection as a whole. The advantage of this
is that a single hash mapping terms to their IDs and collection frequencies can
be maintained throughout the entire process.
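
The single-pass approach could look roughly like the sketch below. It is not
the actual indexing code: the function name index_collection is hypothetical,
documents are passed as a dict rather than read from command-line arguments,
paragraphs are assumed to be separated by blank lines, and lower-cased
whitespace tokens stand in for real word stems.

```python
from collections import Counter


def index_collection(documents):
    """documents: dict mapping a file name to its text (an assumed input format)."""
    term_ids = {}                # the single hash: term -> feature ID
    collection_freq = Counter()  # feature ID -> count over the whole collection
    paragraph_freqs = []         # (file name, paragraph number, Counter of ID -> count)
    for name, text in documents.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for number, paragraph in enumerate(paragraphs):
            counts = Counter()
            for term in paragraph.lower().split():
                # Reuse the existing ID, or assign the next free one.
                term_id = term_ids.setdefault(term, len(term_ids))
                counts[term_id] += 1
                collection_freq[term_id] += 1
            paragraph_freqs.append((name, number, counts))
    return term_ids, collection_freq, paragraph_freqs
```

Everything lives in memory for the duration of the run, so nothing has to be
reloaded between files.
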
If this were to be changed to work on a file by file basis, as the image
indexing currently works, then a file storing this hash would have to be
loaded, updated and then saved each time features were extracted for a given
.txt file.
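
For comparison, a sketch of the load/update/save cycle that file-by-file
operation would need. The JSON storage format, the function name
update_term_table and the table layout are all assumptions made for this
example; whitespace tokens again stand in for word stems.

```python
import json
import os


def update_term_table(table_path, txt_path):
    # Load the stored table if it exists, otherwise start an empty one.
    if os.path.exists(table_path):
        with open(table_path, encoding="utf-8") as f:
            table = json.load(f)
    else:
        table = {}

    with open(txt_path, encoding="utf-8") as f:
        terms = f.read().lower().split()

    # Update IDs and collection frequencies with the terms of this one file.
    for term in terms:
        entry = table.setdefault(term, {"id": len(table), "freq": 0})
        entry["freq"] += 1

    # Save the table again so the next file sees the new terms.
    with open(table_path, "w", encoding="utf-8") as f:
        json.dump(table, f)
    return table
```

The whole table is read and rewritten once per .txt file, which is the
overhead the single-pass version avoids.
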
I am planning a work-around where an initial text indexing phase will index all
.txt files in a collection, and write a summary file containing term ID and
term document frequency information for each .txt file. These can then be read
when the individual images are indexed. I think that this will work quite well,
but we should consider how it should be handled in the gift-add-collection.pl,
gift-extract-features, gift-generate-inverted-file framework.
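
A rough sketch of the two-phase work-around, under several assumptions: the
summary for foo.txt is written next to it as foo.txt.terms (a made-up naming
scheme), each summary line holds "term_id count", and "term document
frequency" is read here as the number of times a term occurs in that file.
The function names write_summaries and read_summary are hypothetical, and none
of this reflects how the gift-* scripts actually work.

```python
from collections import Counter


def write_summaries(txt_paths, suffix=".terms"):
    """Phase 1: index every .txt file once and write one summary per file."""
    term_ids = {}  # shared across the whole collection
    for path in txt_paths:
        with open(path, encoding="utf-8") as f:
            counts = Counter(f.read().lower().split())
        with open(path + suffix, "w", encoding="utf-8") as out:
            for term, count in sorted(counts.items()):
                term_id = term_ids.setdefault(term, len(term_ids))
                out.write(f"{term_id} {count}\n")
    return term_ids


def read_summary(txt_path, suffix=".terms"):
    """Phase 2: read the summary back when the matching image is indexed."""
    with open(txt_path + suffix, encoding="utf-8") as f:
        return {int(i): int(c) for i, c in (line.split() for line in f)}
```

With this split, the per-image feature extraction step only needs to read a
small summary file and never has to touch the shared term table.
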
Any thoughts much appreciated.
Cheers,
David
--
Dr. David McG. Squire
Computer Science and Software Engineering, Monash University, Australia
http://www.csse.monash.edu.au/~davids/ http://viper.unige.ch/
Do/Don't want HTML mail? Let me know.