From: Jason Rennie
Subject: Re: [Ifile-discuss] Effect of widely differing volumes on ifile classification
Date: Thu, 20 Mar 2003 13:23:52 -0500
address@hidden said:
> Recently, the rate, which had been consistent for some time, began to
> plunge to about 50% and stayed there, until I deleted .idata and
> rebuilt it from scratch, and it's now classifying better than before.
> (Data attached at bottom for completeness)
Are there any discernible differences between your current collection of
e-mail (what you used to rebuild .idata from scratch) and the collection
used to build the old .idata? Do you keep all of your e-mails? Can you
tell us about the types of misclassifications? Did it look pretty random
or were there certain folders that ifile seemed to send everything to?
address@hidden said:
> This didn't happen - it actually started to misclassify the mailing
> lists which receive all the volume.
That's strange; the mailing lists should be very easy for it to classify
correctly.
FYI, the anomalous behavior may just be an artifact of ifile's use of
Naive Bayes for classification. I could go into details, but they may be
more confusing than illuminating... Naive Bayes can do some weird things
when the training data is highly skewed, i.e. when a few folders hold
most of the mail.
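For anyone who does want a taste of the details, here is one way skew can bite, as a toy sketch only. It is not ifile's actual code: the folder names, the tiny corpus, and the choice of add-one (Laplace) smoothing with out-of-vocabulary words counted as zero are all assumptions made for illustration. The point is that a folder with very little training text spreads its probability mass thinly, so every unseen word is penalized far less there than in a high-volume folder.

```python
import math
from collections import Counter

def train(docs_by_folder):
    """Return per-folder (log prior, word counts, total words) and vocabulary size."""
    vocab = {w for docs in docs_by_folder.values() for d in docs for w in d}
    n_docs = sum(len(docs) for docs in docs_by_folder.values())
    model = {}
    for folder, docs in docs_by_folder.items():
        counts = Counter(w for d in docs for w in d)
        model[folder] = (math.log(len(docs) / n_docs), counts, sum(counts.values()))
    return model, len(vocab)

def score(model, vocab_size, folder, tokens):
    """Log prior plus add-one-smoothed log likelihood of the tokens."""
    log_prior, counts, total = model[folder]
    # Assumption: words never seen in training simply get a count of zero.
    return log_prior + sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in tokens
    )

def classify(model, vocab_size, tokens):
    """Pick the folder with the highest score."""
    return max(model, key=lambda f: score(model, vocab_size, f, tokens))

# Hypothetical, highly skewed training set: 200 list messages, 2 personal ones.
training = {
    "list": [["patch", "kernel", "build", "release"]] * 200,
    "personal": [["lunch", "friday"]] * 2,
}
model, vocab_size = train(training)

typical = ["kernel", "patch"]                       # only words the list folder knows
novel = ["kernel", "meeting", "schedule", "agenda"]  # one list word, three new words

print(classify(model, vocab_size, typical))
print(classify(model, vocab_size, novel))
```

In this toy setup the first message goes to "list" as expected, but the second lands in "personal" even though its only familiar word is a list word and the prior favors "list" 100 to 1. Each unseen word costs about log(1/806) under the big folder versus log(1/10) under the small one, and three such words are enough to swamp everything else. Whether this is what happened with the old .idata is a guess; a different smoothing scheme would shift the numbers.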
Jason