


From: Jason Rennie
Subject: Re: [Ifile-discuss] Effect of widely differing volumes on ifile classification
Date: Thu, 20 Mar 2003 13:23:52 -0500

address@hidden said:
> Recently, the rate, which had been consistent for some time, began to
> plunge to about 50% and stayed there, until I deleted .idata and
> rebuilt it from scratch, and it's now classifying better than before.
> (Data attached at bottom for completeness) 

Are there any discernible differences between your current collection of 
e-mail (what you used to rebuild .idata from scratch) and the collection 
used to build the old .idata?  Do you keep all of your e-mails?  Can you 
tell us about the types of misclassifications?  Did it look pretty random 
or were there certain folders that ifile seemed to send everything to?

address@hidden said:
> This didn't happen - it actually started to misclassify the mailing
> lists which receive all the volume.

That's strange; the mailing lists should be very easy for it to classify
correctly.

FYI, the anomalous behavior may just be an artifact of the fact that ifile
uses Naive Bayes to do classification.  I could go into details, but they
may be more confusing than illuminating.  Naive Bayes can do some weird
things when the training data is highly skewed.
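
For readers who want the details anyway, here is a minimal sketch of one way
skew can bite. It is a toy multinomial Naive Bayes with add-one (Laplace)
smoothing, written for illustration only; it is not ifile's code, and ifile's
actual smoothing and word handling may differ. The folder names ("list",
"personal") and the messages are made up. The point is that a folder with very
few training words assigns a much larger smoothed probability to any unseen
word than a high-volume folder does, so a message with several new words can
be pulled toward the tiny folder despite its small prior.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-folder word counts from (folder, words) pairs."""
    word_counts = {}          # folder -> Counter of word occurrences
    doc_counts = Counter()    # folder -> number of training messages
    vocab = set()             # every word seen in training
    for folder, words in docs:
        doc_counts[folder] += 1
        word_counts.setdefault(folder, Counter()).update(words)
        vocab.update(words)
    return word_counts, doc_counts, vocab


def log_scores(model, words):
    """Return log P(folder) + sum of log P(word | folder) for each folder."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for folder, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[folder] / total_docs)
        for w in words:
            # Add-one smoothing (an assumption of this sketch): an unseen
            # word gets probability 1 / (total_words + |vocab|).
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        scores[folder] = score
    return scores


def classify(model, words):
    scores = log_scores(model, words)
    return max(scores, key=scores.get)


# Hypothetical, highly skewed training set: 200 list messages, 2 personal.
training = (
    [("list", ["patch", "kernel", "driver", "release"])] * 200
    + [("personal", ["lunch", "friday"])] * 2
)
model = train(training)

typical_list_msg = ["patch", "kernel"]
list_msg_with_new_words = ["kernel", "meeting", "agenda", "notes"]

print(classify(model, typical_list_msg))
print(classify(model, list_msg_with_new_words))
```

In this toy setup the first message goes to "list" as expected, but the second
one, which contains a list keyword plus three words never seen in training,
lands in "personal". Each unseen word costs the high-volume folder far more
than it costs the tiny one, and three of them outweigh both the prior and the
keyword match. Whether this is what happened with the old .idata is a guess.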

Jason






