From: Jack Bertram
Subject: Re: [Ifile-discuss] Effect of widely differing volumes on ifile classification
Date: Fri, 21 Mar 2003 14:31:53 +0000
User-agent: Mutt/1.4i
* Brett Nemeroff <address@hidden> [030320 18:43]:
> I know this is slightly off topic, but I was wondering if you had custom
> scripts you could share that would report accuracy as you have in this
> post?
I have modified Martin's scripts ifile.inject-learn.header and
ifile.relearn.message so that each one creates a new empty temporary
file in its own directory every time it is called.
So, for example, in ifile.inject-learn.header, the last line is
mktemp /home/jack/.ifile.stats/learned/lXXXXXX >/dev/null 2>/dev/null
If I didn't do anything else, this would give me two directories: one
containing a file for every message, and one containing a file for
every incorrectly filed message. Running ls | wc -l in each directory
would therefore be enough to generate the list below on a total basis.
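As a minimal sketch of that total-basis count (not one of the scripts mentioned in this thread), the two helpers below count the marker files in a directory and turn a learned/relearned pair into a success percentage. The function names count_files and success_rate are hypothetical, and awk is used for the arithmetic purely as an assumption about what is installed.

```shell
#!/bin/bash

# Count the entries in a directory with a glob rather than parsing ls output.
count_files() {
    local dir=$1 n=0 f
    for f in "$dir"/*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] && n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# Print the success rate in percent: 100 - 100 * relearned / learned.
# Prints "n/a" when nothing has been learned, to avoid dividing by zero.
success_rate() {
    local learned=$1 relearned=$2
    if [ "$learned" -eq 0 ]; then
        printf 'n/a\n'
    else
        awk -v l="$learned" -v r="$relearned" \
            'BEGIN { printf "%.2f\n", 100 - 100 * r / l }'
    fi
}
```

With the directory layout described above, this could be called as success_rate "$(count_files "$HOME/.ifile.stats/learned")" "$(count_files "$HOME/.ifile.stats/relearned")".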
What I actually do is run a script, ifile.stats.count, nightly after
refile.learn. It counts up the messages from the previous day and
stores them in a couple of files: one keeps the current total, and the
other keeps a record of historic totals.
The script emails me a short summary of success rates over the last day
and since records began. I don't keep track of statistics for
individual folders, although I could by counting the number of
X-Ifile-Learned-To headers and subtracting from the number of emails.
I attach the script below for your reference.
The output in my previous email was just a Perl one-liner which took the
contents of the last 30 days of each of the historic record files and
processed them.
jack
---- historic record file: example for "relearned" mail ----
Wed Mar 12 06:00:01 GMT 2003 : 837
Thu Mar 13 06:00:01 GMT 2003 : 841
Fri Mar 14 06:00:00 GMT 2003 : 841
Sat Mar 15 06:00:00 GMT 2003 : 848
Sun Mar 16 06:00:00 GMT 2003 : 850
Mon Mar 17 06:00:01 GMT 2003 : 851
Tue Mar 18 06:00:00 GMT 2003 : 855
Wed Mar 19 06:00:01 GMT 2003 : 858
Thu Mar 20 06:00:00 GMT 2003 : 862
Fri Mar 21 06:00:00 GMT 2003 : 863
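The Perl one-liner itself is not reproduced in this message. As a rough stand-in, and only a sketch, the helpers below read the "date : cumulative total" format shown above and report how much a total grew over the last N records, then derive a success rate from the learned and relearned record files. The names window_delta and window_success are hypothetical, and the use of tail and awk is an assumption rather than the method used for the earlier output.

```shell
#!/bin/bash

# Print how much a cumulative total grew across the last N records of a file.
# Each record looks like "Wed Mar 12 06:00:01 GMT 2003 : 837".
window_delta() {
    local file=$1 days=${2:-30}
    tail -n "$days" "$file" | awk -F' : ' '
        NR == 1 { first = $NF }   # total at the start of the window
        { last = $NF }            # total at the end of the window
        END { if (NR > 0) print last - first; else print 0 }'
}

# Print the success rate in percent over the same window, or "n/a" when no
# mail was learned in it.
window_success() {
    local l_file=$1 r_file=$2 days=${3:-30} learned relearned
    learned=$(window_delta "$l_file" "$days")
    relearned=$(window_delta "$r_file" "$days")
    awk -v l="$learned" -v r="$relearned" 'BEGIN {
        if (l > 0) printf "%.2f\n", 100 - 100 * r / l
        else print "n/a"
    }'
}
```

Because the files hold running totals, N records span N-1 days of growth, so a full 30-day figure would need 31 records. Run against the ten example records above, window_delta prints 26 (863 minus 837).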
---- ifile.stats.count ----
#!/bin/bash
DATE=`date`
STATS_DIR="$HOME/.ifile.stats"
LEARN_DIR="$STATS_DIR/learned"
RELEARN_DIR="$STATS_DIR/relearned"
L_CURRENT_FILE="$STATS_DIR/l_current"
R_CURRENT_FILE="$STATS_DIR/r_current"
L_ALL_FILE="$STATS_DIR/l_all"
R_ALL_FILE="$STATS_DIR/r_all"
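# Fall back to 0 on the first run, before the total files exist.
L_CURRENT=$(cat "$L_CURRENT_FILE" 2>/dev/null || echo 0)
R_CURRENT=$(cat "$R_CURRENT_FILE" 2>/dev/null || echo 0)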
LEARN=0
RELEARN=0
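# Count and remove the marker files; a glob avoids parsing ls output.
for file in "$LEARN_DIR"/*
do
    [ -e "$file" ] || continue   # directory was empty
    LEARN=$(($LEARN+1))
    rm -- "$file"
done
for file in "$RELEARN_DIR"/*
do
    [ -e "$file" ] || continue   # directory was empty
    RELEARN=$(($RELEARN+1))
    rm -- "$file"
done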
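# Avoid dividing by zero when no mail has been counted.
if [ "$LEARN" -gt 0 ]; then
    SUCCESS_TODAY=$(echo "scale=2; 100-(100*$RELEARN/$LEARN)" | bc -l)
else
    SUCCESS_TODAY="n/a"
fi
L_CURRENT=$(($L_CURRENT+$LEARN))
R_CURRENT=$(($R_CURRENT+$RELEARN))
if [ "$L_CURRENT" -gt 0 ]; then
    SUCCESS_CURRENT=$(echo "scale=2; 100-(100*$R_CURRENT/$L_CURRENT)" | bc -l)
else
    SUCCESS_CURRENT="n/a"
fi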
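# Pass values as printf arguments, not as the format string.
printf '%s' "$L_CURRENT" > "$L_CURRENT_FILE"
printf '%s' "$R_CURRENT" > "$R_CURRENT_FILE"
printf '%s : %s\n' "$DATE" "$L_CURRENT" >> "$L_ALL_FILE"
printf '%s : %s\n' "$DATE" "$R_CURRENT" >> "$R_ALL_FILE"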
printf "Ifile statistics: $DATE\n"
printf "~~~~~~~~~~~~~~~~~\n"
printf "\n"
printf "Last day:"
printf " Received: $LEARN\n"
printf " Relearned: $RELEARN\n"
printf " Success: $SUCCESS_TODAY%%\n"
printf "\n"
printf "Total:"
printf " Received: $L_CURRENT\n"
printf " Relearned: $R_CURRENT\n"
printf " Success: $SUCCESS_CURRENT%%\n"