[Ifile-discuss] Re: Updated ifile writeup
From: Karl Vogel
Subject: [Ifile-discuss] Re: Updated ifile writeup
Date: 19 Feb 2003 15:12:27 -0500
>> On 18 Feb 2003 10:26:13 +0100,
>> "clemens fischer" <address@hidden> said:
C> a spam-corpus this large definitely deserves special care. did you
C> think about making it a sourceforge project? there are several sf
C> projects about bayesian text-classifiers, and some of them have links to
C> spam corpora, but there's no project collecting spam systematically.
That has potential. My private collection doesn't grow by leaps and
bounds, but it does grow. The net-abuse collection just mirrors the
"net-abuse" newsgroup, and I haven't been grabbing that lately, so I
don't know how "systematic" this really is.

Also, stuff posted to "net-abuse" is often munged by the sender enough
to throw off ifile, unless I do something to clean up the message first.
That's why I don't automatically include that collection when generating
a new idata file.
Does anyone on this list have a spamtrap running? I have enough
disk space on one of my local systems to collect spam, but not at my ISP.
Raw messages are best; I don't need any header fields except To:, Date:,
and Message-ID:, and I can fake those if necessary.
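
For anyone who wants to trim messages before sending them, here is a
rough sketch in Python of cutting a raw message down to those three
header fields and faking any that are missing. The function name
minimal_message, the placeholder To: value, and the example.invalid
domain are all made up for illustration; this is not part of ifile or
of my own scripts.

```python
from email.message import Message
from email.parser import HeaderParser
from email.utils import formatdate, make_msgid

# Header fields worth keeping, per the paragraph above.
KEEP = ("To", "Date", "Message-ID")


def minimal_message(raw):
    """Return the message with only To:, Date:, and Message-ID: headers.

    Missing fields get invented stand-in values. The body is passed
    through untouched as a single string (MIME parts are not parsed).
    """
    original = HeaderParser().parsestr(raw)

    # Hypothetical fallbacks for fields the sender stripped or munged.
    fallback = {
        "To": "undisclosed-recipients:;",
        "Date": formatdate(),
        "Message-ID": make_msgid(domain="example.invalid"),
    }

    slim = Message()
    for name in KEEP:
        slim[name] = original.get(name) or fallback[name]
    slim.set_payload(original.get_payload())
    return slim.as_string()
```

Header lookups in the email package are case-insensitive, so a message
carrying "Message-Id:" is handled the same way.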
C> and btw, what is "gtaylor" spam?
That's the stuff from Grant Taylor's spam archive at
http://www2.picante.com:81/~gtaylor/download/spam.tar.gz
I keep any usable collection with more than a few thousand messages
separate from the stuff I collect. "Keep your namespace clean", and all
that...
The most recent spam trick that seems to beat ifile is inserting stuff
that looks like HTML (but often isn't) in the middle of a message,
presumably so your browser or mailreader will drop it on the floor:

    Prepare for the prof<!--vogelke-->essional advancement you deserve!

I have a short C program to strip all HTML tags from a file, so my next
change will be to add that program as a new filtering stage before ifile
is run.
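
To give an idea of what such a stage does, here is a minimal sketch in
Python (not the C program, which isn't shown here). It assumes that
"looks like HTML" means either a <!-- ... --> comment or a <...> run
that starts with a letter, "/", or "!"; the function name strip_tags
and both patterns are my own choices for illustration.

```python
import re

# Comments first, so text hidden inside <!-- ... --> goes away entirely.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Anything shaped like a tag: "<", optional "/", a letter or "!", up to ">".
# A bare "<" followed by a space or digit is left alone.
TAG = re.compile(r"</?[A-Za-z!][^<>]*>")


def strip_tags(text):
    """Remove HTML-looking comments and tags, real or fake, from text."""
    text = COMMENT.sub("", text)
    return TAG.sub("", text)


sample = "Prepare for the prof<!--vogelke-->essional advancement you deserve!"
print(strip_tags(sample))
```

This rejoins the split word, so the classifier sees "professional"
again. Something this crude will also eat angle-bracketed addresses
such as <user@example.com>, so it should probably be applied to the
message body only and not to the headers.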
--
Karl Vogel I don't speak for the USAF or my company
address@hidden http://www.pobox.com/~vogelke
That married couples can live together day after day is a
miracle that the Vatican has overlooked. --Bill Cosby