Re: [Ifile-discuss] Mailing List Filtering
From: clemens fischer
Subject: Re: [Ifile-discuss] Mailing List Filtering
Date: Thu, 06 Mar 2003 16:57:46 +0100 (CET)
"William E. Kempf" <address@hidden>:
> 1) Mail comes in to the system and is passed to procmail.
now i'm completely lost. didn't you say you don't have access to the
server? what do you mean by server here? the MTA part of it? how
does the email come into your PC?
> So, again, the "category" *IS* the "folder".
no, it's not. for example, i have categories from way back when i
started using ifile, which go like "INBOX" or "spam" etc. since
using gnus, i had to append some string, like ".spool" to it. this
makes the categories "INBOX" and "spam", and the associated folders
"INBOX.spool" and "spam.spool". i like this about ifile.
> 2.3) Create a unique string based on various headers (I'll use a list of
> possible headers sorted in descending order of probability, and will
> likely include the From: header to both pick up those few lists that don't
> use any other headers, as well as to categorize personal e-mail from some
> folks). The unique string will be based on the address of the header, with
> all punctuation stripped (and possibly with some unique characters
> appended just to ensure no possibility for clashes with any other
> occurrences of the "word").
>
> 2.4) Call ifile again to query/learn a category for this message based
> solely on this unique string.
>
> 2.5) Place the final category in X-Ifile-Hint (which will either be Spam,
> or the folder in which we want the message to go).
is this what people call "feature extraction"? if the string is
unique, then it couldn't be a category, right? or do you want later
emails from the same source to return the same string? then this
string can't be unique. and if it were, there would be little point
in teaching a category that only ever holds one item (the unique
string).
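a rough python sketch of the token idea in steps 2.3 and 2.4 may make the point concrete. the header order, the `xlisttok` suffix and the name `list_token` are assumptions made up for illustration, they are not part of ifile or of the proposal. note that the token is stable per list or sender, not unique per message:

```python
import re
from email import message_from_string

# hypothetical priority order; the real list needs careful selection
HEADER_PRIORITY = ["List-Id", "List-Post", "Mailing-List", "X-ML-Name", "From"]


def list_token(raw_message, suffix="xlisttok"):
    """Return one stable token for the first usable header, or None."""
    msg = message_from_string(raw_message)
    for name in HEADER_PRIORITY:
        value = msg.get(name)
        if not value:
            continue
        # prefer the address part if the header contains one
        match = re.search(r"[\w.+-]+@[\w.-]+", value)
        source = match.group(0) if match else value
        # strip all punctuation, as step 2.3 suggests
        token = re.sub(r"[^A-Za-z0-9]", "", source).lower()
        if token:
            # the suffix makes clashes with ordinary words unlikely
            return token + suffix
    return None
```

every message from the same list yields the same token, so the token could be handed to ifile as the only "word" of the message when querying or learning a category.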
but the way you preprocess emails to return better distinguishing
features is neat. it resembles the refile-guessing feature of
emacs/mew, which is configured like this:
(setq mew-refile-guess-alist
      '(
        ("List-Unsubscribe:"
         ("<mailto:debian-\\([^<>, \t\n]+\\)address@hidden>" .
          "+G/debian-jp-\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-ctl@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-request@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-unsubscribe@" . "+G/\\1")
         ("=unsubscribe%20\\([^<>, \t\n]+\\)>" . "+G/\\1")
         )
        ("Mailing-List:"
         ("^contact \\([^<>, \t\n]+\\)-help@" . "+G/\\1")
         )
        ("X-ML-Info:"
         ("to the address \\([^<>, \t\n]+\\)-ctl@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-admin@" . "+G/\\1")
         )
        ("X-ML-Name:"
         ("^\\([^<>, \t\n]+\\)" . "+G/\\1")
         )
        ("X-Sequence:"
         ("^\\([^<>, \t\n]+\\) [0-9]+$" . "+G/\\1")
         )
        ("Newsgroups:"
         ("^fj\\.\\([^<>, \t\n]+\\)" . "+G/fj.\\1")
         ("^gnu\\.\\([^<>, \t\n]+\\)" . "+G/gnu.\\1")
         ;;("^\\([^<>, \t\n]+\\)" . "+G/\\1")
         )
        ))
this list describes how mew guesses where people would want to
refile messages, and it is tailored to typical mailing-list
headers. of course you can pick any header/value combination there
and even extract strings using regular expressions.
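for a python script, the same table-driven idea could look roughly like the sketch below. it is only an illustration: `GUESS_RULES` and `guess_folder` are hypothetical names, the rules are a small subset of the list above rewritten in python regex syntax, and headers are assumed to arrive as a plain dict.

```python
import re

# (header, [(pattern, folder template), ...]) in priority order
GUESS_RULES = [
    ("List-Unsubscribe", [
        (r"<mailto:([^<>, \t\n]+)-request@", r"+G/\1"),
        (r"<mailto:([^<>, \t\n]+)-unsubscribe@", r"+G/\1"),
    ]),
    ("X-ML-Name", [
        (r"^([^<>, \t\n]+)", r"+G/\1"),
    ]),
    ("Newsgroups", [
        (r"^gnu\.([^<>, \t\n]+)", r"+G/gnu.\1"),
    ]),
]


def guess_folder(headers):
    """Return the folder for the first matching rule, or None."""
    for name, rules in GUESS_RULES:
        value = headers.get(name)
        if value is None:
            continue
        for pattern, template in rules:
            match = re.search(pattern, value)
            if match:
                # fill the captured list name into the folder template
                return match.expand(template)
    return None
```

calling `guess_folder({"Newsgroups": "gnu.emacs.help"})` would give `+G/gnu.emacs.help`, and headers that match no rule give `None`, so a later step can take over.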
mew then consults a variable called mew-refile-guess-control:
(setq mew-refile-guess-control
      '(mew-refile-guess-by-alist
        mew-refile-ctrl-throw
        mew-refile-guess-by-newsgroups
        mew-refile-guess-by-folder
        mew-refile-ctrl-throw
        mew-refile-ctrl-auto-boundary
        mew-refile-guess-by-thread
        mew-refile-ctrl-throw
        mew-refile-guess-by-from-folder
        mew-refile-ctrl-throw
        mew-refile-guess-by-from
        mew-refile-ctrl-throw
        mew-refile-guess-by-default))
these are functions that use data such as the list above to guess
whether anything in the message determines the appropriate folder.
the function list is configurable, and you can add any function of
your own to it. the mew-refile-ctrl-throw entries check whether
something solid has turned up so far (i think :).
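the chain-with-checkpoints pattern is easy to copy in python. the sketch below assumes that a "throw" entry simply stops the chain once earlier guessers have produced something, which matches the guess above but is not verified against mew's source. all names (`THROW`, `run_guessers`, the three guessers and the folder names) are made up for illustration.

```python
THROW = object()  # hypothetical marker standing in for mew-refile-ctrl-throw


def run_guessers(chain, message):
    """Run guessers in order; at a THROW, stop if anything was found."""
    found = []
    for step in chain:
        if step is THROW:
            if found:
                break
            continue
        found.extend(step(message))
    return found


def by_list_header(message):
    name = message.get("X-ML-Name")
    return ["+G/" + name] if name else []


def by_from(message):
    sender = message.get("From")
    return ["+from/" + sender] if sender else []


def by_default(message):
    return ["+inbox"]


CHAIN = [by_list_header, THROW, by_from, THROW, by_default]
```

a message with a list header never reaches the From: guesser, and only a message with no usable header falls through to the default.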
isn't this nice? btw, there's an emacs-package for integrating ifile
into emacs/gnus.
> Anyone spot any flaws in this logic, or further steps that should be
> taken? This shouldn't be too difficult to implement in a Python
> script (I dislike Perl), so I'll likely take this route. If there's
> interest, I can share the results when I'm done.
although i don't yet fully understand this, the list of headers you
want to base the distinguishing features of emails on should be
carefully selected.
one last question: you said you can't do anything on the server, but
you can run procmail and python? or, if you do this at home, you
could do anything, including preprocessing your email with awk
scripts, right?
clemens