
Re: [Ifile-discuss] Mailing List Filtering


From: clemens fischer
Subject: Re: [Ifile-discuss] Mailing List Filtering
Date: Thu, 06 Mar 2003 16:57:46 +0100 (CET)

"William E. Kempf" <address@hidden>:

> 1) Mail comes in to the system and is passed to procmail.

now i'm completely lost.  didn't you say you don't have access to the
server?  what do you mean by server here?  the MTA part of it?  how
does the email get to your PC?

> So, again, the "category" *IS* the "folder".

no, it's not.  for example, i have categories from way back when i
started using ifile, like "INBOX" or "spam".  since i started using
gnus, i have had to append a suffix such as ".spool" to them.  so the
categories are "INBOX" and "spam", while the associated folders are
"INBOX.spool" and "spam.spool".  i like this about ifile.
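a tiny sketch of that category/folder split, assuming the only rule is the fixed ".spool" suffix described above.  the names SPOOL_SUFFIX, folder_for_category and category_for_folder are hypothetical, not part of ifile or gnus.

```python
# Hypothetical mapping between ifile category names and folder names.
# The ".spool" suffix is the one mentioned above; any scheme would do.
SPOOL_SUFFIX = ".spool"


def folder_for_category(category: str, suffix: str = SPOOL_SUFFIX) -> str:
    """Map an ifile category (e.g. "INBOX") to the folder it is filed into."""
    return category + suffix


def category_for_folder(folder: str, suffix: str = SPOOL_SUFFIX) -> str:
    """Inverse mapping, e.g. when teaching ifile from an existing folder."""
    if not folder.endswith(suffix):
        raise ValueError(f"{folder!r} does not end with {suffix!r}")
    return folder[: -len(suffix)]


print(folder_for_category("INBOX"))        # INBOX.spool
print(category_for_folder("spam.spool"))   # spam
```

the point is only that the category ifile learns and the folder the message lands in are two different names linked by a rule.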

> 2.3) Create a unique string based on various headers (I'll use a list of
> possible headers sorted in descending order of probability, and will
> likely include the From: header to both pick up those few lists that don't
> use any other headers, as well as to categorize personal e-mail from some
> folks). The unique string will be based on the address of the header, with
> all punctuation stripped (and possibly with some unique characters
> appended just to ensure no possibility for clashes with any other
> occurrences of the "word").
> 
> 2.4) Call ifile again to query/learn a category for this message based
> solely on this unique string.
> 
> 2.5) Place the final category in X-Ifile-Hint (which will either be Spam,
> or the folder in which we want the message to go).

is this what people call "feature extraction"?  if the string is
unique, it couldn't be a category, right?  or do you want later emails
to produce the same string?  then the string can't be unique.  and if
it were unique, there would be little use in teaching ifile a category
that only ever contains one item (the unique string).
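one reading of step 2.3, sketched in python: the string is unique per list (or per sender), not per message, so every message from the same list yields the same token and ifile would see that token many times.  the header order, the TOKEN_SUFFIX marker and list_token are hypothetical; steps 2.4 and 2.5 (querying ifile, writing X-Ifile-Hint) are left out.

```python
import email
import re
from typing import Optional

# Hypothetical priority list for step 2.3: which headers to try, and in
# what order, is an assumption; From comes last to catch personal mail.
CANDIDATE_HEADERS = ["List-Post", "Mailing-List", "X-Mailing-List", "Sender", "From"]

# Hypothetical marker appended so the token cannot clash with an ordinary word.
TOKEN_SUFFIX = "zzlistzz"

ADDRESS_RE = re.compile(r"[\w.+-]+@[\w.-]+")


def list_token(msg) -> Optional[str]:
    """Build one token from the first candidate header that holds an address."""
    for name in CANDIDATE_HEADERS:
        value = msg.get(name)
        if not value:
            continue
        match = ADDRESS_RE.search(value)
        if match is None:
            continue
        # Strip all punctuation from the address and lowercase it.
        stripped = re.sub(r"[^A-Za-z0-9]", "", match.group(0)).lower()
        return stripped + TOKEN_SUFFIX
    return None


first = email.message_from_string(
    "From: alice@example.com\n"
    "List-Post: <mailto:ifile-discuss@lists.example.org>\n"
    "Subject: one\n\nfirst message\n"
)
second = email.message_from_string(
    "From: bob@example.net\n"
    "List-Post: <mailto:ifile-discuss@lists.example.org>\n"
    "Subject: two\n\nsecond message\n"
)
print(list_token(first))                        # ifilediscusslistsexampleorgzzlistzz
print(list_token(first) == list_token(second))  # True
```

because the token is stable across messages, a category taught with it collects many examples, which is what makes it useful to ifile.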

but the way you preprocess emails to extract better distinguishing
features is neat.  it has some properties of the refile-guessing
feature of emacs/mew, which goes like this:

;; each entry: a header name followed by (REGEXP . FOLDER) pairs.
;; \\1 in FOLDER is replaced by the text of the first \\( \\) group.
(setq mew-refile-guess-alist
      '(
        ("List-Unsubscribe:"
         ("<mailto:debian-\\([^<>, \t\n]+\\)address@hidden>" . "+G/debian-jp-\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-ctl@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-request@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-unsubscribe@" . "+G/\\1")
         ("=unsubscribe%20\\([^<>, \t\n]+\\)>" . "+G/\\1")
         )
        ("Mailing-List:"
         ("^contact \\([^<>, \t\n]+\\)-help@" . "+G/\\1")
         )
        ("X-ML-Info:"
         ("to the address \\([^<>, \t\n]+\\)-ctl@" . "+G/\\1")
         ("<mailto:\\([^<>, \t\n]+\\)-admin@" . "+G/\\1")
         )
        ("X-ML-Name:"
         ("^\\([^<>, \t\n]+\\)" . "+G/\\1")
         )
        ("X-Sequence:"
         ("^\\([^<>, \t\n]+\\) [0-9]+$" . "+G/\\1")
         )
        ("Newsgroups:"
         ("^fj\\.\\([^<>, \t\n]+\\)" . "+G/fj.\\1")
         ("^gnu\\.\\([^<>, \t\n]+\\)" . "+G/gnu.\\1")
         ;; disabled catch-all: file any newsgroup under its own name
         ;;("^\\([^<>, \t\n]+\\)" . "+G/\\1")
         )
        ))

this list describes how mew guesses which folder a message should be
refiled to; this particular setup is specialized for typical mailing
list headers.  of course you can use any header/value combination
there and even extract substrings with regular expressions, which
then become part of the folder name via \\1.
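for readers who don't use mew, here is a rough python analogue of applying such an alist, assuming the rules are tried in order and the first match wins (that ordering is an assumption, not taken from mew's source).  GUESS_RULES and guess_by_rules are hypothetical names, and only a few of the rules above are copied.

```python
import email
import re
from typing import Optional

# Hypothetical subset of the elisp rules above: header -> [(regex, template)].
# r"\1" in a template is replaced by the first regex group, as in elisp.
GUESS_RULES = {
    "List-Unsubscribe": [
        (r"<mailto:([^<>, \t\n]+)-request@", r"+G/\1"),
        (r"<mailto:([^<>, \t\n]+)-unsubscribe@", r"+G/\1"),
    ],
    "X-ML-Name": [
        (r"^([^<>, \t\n]+)", r"+G/\1"),
    ],
    "X-Sequence": [
        (r"^([^<>, \t\n]+) [0-9]+$", r"+G/\1"),
    ],
}


def guess_by_rules(msg, rules=GUESS_RULES) -> Optional[str]:
    """Return the folder produced by the first matching rule, or None."""
    for header, patterns in rules.items():
        value = msg.get(header)
        if value is None:
            continue
        for pattern, template in patterns:
            match = re.search(pattern, value)
            if match:
                # Expand "\1" in the template with the captured group.
                return match.expand(template)
    return None


msg = email.message_from_string(
    "From: someone@example.com\n"
    "List-Unsubscribe: <mailto:ifile-discuss-request@lists.example.org>\n"
    "\n"
    "body\n"
)
print(guess_by_rules(msg))  # +G/ifile-discuss
```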

mew then goes on with a second variable, mew-refile-guess-control:

  ;; the functions are tried in order
  (setq mew-refile-guess-control
        '(mew-refile-guess-by-alist        ; header rules from the alist above
          mew-refile-ctrl-throw
          mew-refile-guess-by-newsgroups
          mew-refile-guess-by-folder
          mew-refile-ctrl-throw
          mew-refile-ctrl-auto-boundary
          mew-refile-guess-by-thread
          mew-refile-ctrl-throw
          mew-refile-guess-by-from-folder
          mew-refile-ctrl-throw
          mew-refile-guess-by-from
          mew-refile-ctrl-throw
          mew-refile-guess-by-default))

these are functions that use data like the list above to find
anything usable for determining the appropriate folder.  the function
list is configurable, and you can put any function you write yourself
there.  there is also mew-refile-ctrl-throw, which checks whether
something solid has turned up and, if so, stops there (i think :).
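the control list can be sketched in python as a chain of guess functions with a THROW marker.  this assumes THROW means "stop if anything has been guessed so far", which is only the tentative reading given above; mew-refile-ctrl-auto-boundary is not modeled.  run_guessers, THROW and the three example guessers are hypothetical.

```python
from typing import Callable, List, Optional, Union

# Hypothetical analogue of mew-refile-guess-control. THROW stops the chain
# once at least one earlier function has produced a guess (assumed meaning).
THROW = object()

Guesser = Callable[[dict], Optional[str]]


def run_guessers(headers: dict, control: List[Union[Guesser, object]]) -> List[str]:
    guesses: List[str] = []
    for step in control:
        if step is THROW:
            if guesses:
                break          # something solid turned up: stop here
            continue           # nothing yet: keep trying later functions
        folder = step(headers)
        if folder is not None and folder not in guesses:
            guesses.append(folder)
    return guesses


def guess_by_list_id(headers):
    value = headers.get("List-Id")
    return "+G/" + value.strip("<>").split(".")[0] if value else None


def guess_by_from(headers):
    sender = headers.get("From")
    return "+from/" + sender.split("@")[0] if sender else None


def guess_default(headers):
    return "+inbox"


control = [guess_by_list_id, THROW, guess_by_from, THROW, guess_default]
print(run_guessers({"List-Id": "<ifile-discuss.lists.example.org>",
                    "From": "x@example.com"}, control))  # ['+G/ifile-discuss']
print(run_guessers({"From": "bob@example.org"}, control))  # ['+from/bob']
print(run_guessers({}, control))                           # ['+inbox']
```

keeping the chain as plain data is what lets you drop in a self-made function anywhere, which is the property described above.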

isn't this nice?  btw, there's an emacs package for integrating ifile
into emacs/gnus.

> Anyone spot any flaws in this logic, or further steps that should be
> taken?  This shouldn't be too difficult to implement in a Python
> script (I dislike Perl), so I'll likely take this route.  If there's
> interest, I can share the results when I'm done.

although i don't fully understand this yet, the list of headers you
base the distinguishing features of emails on should be selected
carefully.

one last question: you said you can't do anything on the server, but
you can run procmail and python?  or, if you do this at home, you
could do anything, including preprocessing your email with awk
scripts, right?

  clemens



