Message-ID: <44BA240C.12130.55988FE@stuart.cyberdelix.net>
Date: Sun Jul 16 11:33:45 2006
From: stuart at cyberdelix.net (lsi)
Subject: throwing the book at spam
http://www.cyberdelix.net/tech/kaboom.htm
This page is to help you kill spammers, err, I mean spam. Here are the
blueprints to my silver bullet, which (when combined with my other
filters) kills 99.75% of my spam. The remaining 0.25% corresponds to
about 1 message per day (and it is the target of further work).
where this filter fits
This is the Filter of Last Resort, because it is very aggressive.
You'll see why below. It should be run last, after all the other
filters, so that it only ever deals with the dregs. That makes it
less dangerous.
As this is the Filter of Last Resort, it does NOT include filtering
for all kinds of spam. Rather, it is designed to kill the spams that
the other filters miss.
It is dangerous. Most filters mark the spam and let it be. This
filter kills it. No Deleted Items, no Recycle Bin, no undo, no 'are
you sure'. Bang bang, dead dead. All that is left is a logfile entry.
automatic whitelist maintenance
To minimise the chances of a legit mail being terminated, this filter
includes a "make whitelist" command. This command tells the filter to
collect all email addresses from inside a number of other files (my
address books), eliminate the duplicates and save the list to disk.
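As a rough illustration of the "make whitelist" step (a sketch in
Python, not the actual filter code; the function names, the regular
expression and the one-address-per-line output format are all
assumptions), collecting and de-duplicating addresses from text
address books could look like this:

```python
import re
from pathlib import Path

# Deliberately loose pattern (an assumption): real address syntax is
# far more permissive than this, but it catches ordinary addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_addresses(text):
    """Return the set of lower-cased email addresses found in text."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def make_whitelist(addressbook_paths, out_path):
    """Collect addresses from text files, drop duplicates, save to disk."""
    addresses = set()
    for path in addressbook_paths:
        # Undecodable bytes are skipped rather than aborting the run.
        addresses |= extract_addresses(Path(path).read_text(errors="ignore"))
    ordered = sorted(addresses)
    Path(out_path).write_text("\n".join(ordered) + "\n")
    return ordered
```

Lower-casing before de-duplication means "Bob@Example.com" and
"bob@example.com" count as one entry.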
This list is then used by the "move whitelisted" command, which moves
any message containing whitelisted strings to a separate folder (the
"whitebox").
This filter also supports three other whitelists: good_senders,
good_recipients and good_subjects. Any mail containing a whitelisted
string in the corresponding field is automatically "whiteboxed"
(moved to the whitebox).
how it works
This filter is actually built of 11 special-purpose filters. Any mail
matching one of these tests is deleted. The filters are as follows:
  missing_addressee (missing 'To:' or 'for' field)
  missing_sender (missing 'From:' field)
  unlikely_chars (non-alphabetic subject or sender)
  unlikely_dates (message date too old, or in future)
  bounces (mail delivery failure, etc)
  blacklisted (bad_senders/bad_recipients/bad_subjects)
  gifs_attached (message has an attached GIF image)
  X-RBL (message contains X-RBL-Warning: headerline)
  X-DNS (message contains X-DNS-Warning: headerline)
  X-SVF (message contains X-Sender-Verification-Failed: headerline)
  analyse_received (Received: line invalid - see below)
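To make a few of the simpler tests concrete, here is a hedged Python
sketch covering missing_addressee, missing_sender, unlikely_dates and
the three X- header tests. It is not the actual filter. The function
name, the 30-day and 1-day date limits, and the choice to treat a
missing or unparseable Date: as "unlikely" are assumptions, and
missing_addressee here only looks at To:, not at any 'for' clause.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Test name -> header whose mere presence fails the test.
WARNING_HEADERS = {
    "X-RBL": "X-RBL-Warning",
    "X-DNS": "X-DNS-Warning",
    "X-SVF": "X-Sender-Verification-Failed",
}


def failed_tests(raw_message, now=None,
                 max_age=timedelta(days=30),     # assumed limit
                 max_future=timedelta(days=1)):  # assumed limit
    """Return the names of the tests this message fails, in order."""
    now = now or datetime.now(timezone.utc)
    msg = message_from_string(raw_message)
    failed = []
    if not str(msg.get("To", "")).strip():
        failed.append("missing_addressee")
    if not str(msg.get("From", "")).strip():
        failed.append("missing_sender")
    try:
        sent = parsedate_to_datetime(str(msg.get("Date", "")))
        if sent.tzinfo is None:
            # No usable zone information: treat the time as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if sent < now - max_age or sent > now + max_future:
            failed.append("unlikely_dates")
    except (TypeError, ValueError):
        # Missing or unparseable Date: header (assumed to count as a fail).
        failed.append("unlikely_dates")
    for test_name, header in WARNING_HEADERS.items():
        if header in msg:  # header lookup is case-insensitive
            failed.append(test_name)
    return failed
```

In the filter described here, any non-empty result would mean the
message is deleted.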
These tests are fairly self-explanatory, with the exception of the
analyse_received test. This test analyses the significant Received:
headerline inside each mail (there are usually several Received:
lines, but only one is relevant for our purpose). Any mail with an
invalid Received: line is deleted. The tests for validity are as
follows:
  IP_missing
  IP_obfuscation
  IP_unreversible
  by-line_not_present
  sending_SMTP_server_unresolvable
  sending_hostname_not_provided
If these tests all pass, the message is then tested for a mismatch
between the hostname the sender claims and the hostname the receiver
recorded for that sender. Again, a fail results in the message being
deleted.
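A rough Python sketch of this kind of analysis follows. It is not the
actual filter. It assumes a Received: line in the common
"from HELO (RDNS [IP]) by HOST ..." layout, which many servers do not
use. It does no DNS lookups, so sending_SMTP_server_unresolvable is
left out, as is IP_obfuscation. Reporting a recorded name of
"unknown" (or none at all) as IP_unreversible is a guess at what that
test means, and the name "hostname_mismatch" is made up here.

```python
import re

FROM_RE = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)   # claimed name
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")        # [1.2.3.4]
RDNS_RE = re.compile(r"\(\s*([^\s\[\]()]+)\s+\[")           # (name [ip])
BY_RE = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)          # by mx.host


def analyse_received(line):
    """Return the name of the first failed test, or None if all pass."""
    line = " ".join(line.split())  # unfold a wrapped header line
    claimed = FROM_RE.match(line)
    ip = IP_RE.search(line)
    recorded = RDNS_RE.search(line)
    if ip is None:
        return "IP_missing"
    if BY_RE.search(line) is None:
        return "by-line_not_present"
    if claimed is None:
        return "sending_hostname_not_provided"
    if recorded is None or recorded.group(1).lower() == "unknown":
        return "IP_unreversible"
    # Strict comparison: claimed name vs. name recorded by the receiver.
    if (claimed.group(1).lower().rstrip(".")
            != recorded.group(1).lower().rstrip(".")):
        return "hostname_mismatch"
    return None
```

The strict equality check mirrors how unforgiving the approach is: a
legitimate server whose announced name differs from its reverse DNS
name fails too.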
why it works
Spammers try to get their messages through by hiding, disguising or
armouring their spams. This filter spends most of its time looking
for evidence of armour. It assumes that an attempt at armouring means
the mail is spam.
Note that this approach is very unforgiving toward badly configured
but legitimate systems, and toward systems using non-standard data
formats. This is another reason to run this filter last. Mistakes can
be minimised by keeping the whitelists up to date, and by encouraging
everyone to run RFC-compliant nodes.
And no, I'm not worried about posting my blueprints. Spammers are
welcome to use less obfuscation - this will send them straight into
the jaws of standard spam filters, but life's a bitch eh. They can
use more obfuscation, be my guest cos I need some extra handles to
kill that last 0.25%.
caveats
These notes are posted ahead of any software release, so as to
maximise the damage they can cause.
This is developmental software. It works, but only in the development
environment. In particular, it supports Pegasus Mail ONLY.
Address books must be TEXT files, or they will not be processed by
the whitelister.
The whitebox must currently be processed manually.
greetz
Thanks have got to go to the twits out there sending me 1000+ spams a
day. Without your contribution, I would never have had the sample
size I needed.
---
Stuart Udall
stuart at@...erdelix.dot net - http://www.cyberdelix.net/
---
* Origin: lsi: revolution through evolution (192:168/0.2)