Message-ID: <1065552343.3151.81.camel@tantor.nuclearelephant.com>
From: jonathan at nuclearelephant.com (Jonathan A. Zdziarski)
Subject: Spam with PGP
Actually, the way SA does it weakens filtering. SA's Bayesian filtering
is only a very small piece of SA, and unfortunately not much attention
has been given to it. The filter's final calculation is only a small
percentage of the actual final score. Because true Bayesian filtering
performs the vast majority of the same tests that SA performs, SA's own
ruleset easily waters down any Bayesian findings whenever the two
disagree. Take the Pine MUA as an example: SA thinks a Pine MUA suggests
an innocent message, but the majority of the emails with a Pine MUA that
my wife receives are spam. In this case, the hard-coded MUA rule will
unfortunately water down the score, even if Bayes has learned that a
Pine MUA indicates spam. Obviously the Pine MUA is just one small rule,
but if you apply this to the other rules, you get the same results.
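To make the watering-down effect concrete, here is a minimal sketch of an additive scoring scheme. It assumes, hypothetically, that every rule that fires adds a fixed weight to the total and that a message is spam once the total reaches a threshold. The rule names, weights, and threshold are invented for illustration and are not SA's real rules or scores.

```python
# Illustrative rule names and weights only; they are NOT SpamAssassin's
# actual rules or scores.
RULE_SCORES = {
    "bayes_says_spam": 3.5,   # the Bayesian filter's contribution to the total
    "other_spam_rule": 2.0,   # some other hand-written spam rule that fired
    "pine_mua_is_ham": -1.5,  # hard-coded rule treating a Pine MUA as innocent
}
THRESHOLD = 5.0  # a total at or above this marks the message as spam


def total_score(hits):
    """Additive scoring: sum the scores of every rule that fired."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits):
    return total_score(hits) >= THRESHOLD


with_pine = ["bayes_says_spam", "other_spam_rule", "pine_mua_is_ham"]
without_pine = ["bayes_says_spam", "other_spam_rule"]
print(total_score(with_pine), is_spam(with_pine))        # 4.0 False
print(total_score(without_pine), is_spam(without_pine))  # 5.5 True
```

Under these made-up numbers, the same message is caught without the hard-coded MUA rule and missed with it, even though the Bayesian contribution is identical in both cases.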
What's worse is that, the last time I looked (this may have changed),
SA's Bayesian filter did not appear to have a mechanism for learning; it
was just a static dictionary. If users got spam, there was no way for
them to forward it into the system for processing. Again, this may have
changed, and if it has, that's great.
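For contrast with a static dictionary, here is a minimal sketch of a trainable token store, assuming a hypothetical setup where each message a user forwards is tokenized and fed to `learn()`. The class, the token format (`x-mailer:pine`), and the simple per-class frequency ratio used for the token probability are illustrative choices, not any particular filter's implementation.

```python
from collections import Counter


class TokenStore:
    """Hypothetical per-user token database that learns from forwarded mail."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, tokens, is_spam):
        """Record one message; each token counts at most once per message."""
        unique = set(tokens)
        if is_spam:
            self.spam_counts.update(unique)
            self.n_spam += 1
        else:
            self.ham_counts.update(unique)
            self.n_ham += 1

    def spam_probability(self, token):
        """Share of the token's per-class frequency that comes from spam."""
        s = self.spam_counts[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham_counts[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5  # never seen: treat as neutral
        return s / (s + h)


store = TokenStore()
store.learn(["x-mailer:pine", "meeting"], is_spam=False)
store.learn(["lunch"], is_spam=False)
before = store.spam_probability("x-mailer:pine")  # only seen in ham so far
# The user forwards two Pine-sent spams into the system:
store.learn(["x-mailer:pine", "offer"], is_spam=True)
store.learn(["x-mailer:pine", "free"], is_spam=True)
after = store.spam_probability("x-mailer:pine")   # now leans toward spam
```

Because the counts change with every forwarded message, the filter's opinion of the Pine header moves from "ham" to "spam" for this user, which a fixed dictionary cannot do.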
The result of Bayesian filtering already reflects everything the
heuristic tests look for, so having both _hurts_ you rather than
helping. It is much better to focus on creating a strong
probability-based filter, IMHO...and I think the statistics agree with me.
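One way to see why a purely statistical filter can absorb the heuristics: header features such as the MUA become ordinary tokens, and their learned probabilities are combined with everything else in the message. Below is a minimal sketch of one common combination rule, prod(p) / (prod(p) + prod(1 - p)) in the style of Paul Graham's "A Plan for Spam", with probabilities clamped to [0.01, 0.99]. It assumes token independence, skips Graham's selection of the most interesting tokens, and is not necessarily the method the author has in mind.

```python
import math


def combine(probabilities, floor=0.01):
    """Naive-Bayes-style combination of per-token spam probabilities.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed as a sum of
    log-odds so long messages do not underflow. Probabilities are clamped to
    [floor, 1 - floor] so no single token can force a 0 or 1 result.
    """
    log_odds = 0.0
    for p in probabilities:
        p = min(max(p, floor), 1.0 - floor)
        log_odds += math.log(p / (1.0 - p))
    # Numerically stable logistic function of the summed log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# A Pine header token the user has trained as spammy, plus two body tokens.
print(combine([0.67, 0.9, 0.4]))
```

Here the MUA is just one more piece of evidence weighed by what this user's mail actually looks like, rather than a fixed rule that can pull against the learned result.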
> Of course, SpamAssassin does bayesian filtering as well.
>
> heuristic + bayesian is better than either alone, IMHO.