Message-ID: <Pine.LNX.4.58.0310071442360.27612@soyokaze.cynistar.net>
From: apthorpe+fd at cynistar.net (Bob Apthorpe)
Subject: Spam with PGP

Hi,

I suggest that before you start explaining what SpamAssassin does and
how it does it, you visit http://www.spamassassin.org/, specifically
the README at http://www.spamassassin.org/full/2.7x/dist/README

On Tue, 7 Oct 2003, Jonathan A. Zdziarski wrote:

> [missing attribution] wrote:
> > Of course, SpamAssassin does bayesian filtering as well.
> >
> > heuristic + bayesian is better than either alone, IMHO.

> Actually the way SA does it weakens filtering. SA's bayesian filtering
> is only a very small piece of SA, and unfortunately not much attention
> has been given to it. The filter's final calculation is only a small
> percentage of the actual final score.

Here are SA's Bayesian scores; the four columns of scores are:

1: no network tests (DNSBLs, Razor, DCC, Pyzor), no Bayes
2: network tests, no Bayes
3: no network tests, Bayes
4: network tests, Bayes

score BAYES_00 0 0 -4.901 -4.900
score BAYES_01 0 0 -0.600 -1.524
score BAYES_10 0 0 -0.734 -0.908
score BAYES_20 0 0 -0.127 -1.428
score BAYES_30 0 0 -0.349 -0.904
score BAYES_40 0 0 -0.001 -0.001
score BAYES_44 0 0 -0.001 -0.001
score BAYES_50 0 0  0.001  0.001
score BAYES_56 0 0  0.001  0.001
score BAYES_60 0 0  1.789  1.592
score BAYES_70 0 0  2.142  2.255
score BAYES_80 0 0  2.442  1.657
score BAYES_90 0 0  2.454  2.101
score BAYES_99 0 0  5.400  5.400

The lowest non-trivial positive Bayesian score (BAYES_60 w/network
tests) is 1.592, providing ~32% of the (default) 5 points necessary for
a message to be flagged as spam. This would appear to counter your
claims that SA's Bayesian classifier provides only a small fraction of
the total score.

> Because true Bayesian filtering
> performs a huge majority of the same tests that SA performs, SA's own
> ruleset easily waters down any bayesian findings whenever there are
> opposing values between the two.
The Bayesian classifier does not perform the same rule-based heuristic
tests. Depending on how vigilant the end-user was in training the
Bayesian classifier, it's rare that the statistical scores and the
heuristic scores are both large and of opposite signs.

> For example, a pine MUA...SA thinks a
> pine MUA suggests an innocent message, but a majority of the emails with
> a pine MUA my wife receives are spams. In this case, the hard-coded MUA
> rule will unfortunately water down the score, even if Bayes thinks a
> pine MUA is spam. Obviously the pine MUA is just a small rule, but if
> you apply this to the other rules, you get the same results.

SA 2.5x had a number of negative-scoring tests that were easily forged
(various MUA signatures, REFERENCES, IN_REP_TO, PGP signatures, etc.)
These rules have been dropped from SA 2.60 or have had their scores far
reduced to counter this known problem.

> What's worse is that last time I looked (this may have changed), SA's
> bayesian filter did not appear to have a mechanism for learning, but was
> just a static dictionary. If users got spam there was no way for the
> user to forward their spams into the system for processing. Again, this
> may have changed and if it has, that's great.

SA has included sa-learn for manual training ever since the Bayesian
classifier was incorporated into the code (v2.50.) Additionally, SA
contains thresholds above/below which messages will be automatically
learned as spam/ham so the system trains itself (albeit slowly) without
user intervention.

> The product of Bayesian filtering includes all the heuristic tests as
> well, so having both _hurts_ you, and is not something you benefit
> from.

No it does not, on all counts. You need to review the difference between
heuristic and statistical classifiers.

> It is much better to focus on creating a strong probability-based
> filter IMHO...and I think the statistics agree with me.
Then perhaps you should join forces with the people already performing
such statistical comparisons between SpamAssassin, CRM114, bogofilter,
and the like. The SA development list is at
http://lists.sourceforge.net/mailman/listinfo/spamassassin-devel

This problem (evading spam-filtering by including a bogus PGP sig) is a
recognized and dead issue. The solution is to keep your security tools
up-to-date. As SA filters more spam, spammers will find new ways around
the filters, heuristic, statistical, or otherwise.

--
Bob Apthorpe
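
The score arithmetic in the message above can be checked with a short sketch. The Bayes scores and the 5-point default are copied from the message; the function names, the ">=" comparison against the threshold, and the "other rule" scores in the usage lines are illustrative assumptions, not SpamAssassin's actual code or API.

```python
# Minimal sketch of additive rule scoring, using the figures quoted in the message.
# Columns: (no net/no Bayes, net/no Bayes, no net/Bayes, net/Bayes)
BAYES_SCORES = {
    "BAYES_00": (0, 0, -4.901, -4.900),
    "BAYES_01": (0, 0, -0.600, -1.524),
    "BAYES_10": (0, 0, -0.734, -0.908),
    "BAYES_20": (0, 0, -0.127, -1.428),
    "BAYES_30": (0, 0, -0.349, -0.904),
    "BAYES_40": (0, 0, -0.001, -0.001),
    "BAYES_44": (0, 0, -0.001, -0.001),
    "BAYES_50": (0, 0, 0.001, 0.001),
    "BAYES_56": (0, 0, 0.001, 0.001),
    "BAYES_60": (0, 0, 1.789, 1.592),
    "BAYES_70": (0, 0, 2.142, 2.255),
    "BAYES_80": (0, 0, 2.442, 1.657),
    "BAYES_90": (0, 0, 2.454, 2.101),
    "BAYES_99": (0, 0, 5.400, 5.400),
}

DEFAULT_THRESHOLD = 5.0  # default points needed to flag spam, per the message


def column(network_tests: bool, bayes: bool) -> int:
    """Map the two switches to the 0-based column of the score table."""
    return (1 if network_tests else 0) + (2 if bayes else 0)


def bayes_score(rule: str, network_tests: bool = True, bayes: bool = True) -> float:
    """Score contributed by one Bayes rule in the chosen configuration."""
    return BAYES_SCORES[rule][column(network_tests, bayes)]


def share_of_threshold(rule: str, threshold: float = DEFAULT_THRESHOLD) -> float:
    """Fraction of the spam threshold supplied by the Bayes rule alone."""
    return bayes_score(rule) / threshold


def is_flagged(rule: str, other_scores: list[float],
               threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Assumed model: total = Bayes score + other rule scores, flagged if >= threshold."""
    return bayes_score(rule) + sum(other_scores) >= threshold


if __name__ == "__main__":
    print(f"BAYES_60 share: {share_of_threshold('BAYES_60'):.1%}")
    print("BAYES_99 alone flagged:", is_flagged("BAYES_99", []))
    # Hypothetical heuristic scores, chosen only to show both outcomes.
    print("BAYES_60 + 3.0 flagged:", is_flagged("BAYES_60", [3.0]))
    print("BAYES_60 + 3.5 flagged:", is_flagged("BAYES_60", [3.5]))
```

Under this model BAYES_60 supplies roughly a third of the threshold on its own, and BAYES_99 (5.400) exceeds the default threshold without help from any other rule.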