full-disclosure - spam with anti-bayesian parts


From: Bojan.Zdrnja at LSS.hr (Bojan Zdrnja)
Subject: spam with anti-bayesian parts

> -----Original Message-----
> From: full-disclosure-admin@...ts.netsys.com 
> [mailto:full-disclosure-admin@...ts.netsys.com] On Behalf Of 
> Suresh Ponnusami
> Sent: Tuesday, 13 January 2004 12:30 a.m.
> To: vogt@...senet.com; full-disclosure@...ts.netsys.com
> Subject: Re: [Full-Disclosure] spam with anti-bayesian parts
> 
> Actually most of the spammers use automated tools that contains some
> scriptable plugins to evade the spam filters. Since they spam more that
> 1000's of users at a time, picking something real might be a bit slow and
> requires extra processing. Even if they create a template for all the mails,
> that'll take up some time which they may not want to waste on. Also,
> introducing random gibberish noise might be able to get through bayesian
> filters because, that particular gibberish junk may not be in 
> the database.

That shouldn't help them if the spam-marking software is written properly (and
much of it isn't). If a token is not in the Bayesian database, it will be given
a neutral value (like a 0.5 probability of being spam). However, their
marketing words (viagra or whatever) will get a high probability (more than
0.9). As the software should take only the 6 or so most spammy tokens into
account, all that gibberish won't matter.
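
To make that concrete, here is a rough Python sketch of the idea. It is only
an illustration under my own assumptions, not code from any real filter: the
function name, the sample database and the clamping to 0.01..0.99 are made up,
and the combining formula is the naive Bayes product that Paul Graham
describes. Unknown tokens get 0.5, only the 6 most spammy tokens are kept, and
the neutral ones cancel out of the result.

```python
from math import prod

NEUTRAL = 0.5   # value for tokens missing from the database (as in the post)
TOP_N = 6       # "6 or so" most spammy tokens (as in the post)


def spam_probability(tokens, token_db, top_n=TOP_N, neutral=NEUTRAL):
    """Combine per-token spam probabilities for one message.

    token_db is a hypothetical dict mapping token -> probability of spam.
    """
    # Look up each distinct token; unknown tokens get the neutral value.
    probs = [token_db.get(t, neutral) for t in set(tokens)]
    # Clamp so that no single token can force a division by zero (assumption).
    probs = [min(0.99, max(0.01, p)) for p in probs]
    # Keep only the top_n most spammy tokens.
    top = sorted(probs, reverse=True)[:top_n]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)


# Made-up database and messages, for illustration only.
token_db = {"viagra": 0.99, "cheap": 0.95, "meeting": 0.05}
plain = ["viagra", "cheap"]
padded = plain + ["gibberish%d" % i for i in range(100)]

print(spam_probability(plain, token_db))
print(spam_probability(padded, token_db))
```

Both calls print the same score, because a token at 0.5 multiplies the spam
and the non-spam product by the same factor.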

The one case where that will help them is when they include only a single IMG
SRC in an HTML e-mail, and that link wasn't in the database before. That way the
Bayesian classifier won't have a chance to know what's going on, as possibly
everything could get neutral values. In that case, other rules should help
(like RBLs and so on).
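
A rough sketch of that fallback in Python, again only as an illustration: the
function name, the cutoffs and the idea of passing in a ready-made blocklist
result are my assumptions, not how any particular product does it. When the
Bayesian score is confidently high or low it decides on its own; when it sits
in the neutral zone, the other rule gets the final word.

```python
def verdict(bayes_score, sender_listed, spam_cutoff=0.9, ham_cutoff=0.1):
    """Pick a verdict from a Bayesian score plus one extra rule.

    bayes_score   -- combined spam probability in the range 0..1
    sender_listed -- True if a (hypothetical) blocklist lookup flagged the sender
    The cutoffs are illustrative values, not tuned ones.
    """
    if bayes_score >= spam_cutoff:
        return "spam"
    if bayes_score <= ham_cutoff:
        return "ham"
    # Neutral zone: the classifier has nothing to go on, so use the other rule.
    return "spam" if sender_listed else "unsure"


# A mail with only an unseen image link scores about 0.5.
print(verdict(0.5, sender_listed=True))
print(verdict(0.5, sender_listed=False))
```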

For more info I'd suggest checking Paul Graham's wonderful Web page at:

http://www.paulgraham.com/antispam.html

Also, Jonathan's DSPAM has some nice documents on the Web (I'm pretty sure
he'll reply to this as well). You can find them at:

http://www.nuclearelephant.com/projects/dspam/

Cheers,

Bojan