full-disclosure - Spam with PGP

Message-ID: <3F839994.90207@bridgecomm.net>
From: devin.nate at bridgecomm.net (Devin Nate)
Subject: Spam with PGP

Jonathan A. Zdziarski wrote:

>>Bayesian filters have had some amazing successes. The problem we (the 
>>company I work for) continue to have, and the reason we continue to 
>>choose SA, is that training a thousand users on how to use a Bayes 
>>system is pretty much impossible (and we're small compared to many!) 
>>Assuming that I give you (I do not believe it, but will give it for 
>>the sake of argument) that Bayes is the best theoretical solution, the 
>>Bayes folks have a problem in implementation. Training users is not 
>>easy; think about training your mother or grandmother but multiply by 1000.
>>    
>>
>
>This is why two features exist, both of which I think are components of any
>good Bayesian solution:
>
>1. User groups. ...
>2. A merge tool. ...
>
Excellent points. There is definitely R&D to be done in sharing Bayes 
info. Just as antivirus software can LiveUpdate signatures for the 
braindead-easy-to-define viruses, so should spam software. Regrettably, 
one of the key points of Bayes is that it is individualized, so a common 
'Bayes' DB is somewhat more difficult.
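
To illustrate why a shared DB is harder than it sounds, here is a minimal Python sketch of the naive approach: summing per-user token counts into one table. The function name, the two users and all the counts are made up for illustration; this is not how any particular Bayes tool merges its data.

```python
from collections import Counter


def merge_token_counts(per_user_counts):
    """Sum per-user (spam, ham) token counts into one shared pair of tables."""
    merged_spam, merged_ham = Counter(), Counter()
    for spam_counts, ham_counts in per_user_counts:
        merged_spam.update(spam_counts)  # Counter.update adds counts
        merged_ham.update(ham_counts)
    return merged_spam, merged_ham


# Two hypothetical users with different mail habits (invented numbers).
alice = (Counter({"viagra": 40, "invoice": 1}), Counter({"invoice": 30}))
bob = (Counter({"viagra": 25, "invoice": 12}), Counter({"invoice": 2}))

spam, ham = merge_token_counts([alice, bob])
```

In this toy data "invoice" is a strong ham signal for the first user and a spam signal for the second; after the merge it is neither, which is the individualization problem in miniature.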

>Global tools are also an invaluable asset to fighting spam.  We're
>working on a magical blacklisting tool that will capture source ips from
>incoming spam...when a threshold is exceeded, all incoming messages
>from that source ip are marked/learned as spam for all users (system
>wide) for whatever time period we specify.
>
Indeed, many are working on such a solution. We have a similar system in 
production for our users, and have commented on similar ideas for the SA 
system. Regrettably, the number of IP addresses is actually fairly large 
in terms of tracking spam status. And the variety of ways that spam can 
be transmitted complicates matters. Nevertheless, a bug has been opened 
at SA to attack the IP addresses that spammers use. Also note that a 
number of high profile anti-spam DNS services have been DoS'ed into 
oblivion (a couple in the last 2 months). So whatever the solution, it 
needs to be resilient (either by having a holy ton of bandwidth, or by 
being peer to peer).
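
To make the quoted threshold idea concrete, here is a minimal Python sketch. The class name and the defaults (5 hits, a one-hour counting window, a one-day block) are assumptions of mine for illustration, not a description of either of our systems. Blocks expire, so an address gets a fresh start after whatever period you specify.

```python
import time


class SourceBlacklist:
    """Block a source IP system-wide once its spam count exceeds a threshold."""

    def __init__(self, threshold=5, window_seconds=3600, block_seconds=86400):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = {}           # ip -> timestamps of recent spam
        self._blocked_until = {}  # ip -> time at which the block expires

    def record_spam(self, ip, now=None):
        now = time.time() if now is None else now
        # Keep only hits that are still inside the counting window.
        recent = [t for t in self._hits.get(ip, []) if now - t < self.window_seconds]
        recent.append(now)
        self._hits[ip] = recent
        if len(recent) > self.threshold:  # "exceeded", so strictly greater
            self._blocked_until[ip] = now + self.block_seconds

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._blocked_until.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Block has lapsed: forget the address entirely.
            del self._blocked_until[ip]
            self._hits.pop(ip, None)
            return False
        return True
```

A single in-memory table like this is exactly the kind of central point that gets DoS'ed, so a real deployment would still need the resilience discussed above.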

One of our ideas is a probability-based system relating to the 
'closeness' of an IP address to the subnet of a known spammer. The closer 
to the spammer, the more probable. The issue is, IPv4 has 2^32 addresses. 
Yes, I know many of those aren't used - let's assume that only 1/2 of the 
addresses are internet valid. That leaves you with 2^31 addresses, or 
about 2.1 billion. At one byte of status per address, that requires 
roughly 2.1GB of disk space to represent. Again we get to 
cost/benefit. 2.1GB of disk space is not that expensive, but sysadmin, 
backup, etc. of all that disk is. (How would you feel if your AntiSpam 
solution just cost you 2.1GB?) That doesn't even account for the CPU 
required to access a 2.1GB database. It also doesn't account for the 
fact that spammers are slime and rotate IP addresses, use relays, etc. 
Complicating matters is that 'once a spam relay' does not mean 'always a 
spam relay'. We've needed to retest IP addresses to verify their status.
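
As a rough Python sketch of both halves of that paragraph: 'closeness' is measured here as the number of leading bits two addresses share, scaled to 0..1, and the storage figure is plain arithmetic on a flat table. The function names and the linear score are my assumptions for illustration; the score is a raw similarity, not a calibrated probability.

```python
import ipaddress


def shared_prefix_bits(ip_a, ip_b):
    """Number of leading bits two IPv4 addresses have in common (0..32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return 32 - (a ^ b).bit_length()


def closeness_score(ip, known_spammer_ip):
    """Raw 0..1 score: 1.0 for the same address, lower for distant subnets."""
    return shared_prefix_bits(ip, known_spammer_ip) / 32


def flat_table_bytes(address_count, bits_per_address):
    """Size of a flat table holding a fixed number of bits per address."""
    return address_count * bits_per_address // 8


one_byte_each = flat_table_bytes(2**31, 8)  # 2147483648 bytes, about 2.1GB
one_bit_each = flat_table_bytes(2**31, 1)   # 268435456 bytes, about 268MB
```

A single spam/not-spam bit per address would cut the table to roughly 268MB, but a probability or a retest timestamp needs more than one bit, which pushes it back toward the gigabyte range.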

>Note, however, that the learning process does not need to be
>tech-savvy.  For example, we specifically sculpted our tool to be brain
>dead easy for grandma.  You get your mail like normal, and if you get a
>spam you forward it to grandma-spam@...rdomain.com.  There are even
>tools such as SpamSource (for Outlook) that can make this process a
>simple click of a button.  The signature mechanism we use stores the
>original tokenset in binary format in a temporary database on the server
>(or in the form of message attachments), which our tool will then use to
>relearn the message as spam.
>
You've done better than us. How have you managed to train your users to 
forward the email as the full email, including all headers, etc.? We've 
found most forwarded messages do not include all headers, and therefore 
forwarded messages train the spam database with semi-legit emails (i.e. 
the headers are legit because they belong to the forward, not the spam).
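
One way a server could at least detect the problem, sketched in Python with the standard `email` package: count the Received: headers that belong to the original message, assuming the client forwarded it as an attachment (a message/rfc822 part). The function name and both sample messages are made up for illustration. An inline forward scores zero, so it could be rejected instead of learned.

```python
from email import message_from_string


def original_received_count(report_text):
    """Count Received: headers of any message attached as message/rfc822."""
    report = message_from_string(report_text)
    count = 0
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding the
            # attached message.
            for inner in part.get_payload():
                count += len(inner.get_all("Received", []))
    return count


# Inline forward: the spam's own headers are gone.
inline_forward = (
    "From: grandma@example.com\n"
    "Subject: Fwd: cheap pills\n"
    "\n"
    "---- Forwarded message ----\n"
    "Subject: cheap pills\n"
    "buy now\n"
)

# Forward-as-attachment: the original headers survive inside the part.
attached_forward = (
    "From: grandma@example.com\n"
    "Subject: Fwd: cheap pills\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "Received: from relay.example.net by mx.example.com\n"
    "Received: from unknown by relay.example.net\n"
    "Subject: cheap pills\n"
    "\n"
    "buy now\n"
    "--b1--\n"
)
```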

It sounds like you've moved about 8 steps beyond us, with some kind of a 
spam button interface. IMHO that's what SMTP really needs - a 'feedback 
loop' protocol to teach the server. Such a protocol would be similar to 
POP3 in the reverse direction (in particular, provide some form of 
authentication and then push a message), so that you could push a button 
and teach a central server (by whatever mechanism it chooses to learn 
by) that a message is SPAM. Nevertheless, we have not found a way to 
train our users to forward messages appropriately; they usually don't 
include the full headers, and therefore we miss the majority of the spam 
data.
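
To show what I mean by 'POP3 in the reverse direction', here is a minimal Python sketch of one session. No such protocol exists as far as I know: USER, PASS and the +OK/-ERR replies are borrowed from POP3, while the SPAM command, the class name and the plaintext password table are invented for illustration only.

```python
class FeedbackSession:
    """One client session of a hypothetical POP3-style 'report spam' protocol."""

    def __init__(self, credentials, learn):
        self._credentials = credentials  # user -> password (plaintext for brevity only)
        self._learn = learn              # callback(user, raw_message)
        self._user = None
        self._authenticated = False

    def handle(self, line, body=None):
        """Process one command line and return the server's reply."""
        command, _, argument = line.strip().partition(" ")
        command = command.upper()
        if command == "USER":
            self._user, self._authenticated = argument, False
            return "+OK"
        if command == "PASS":
            if self._user is not None and self._credentials.get(self._user) == argument:
                self._authenticated = True
                return "+OK"
            return "-ERR authentication failed"
        if command == "SPAM":
            if not self._authenticated:
                return "-ERR authenticate first"
            if not body:
                return "-ERR empty message"
            # Hand the raw message to whatever learner the server uses.
            self._learn(self._user, body)
            return "+OK learned"
        return "-ERR unknown command"


learned = []
session = FeedbackSession(
    {"grandma": "s3cret"},
    lambda user, message: learned.append((user, message)),
)
raw_spam = "Received: from relay.example.net\nSubject: cheap pills\n\nbuy now\n"
replies = [
    session.handle("SPAM", body=raw_spam),  # refused: not logged in yet
    session.handle("USER grandma"),
    session.handle("PASS s3cret"),
    session.handle("SPAM", body=raw_spam),  # accepted and passed to the learner
]
```

Because the client pushes the raw message rather than forwarding it, the full headers arrive intact, which is the point of having a button instead of a forward.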

>Anyhow, my point is, we're trying to improve the ease-of-use factor,
>which is a big reason tools like SA are still useful...out-of-the-box
>functionality...however that doesn't necessarily mean heuristics are not
>obsolete from a scientific perspective.  I think we're getting to a
>point where enough tools exist to make a deployment just as easy, and
>hopefully if things continue at the rate they're going, companies like
>yours that require this level of ease will be able to use Bayesian
>solutions.
>
I love it, increasing spam protection is great. My perspective is that 
filtering 90% of spam for 1000 users (via SA, or whatever) is better 
than filtering 99% of spam for 1 user. Yes, the individual number is 
better in terms of percentage; however, by covering the whole group of 
users, we block several hundred to a few thousand spam messages a day. 
It remains a difficult problem.
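
The arithmetic behind that comparison, as a small Python sketch. The figure of 2 spam messages per user per day is an assumption picked only because it lands inside the 'several hundred to a few thousand' range; it is not a measured number.

```python
def blocked_per_day(users, spam_per_user_per_day, catch_rate):
    """Expected number of spam messages blocked per day across all users."""
    return users * spam_per_user_per_day * catch_rate


SPAM_PER_USER_PER_DAY = 2  # assumed volume, for illustration only

group_total = blocked_per_day(1000, SPAM_PER_USER_PER_DAY, 0.90)  # about 1800 a day
single_user = blocked_per_day(1, SPAM_PER_USER_PER_DAY, 0.99)     # about 2 a day
```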

-- 

____________________________________________________________

Devin Nate
Chief Consultant & General Manager
BridgeComm Corporation
http://www.bridgecomm.net/
mailto:devin.nate@...dgecomm.net
____________________________________________________________ 

