full-disclosure - Spam with PGP


Message-ID: <1065617669.3155.11.camel@tantor.nuclearelephant.com>
From: jonathan at nuclearelephant.com (Jonathan A. Zdziarski)
Subject: Spam with PGP

> You've done better than us. How have you managed to train your users to 
> forward the email as the full email, incl all headers, etc? We've found 
> most forwarded messages do not include all headers, and therefore 
> forwarded messages train the spam database with semi legit emails (i.e. 
> headers are legit because they are forwarded).

Full headers are not necessary; DSPAM creates a binary signature of
every message that goes through its system and hangs onto it for
14 days [default].  Depending on how the software is configured, this
binary signature will either be attached to each message (and forwarded
on automatically) or [default] kept on the server, with a very small
signature embedded into the text portions of every email message that
goes out.  Regrettably, this signature must be visible or some email
clients will break it, but I find a majority of users can get used to
it in the same way they get used to in-line PGP signatures.  When they
forward the message, the signature gets forwarded just like any other
part of the message.  DSPAM then looks up the binary signature on the
server and reverses the tokens.  This obviously uses some additional
disk space on the server side, but not as much as you'd think.  Since
attachments are ignored and tokens are deduplicated, there are only
about 200-300 tokens in the average legitimate email.  These are stored
on the server as CRC64 values, which consume 8 bytes per token.
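
To make the signature idea concrete, here is a rough Python sketch of
the scheme described above. It is not DSPAM's code. The ECMA-182
polynomial, the tokenizer regex, and the names make_signature,
SignatureStore, record_innocent and reverse_to_spam are all assumptions
made for illustration, and "reversing" is read here as moving each
token's count from the innocent side to the spam side.

```python
import re
import struct
import time

CRC64_POLY = 0x42F0E1EBA9EA3693      # ECMA-182 polynomial (assumed, not confirmed for DSPAM)
MASK64 = 0xFFFFFFFFFFFFFFFF
RETENTION_SECONDS = 14 * 24 * 3600   # the 14-day default mentioned in the post


def crc64(data: bytes) -> int:
    """Bitwise MSB-first CRC-64 with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def make_signature(body: str) -> bytes:
    """Deduplicate tokens, hash each one, and pack the hashes at 8 bytes apiece."""
    tokens = set(re.findall(r"[a-z0-9']+", body.lower()))   # hypothetical tokenizer
    hashes = sorted({crc64(t.encode("utf-8")) for t in tokens})
    return struct.pack(">%dQ" % len(hashes), *hashes)


def unpack_signature(sig: bytes):
    """Recover the list of 64-bit token hashes from a packed signature."""
    return struct.unpack(">%dQ" % (len(sig) // 8), sig)


class SignatureStore:
    """Keeps signatures server-side so a forwarded ID is enough to retrain."""

    def __init__(self):
        self._sigs = {}   # signature id -> (created_at, packed hashes)
        self.ham = {}     # token hash -> innocent count
        self.spam = {}    # token hash -> spam count

    def record_innocent(self, sig_id, body, now=None):
        now = time.time() if now is None else now
        sig = make_signature(body)
        self._sigs[sig_id] = (now, sig)
        for h in unpack_signature(sig):
            self.ham[h] = self.ham.get(h, 0) + 1

    def reverse_to_spam(self, sig_id, now=None):
        """Move a message's tokens from innocent to spam; False if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sigs.get(sig_id)
        if entry is None or now - entry[0] > RETENTION_SECONDS:
            return False
        for h in unpack_signature(entry[1]):
            self.ham[h] = max(self.ham.get(h, 0) - 1, 0)
            self.spam[h] = self.spam.get(h, 0) + 1
        del self._sigs[sig_id]   # a signature can only be reversed once
        return True
```

At 8 bytes per token, the 200-300 tokens quoted above work out to
roughly 1,600-2,400 bytes of signature per message.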

> It sounds like you've moved about 8 steps beyond us, with some kind of a 
> spam button interface. IMHO that's what SMTP really needs - a 'feedback 
> loop' protocol to teach the server. Such a protocol would be similar to 
> POP3 in the reverse direction (in particular, provide some form of 
> authentication and then push a message), so that you could push a button 
> and teach a central server (by whatever mechanism it chooses to learn 
> by) that a message is SPAM. Nevertheless, we have not found a way to 
> train our users to appropriately forward messages- they usually don't 
> include the full headers, and therefore we miss the majority of the spam 
> data.

If they use Outlook, tools like SpamSource should do the trick for you -
it puts a little button in Outlook that you click to automatically
forward the entire original message.  One thing you should be concerned
about though - you definitely don't want to be paying any attention to
the user's own headers when they forward.  We did this about 20 versions
ago and it started spam-damning any emails that were replies to a
message they sent out.
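
A small Python sketch of that last point, assuming the signature shows
up in the forwarded body as a visible marker. The "!SIG:...!" format
and the function name signature_id_from_forward are made up for this
example; the point is only that the forwarding user's headers are
parsed off and never searched or trained on.

```python
import email
import re

# Hypothetical marker format for the visible in-body signature.
SIG_RE = re.compile(r"!SIG:([0-9a-f]{16})!")


def signature_id_from_forward(raw_message: str):
    """Return the signature ID found in the text body, ignoring all headers."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue                      # attachments and containers are skipped
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:               # unknown charset name in the message
            text = payload.decode("utf-8", errors="replace")
        match = SIG_RE.search(text)
        if match:
            return match.group(1)
    return None
```

Because only the body parts are searched, a marker-like string in the
forwarder's Subject line or other headers has no effect on training.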

> I love it, increasing spam protection is great. My perspective is that 
> filtering 90% of spam for 1000 users (via SA, or whatever) is better 
> than filtering 99% of spam for 1 user. Yes, the individual number is 
> better in terms of percentage, however by doing the whole group of 
> users, we block several hundred to a few thousand spam messages a day. 
> It remains a difficult problem.

I'm not opposed to using both - but it ought to be set up so that
anything SA catches gets fed into your Bayesian tool as a guilty corpus
- this way your Bayesian tool doesn't suffer from never seeing that 90%
of the spam...and will shortly be able to filter out the other 10% for
you.
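
Roughly, the wiring could look like the Python sketch below. The
rule_filter argument is a stand-in for whatever SA decides (no real
SpamAssassin API is used here), and BayesCorpus and handle are
hypothetical names for illustration only.

```python
from collections import Counter


class BayesCorpus:
    """Minimal stand-in for the Bayesian tool's spam ("guilty") corpus."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.spam_messages = 0

    def learn_spam(self, text):
        # Count each distinct token once per message.
        self.spam_tokens.update(set(text.lower().split()))
        self.spam_messages += 1


def handle(text, rule_filter, corpus):
    """Train the corpus on whatever the rule-based filter catches."""
    if rule_filter(text):
        corpus.learn_spam(text)   # the Bayesian side still sees this spam
        return "spam"
    return "deliver"              # left for the Bayesian filter to judge
```

For example, with rule_filter = lambda t: "viagra" in t.lower(), every
message that rule catches also lands in the corpus, so the Bayesian
side keeps learning from the spam it would otherwise never see.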

Jonathan