Message-ID: <Pine.LNX.4.42.0209082318110.26057-100000@nimue.bos.bindview.com>
From: lcamtuf at ghettot.org (Michal Zalewski)
Subject: Snowdrop: a leak tracking tool
Hello list,
First of all, sorry if I picked the wrong place for shameless promotion of
my own stuff. Looking at the charter, this seems to be an appropriate
forum, and since SECTOOLS@...urityfocus.com seems to be largely extinct, I
don't think there's much choice... :-) To the point - in my spare time, I
have been working (big word) on a project that, while not really a
security tool per se, may be of some interest to some readers and the
security industry as such.
I wanted to announce a new tool, called Snowdrop; it is supposed to
provide an interesting protection scheme for raw text documents and C
sources, so that it is possible to identify and track down a person who
disclosed a portion of the document or code to the public, even if the
document has been modified, truncated, reformatted or otherwise badly
hurt. Possible applications:
- internal memos and sensitive documents, even in e-mails,
- vulnerability data that can be leaked by one of the vendors too early,
- proprietary sources,
- non-public exploits,
- etc, etc.
The goal is to make it possible to accurately determine who disclosed the
information, and, if necessary, to demonstrate to the public that the
disclosed information originated from you. The main concept, as you
probably guessed, is to embed a specific type of a watermark in the
document - but that ain't your typical file watermarking utility. The
ideas behind Snowdrop:
- using the content, instead of the medium; we introduce slight
changes to the written text, instead of introducing a payload;
this makes the technique much less prone to conversions,
copy-and-paste, and so on,
- using steganography to make the watermark non-evident and
non-intrusive,
- using several separate channels (synonyms, variable names,
formatting, typos, punctuation style, code logic) to make the
information less vulnerable to casual modifications, such as
reformatting, spell checking, simple edits, etc,
- using MD5 in a manner that makes watermarks (nearly) impossible
to tamper with in a meaningful way - for example, to make the leak
look like it's a fault of an innocent third party,
- using short, highly redundant watermarks to make it possible
to recover the watermark even from as little as a single paragraph
of text.
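
To make the list above a bit less abstract, here is a toy sketch of one
channel only (synonyms). It is NOT the actual Snowdrop code or algorithm,
just an illustration under my own assumptions: the synonym table, the
function names and the "secret:recipient" hashing layout are all made up
for this example. Each synonym pair carries one bit of an MD5 digest
derived from a secret and the recipient name, so the bit position depends
on the word itself, not on its position in the text, and a truncated
fragment can still be matched against the list of recipients.

```python
import hashlib
import re

# Hypothetical synonym table: pair i carries bit i of the watermark.
# The first word encodes 0, the second word encodes 1.
SYNONYMS = [
    ("big", "large"),
    ("fast", "quick"),
    ("begin", "start"),
    ("show", "display"),
]

# Reverse lookup: word -> (pair index, bit value it encodes).
WORD_TO_SLOT = {
    word: (i, bit)
    for i, pair in enumerate(SYNONYMS)
    for bit, word in enumerate(pair)
}


def watermark_bits(secret, recipient, nbits):
    """Derive nbits watermark bits from MD5(secret:recipient).

    Without the secret, the bit pattern assigned to some other
    recipient cannot be computed (assumed layout, not Snowdrop's).
    """
    digest = hashlib.md5(f"{secret}:{recipient}".encode()).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(nbits)]


def embed(text, secret, recipient):
    """Rewrite every known synonym so that it encodes its bit."""
    bits = watermark_bits(secret, recipient, len(SYNONYMS))

    def swap(match):
        word = match.group(0)
        slot = WORD_TO_SLOT.get(word.lower())
        if slot is None:
            return word
        replacement = SYNONYMS[slot[0]][bits[slot[0]]]
        # Keep the capitalization of the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


def extract(text):
    """Return {pair index: bit} for every carrier word seen in text.

    If a pair shows up more than once, the first occurrence wins.
    """
    seen = {}
    for match in re.finditer(r"[A-Za-z]+", text):
        slot = WORD_TO_SLOT.get(match.group(0).lower())
        if slot is not None:
            seen.setdefault(slot[0], slot[1])
    return seen


def identify(fragment, secret, recipients):
    """List the recipients whose watermark agrees with the fragment."""
    seen = extract(fragment)
    matches = []
    for recipient in recipients:
        bits = watermark_bits(secret, recipient, len(SYNONYMS))
        if all(bits[i] == bit for i, bit in seen.items()):
            matches.append(recipient)
    return matches


if __name__ == "__main__":
    memo = "We begin with a big change. Show the fast path first."
    marked = embed(memo, "my-secret", "alice")
    print(marked)
    # Even a truncated copy still narrows down the list of suspects.
    print(identify(marked[:30], "my-secret", ["alice", "bob", "carol"]))
```

With only four carrier bits, several recipients can easily share the same
pattern, and a fragment with no carrier words matches everybody. A usable
scheme needs far more carriers and more than one channel, which is the
whole point of the list above.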
While the idea isn't new, I think this is the first open-source project
that uses non-trivial watermarking on this level. I realize the
description above is painfully vague, and I strongly encourage you to read
the documentation before asking "what the heck?".
Yes, as you are most likely aware, it is next to impossible to create a
watermark that cannot be purposefully removed or destroyed, and I am not
claiming that Snowdrop does that. That is not the point. The watermark is
small, its presence is not evident, and it is fairly difficult to remove
by accident - in most cases, only people who routinely run some
anti-Snowdrop software on all outgoing documents would be safe. So while
it is possible that another person would outsmart you and delete the
watermark, chances are this won't be the case.
Of course, since Snowdrop was just something I coded as a PoC in my free
time, it's far from being perfect. Current code is pretty much beta, with
English language support working fairly well, and C code support still
not fully functional. The point of this announcement, as usual, is to
probe for interest, feedback and ideas, and to look for developers willing
to spend some time on this tool with me.
Download the beta code at http://lcamtuf.coredump.cx/snowdrop.tgz
Things that are broken or nasty in the current version, and where your help
is welcome:
a) awful resynchronization code; it's too slow,
b) poor man's synonyms; it would be good to support multi-word
substitutions and use a better database of entries - the one
that is used right now is entirely homebrew,
c) certain channels are still not supported by the C module, and the
C module is broken for many language constructs; consider using a
smart parser (flex or such) to handle this.
PS. Since the noise ratio on this group is already high, I'd like to ask
you to reply directly to me, unless you really think this is a matter
others will be interested in :-) In particular, I don't think it makes any
sense to report bugs, compilation problems, etc, to the list.
--
Michal Zalewski