bugtraq - Re: Wisecracker 1.0 - A high performance distributed cryptanalysis framework

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Date: Sun, 11 Nov 2012 15:30:10 -0500
From: Vikas N Kumar <walburn@...il.com>
To: Jann Horn <jannhorn@...glemail.com>
Cc: bugtraq@...urityfocus.com
Subject: Re: Wisecracker 1.0 - A high performance distributed cryptanalysis framework

On Sat, Nov 10, 2012 at 9:45 AM, Jann Horn <jannhorn@...glemail.com> wrote:
> I don't think this statement on your website makes much sense:
>
>     A user can download Wisecracker™ on a GPU cluster virtual machine
>     provided by Amazon EC2® and reverse an MD5 cryptographic hash for
>     a 6 character password in about 20 minutes if using 1 virtual
>     machine or in about 3 minutes if using 2 or more.
>
> What does "2 or more" mean here? If I use two machines in parallel, that's
> more than six times as fast as only using one machine? Seems weird to me.

Hi

I have actually updated the white paper with more clarity on that statement.
The time of 3 minutes is not the time taken to generate hashes for all
the possibilities. It is the average time taken to hit upon the first
successful solution. Once you get the solution the software sends a
stop signal to stop computation. This is part of the MD5 example
though, and the framework API is more generic and allows the user to
design the problem however they want. The framework's advantage is the
communications between systems and task distribution.

Wisecracker uses a concept of "tasks" for distribution of work load
across processors (CPUs, GPUs). The algorithm is a "divide and
conquer" algorithm similar to "bucket sort" and "quick sort".
So if you want to reverse an MD5 sum into a 6-character string of all
printable ASCII characters (94 of them) that would be about 94^6 which
is approximately 690 billion combinations. Wisecracker internally
creates an index based handling of these tasks. Each task is just an
index, and each index will end up being a string of 6-characters based
on which combination in the 94^6 it represents. (You can refer to the
md5.cl OpenCL file to see how it is done).

However, when more than 1 system is used the tasks get distributed per
system based on each system's OpenCL capabilities based on compute
units and work group size. Amazon's VMs have 2 GPUs each.
 When a single system is being used the tasks are distributed in the
range [1, 690billion] between 2 GPUs  based on the product of the
compute units and workgroup size.
So if a GPU's compute units are 32 in number and have a work group
size of 2048, it gets 32 x 2048 task blocks to work on. So one by one
each GPU will keep computing on it successive task blocks that it is
given until it finds the solution.

When you use 2 systems, the tasks are distributed between each system
as the ranges [1, 345billion] and [345billion, 690billion]. Once each
system gets its task range, it distributes work between its GPUs in a
similar fashion as task blocks based on compute units and work group
size.

Let's say you want to recover the string 'z@...g' and it might be in
the index range [345billion - 690billion].

If you were to run this on 1 Amazon GPU VM the program will have to
compute for [1-345billion] range first and then get to the
[345billion-690billion] range.

However, if you distribute this on 2 VMs you will hit upon the
solution faster because the second system is starting from 345billion
and you might not need to compute all the 345billion possible values
on each VM. You are saved from the needless computation of the
[1-345billion] range in full as done in the single system operation.

Hence the 2 VMs give a bigger decline in time rather than 1 VM because
of the way the work is distributed for the MD5 example.

On an average with different sets of strings the runtime drastically
goes down because of the fact that the search buckets are smaller and
start at different points.

The task distribution is a "divide and conquer" algorithm and does
have a worst case scenario run value of 10 minutes for a 6-character
string if the string is the 345billionth or 690billionth possibility.
But more often than not you will see an average run value of 3 minutes
to find the solution string.

I shall correct the ambiguous message on the website.

Thanks
Vikas
Selective Intellect LLC