phc-discussions - Re: [PHC] Another PHC candidates "mechanical" tests (ROUND2)


Message-ID: <CAOLP8p50qZdhd0Mq6rijyq5LG7E04N+46R4HVnsZVa+Ato1v1Q@mail.gmail.com>
Date: Thu, 26 Mar 2015 10:59:25 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Another PHC candidates "mechanical" tests (ROUND2)

On Thu, Mar 26, 2015 at 6:44 AM, Solar Designer <solar@...nwall.com> wrote:

> Perhaps there are such real life cases, but I think they're relatively
> uncommon.  For those special cases, similar impact may be seen e.g. by
> simply happening to run on a CPU with a twice smaller L3 cache.  So not
> that big a deal for most real world cases.
>
> Alexander
>

I tested your suggestion of using 1/2 L3 cache size.  You were right.  When
the workers use a total of L3 size/2 and hashing also uses L3 size/2, there
is almost no performance impact when running them together vs separately.
For generic servers that may be running multiple services at the same time
as password hashing, it's probably best to choose a memory size
significantly smaller than the L3 size.

I was curious about performance for LUKS-sized hashes, closer to 1GiB or
more, when multiple threads run.  My "worker" threads do unpredictable
4-byte reads, while my hashing thread reads/writes 16KiB at once.  I was
surprised to see that there is not much interference between memory
hashing and the worker threads.  Apparently, threads whose runtime is
dominated by cache misses play well with threads that are memory
bandwidth limited.  For my unpredictable 4-byte read/write loop, I was only
able to do 1.9 1GiB passes per second, with or without competing TwoCats
threads.  A single TwoCats thread did 4.2 1GiB hashes/second, two threads
did 7.8 1GiB hashes/second, and 4 threads did 11.1 1GiB hashes/second.
This is on my new work machine, which has a beastly 6-core E5-1650 v2 @
3.5 GHz CPU with 12MiB L3 cache.  I don't know its memory configuration,
but it has 32 GiB.
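
For readers who want to picture the worker access pattern, here is a
minimal sketch of an unpredictable 4-byte read/write loop.  It is not the
code used for the measurements above.  The function names, the xorshift
generator, and the data-dependent addressing are all assumptions chosen
for illustration, and in Python the interpreter overhead would dominate,
so this shows the access pattern only and is not a benchmark.

```python
from array import array

MASK32 = 0xFFFFFFFF


def xorshift32(x):
    """One step of a 32-bit xorshift generator (illustrative choice)."""
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x & MASK32


def unpredictable_rw_pass(buf, steps, seed=1):
    """Do `steps` data-dependent 4-byte reads and writes over `buf`.

    Each address depends on the value read in the previous step, so the
    hardware prefetcher cannot guess the next location.
    """
    n = len(buf)
    state = seed
    for _ in range(steps):
        idx = state % n
        value = buf[idx]                       # 4-byte read
        buf[idx] = (value + state) & MASK32    # 4-byte write
        state = xorshift32(state ^ value) or 1  # never let the state hit 0
    return state


# Hypothetical small buffer; a real test would size this well past L3.
buf = array("I", [0] * 1024)
final_state = unpredictable_rw_pass(buf, 1000)
```

A real measurement would use a compiled language and a buffer of 1GiB or
more, so that nearly every access misses the cache.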

For this particular machine and algorithm, I would use a 4GiB hash on 4
threads to unlock FDE, which runs in 0.373 seconds.  If I had to use only
one thread, I'd have to drop to 1GiB to run as fast.  I guess my main point
is that the thread count really is a critical parameter for FDE, to get
the best protection in a given runtime.  Even a 2X improvement matters, IMO.
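
The arithmetic behind that choice can be sketched as follows.  The
throughput figures are the ones reported above; the two helper functions,
the assumption that runtime scales linearly with memory size, and the
restriction to power-of-two sizes are illustrative assumptions, not part
of TwoCats.

```python
# Reported throughput in GiB hashed per second, keyed by thread count.
THROUGHPUT_GIB_PER_S = {1: 4.2, 2: 7.8, 4: 11.1}


def estimated_seconds(size_gib, threads):
    """Estimate runtime, assuming time scales linearly with memory size."""
    return size_gib / THROUGHPUT_GIB_PER_S[threads]


def largest_pow2_gib(budget_s, threads):
    """Largest power-of-two size (in GiB, at least 1) within the budget.

    Returns 0 if even 1 GiB does not fit.
    """
    limit = THROUGHPUT_GIB_PER_S[threads] * budget_s
    size = 1
    while size * 2 <= limit:
        size *= 2
    return size if size <= limit else 0


budget = 0.373  # seconds, the measured 4GiB / 4-thread runtime
sizes = {t: largest_pow2_gib(budget, t) for t in THROUGHPUT_GIB_PER_S}
speedup_4_threads = THROUGHPUT_GIB_PER_S[4] / THROUGHPUT_GIB_PER_S[1]
```

Under these assumptions the same 0.373 second budget allows 1GiB on one
thread, 2GiB on two, and 4GiB on four, and the linear estimate for 4GiB on
4 threads (about 0.36 seconds) is close to the measured time.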

I would prefer to see each algorithm benchmarked in its best mode for the
FDE case, not the PHS defaults.

Bill
