Date: Thu, 26 Mar 2015 16:44:09 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Another PHC candidates "mechanical" tests (ROUND2)

On Thu, Mar 26, 2015 at 06:33:54AM -0700, Bill Cox wrote:
> 10X is exaggerated, but not by much.  Results from my laptop this morning
> say that my "worker" threads are slowed down by up to 4.97X when TwoCats is
> running.

Wow.

> I wrote up a simple "testwork" program that runs "workers" in parallel with
> TwoCats hashing.  The worker threads do non-SIMD read/write to L3 cache in
> a loop, and increment a counter once they've done it all.  They do some
> multiplies and adds in each loop iteration.
> 
> The worst case is when I have only 1 worker, using 4MiB of memory (my L3
> cache size on my laptop), while TwoCats uses 2 threads to hash 4MiB.  The
> workers slow down by less each when I add more workers.  With 2 workers,
> the slow-down is less than 2X for the workers.

[...]

> Here's the inner loop of the worker threads:
> 
> for (uint32_t i = 0; i < len; i++) {
>     mem[i] ^= (mem[(i*i*i*i) % len] + i) * (i | 1);
> }
> 
> The actual "work" done does not have a huge impact on the outcome.  The
> important thing is that the worker needs all of its L3 data.  I see use
> cases in real life that suffer from this problem when running SSE-optimized
> Scrypt.

Perhaps there are such real life cases, but I think they're relatively
uncommon.  For those special cases, a similar impact may be seen e.g. by
simply happening to run on a CPU with half the L3 cache.  So it's not
that big a deal for most real world cases.

Alexander
