Date: Tue, 6 Oct 2015 23:48:49 +0300
From: Solar Designer <>
To: Massimo Del Zotto <>
Subject: Re: [PHC] yescrypt on GPU


On Tue, Oct 06, 2015 at 01:27:00PM +0300, Solar Designer wrote:
> In fact, it doesn't make any sense to waste an entire CU on one hash,
> yet keep the S-boxes in global memory.  You have 64 KB of local memory
> per CU, enough for several yescrypt's.  Have you tried keeping the
> S-boxes in local memory?
> (For bcrypt, which is in many ways similar to pwxform, but uses 4 KB
> S-boxes, it is more optimal to keep them in local memory on GCN.)
> I think you happen to achieve decent performance (compared to other
> results for yescrypt on GPU) at all because the S-boxes are actually
> loaded from the same CU's cache.  Well, the cache is 16 KB (and there's
> also L2) - enough for the 8 KB S-boxes - but you certainly can do better
> by simply keeping the S-boxes in local memory.

I was wrong in the above - while writing it, I forgot that while you're
wasting the entire CU on one yescrypt hash in one wavefront, you're
probably running many wavefronts to hide the global memory and
instruction latencies.  With this, you might in fact be achieving the
same or better performance than you would by keeping the S-boxes for a
few instances in local memory, yet incurring the local memory and
instruction latencies all the time.  Still, I think there's much room
for improvement, and we need to know the performance with S-boxes in
local memory to have a baseline.
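
For a rough sense of the local memory budget being discussed, here is a
back-of-the-envelope sketch using only the sizes quoted above (64 KB of
local memory per CU, 8 KB S-boxes for yescrypt, 4 KB for bcrypt).  The
function and constant names are made up for illustration, and the result
is an upper bound that assumes nothing else is kept in local memory and
ignores any per-work-group allocation limits:

```python
KIB = 1024

# Sizes taken from the message above.
LOCAL_MEM_PER_CU = 64 * KIB   # local memory per CU
YESCRYPT_SBOX = 8 * KIB       # yescrypt (pwxform) S-boxes per instance
BCRYPT_SBOX = 4 * KIB         # bcrypt S-boxes per instance


def instances_per_cu(local_mem_bytes, sbox_bytes):
    """Upper bound on instances whose S-boxes fit in one CU's local memory.

    Hypothetical helper: assumes local memory holds S-boxes and nothing else.
    """
    return local_mem_bytes // sbox_bytes


yescrypt_fit = instances_per_cu(LOCAL_MEM_PER_CU, YESCRYPT_SBOX)
bcrypt_fit = instances_per_cu(LOCAL_MEM_PER_CU, BCRYPT_SBOX)

print("yescrypt instances per CU (upper bound):", yescrypt_fit)
print("bcrypt instances per CU (upper bound):", bcrypt_fit)
```

This only says how many instances can fit, not whether that few is enough
to hide the local memory and instruction latencies, which is exactly
what a measured baseline would tell us.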

