Message-ID: <20151006204849.GA10501@openwall.com>
Date: Tue, 6 Oct 2015 23:48:49 +0300
From: Solar Designer <solar@...nwall.com>
To: Massimo Del Zotto <massimodz8@...il.com>
Cc: discussions@...sword-hashing.net
Subject: Re: [PHC] yescrypt on GPU
Massimo,
On Tue, Oct 06, 2015 at 01:27:00PM +0300, Solar Designer wrote:
> In fact, it doesn't make any sense to waste an entire CU on one hash,
> yet keep the S-boxes in global memory. You have 64 KB of local memory
> per CU, enough for several yescrypt's. Have you tried keeping the
> S-boxes in local memory?
>
> (For bcrypt, which is in many ways similar to pwxform, but uses 4 KB
> S-boxes, it is more optimal to keep them in local memory on GCN.)
>
> I think you happen to achieve decent performance (compared to other
> results for yescrypt on GPU) at all because the S-boxes are actually
> loaded from the same CU's cache. Well, the cache is 16 KB (and there's
> also L2) - enough for the 8 KB S-boxes - but you certainly can do better
> by simply keeping the S-boxes in local memory.
I was wrong in the above - while writing it, I forgot that although
you're wasting the entire CU on one yescrypt hash in one wavefront,
you're probably running many wavefronts to hide the global memory and
instruction latencies. With this, you might in fact be achieving the
same or better performance than you would by keeping the S-boxes for a
few instances in local memory yet incurring the local memory and
instruction latencies all the time. Still, I think there's much room
for improvement, and we need to know the performance with S-boxes in
local memory to have a baseline.
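To put rough numbers on "a few instances", here is a minimal sketch of
the capacity arithmetic using only the sizes mentioned above (64 KB of
local memory per CU, a 16 KB cache, 8 KB S-boxes for yescrypt, 4 KB for
bcrypt). The function and constant names are made up for illustration,
and the results are upper bounds that assume local memory holds nothing
but S-boxes:

```python
def instances_fitting(memory_kb, sbox_kb):
    """Upper bound on complete S-box sets that fit in memory_kb."""
    if sbox_kb <= 0:
        raise ValueError("sbox_kb must be positive")
    return memory_kb // sbox_kb


# Sizes taken from the discussion above; names are illustrative only.
LOCAL_MEMORY_KB = 64   # local memory per CU
CACHE_KB = 16          # per-CU cache
YESCRYPT_SBOX_KB = 8   # yescrypt (pwxform) S-boxes per instance
BCRYPT_SBOX_KB = 4     # bcrypt S-boxes per instance

yescrypt_in_local = instances_fitting(LOCAL_MEMORY_KB, YESCRYPT_SBOX_KB)
bcrypt_in_local = instances_fitting(LOCAL_MEMORY_KB, BCRYPT_SBOX_KB)
yescrypt_in_cache = instances_fitting(CACHE_KB, YESCRYPT_SBOX_KB)

print("yescrypt instances per CU in local memory:", yescrypt_in_local)
print("bcrypt instances per CU in local memory:", bcrypt_in_local)
print("yescrypt S-box sets fitting in the cache:", yescrypt_in_cache)
```

So local memory caps the number of concurrent yescrypt instances per CU
at a handful, which is why the comparison against many wavefronts
reading S-boxes through the cache is not obvious without measuring.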
Alexander