Open Source and information security mailing list archives
 
Date: Wed, 7 Oct 2015 11:42:19 +0300
From: Solar Designer <solar@...nwall.com>
To: Massimo Del Zotto <massimodz8@...il.com>
Cc: discussions@...sword-hashing.net
Subject: Re: [PHC] yescrypt on GPU

On Wed, Oct 07, 2015 at 09:29:09AM +0200, Massimo Del Zotto wrote:
> I have been told on the AMD CL forum (by 'realhet', who appears very
> proficient and up to date with GCN ASM) that GCN has no instruction
> latencies (i.e. it can consume a result in the instruction immediately
> following), probably a nice implication of the instructions being processed
> in 4-clock 'ticks'.

realhet wrote an assembler/Pascal/IDE targeting GCN, so he became very
proficient with GCN.  Can you post a URL for that specific forum posting?

In the following old thread, it was said that while 4 wavefronts could
be sufficient for ALU bound problems, more are needed for memory bound
problems (not surprisingly), and I think this might include local memory
(but I don't really know):

https://community.amd.com/thread/159171

(In that old thread, I think comments by jeff_golds are authoritative,
while realhet was just learning the then-new GCN.)

Anyway, I think this means that if we put the S-boxes into local memory,
we do incur at least that 4-cycle minimum latency.
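
A rough back-of-the-envelope model of the wavefront-count reasoning
above, as a minimal sketch only.  It assumes the commonly described GCN
layout (4 SIMDs per compute unit, 16 lanes per SIMD, 64 work-items per
wavefront), which gives the 4-cycle issue tick, and it assumes each
wavefront runs a fully dependent instruction chain.  The function name
and the example latency values are hypothetical and were not measured
or taken from the thread.

```python
import math

# Assumed GCN parameters (from public architecture descriptions,
# not from this thread).
WAVEFRONT_SIZE = 64   # work-items per wavefront
LANES_PER_SIMD = 16   # lanes processed per cycle by one SIMD
SIMDS_PER_CU = 4      # SIMDs in one compute unit

# One instruction of one wavefront occupies a SIMD for this many cycles.
CYCLES_PER_TICK = WAVEFRONT_SIZE // LANES_PER_SIMD  # 4


def wavefronts_to_hide(latency_cycles):
    """Estimate wavefronts per compute unit needed to keep every SIMD
    busy when each wavefront must wait latency_cycles (counted from
    issue) before its next dependent instruction can start.

    Simplified model: while one wavefront waits, other wavefronts on
    the same SIMD each fill one 4-cycle tick.
    """
    per_simd = max(1, math.ceil(latency_cycles / CYCLES_PER_TICK))
    return per_simd * SIMDS_PER_CU


if __name__ == "__main__":
    # Latency fully covered by the 4-cycle tick: 1 wavefront per SIMD.
    print(wavefronts_to_hide(4))
    # Hypothetical longer latency, e.g. a local memory lookup.
    print(wavefronts_to_hide(32))
```

Under this model, a latency that fits within one tick needs only one
wavefront per SIMD (4 per compute unit), while any longer latency
scales the requirement up in steps of 4 cycles.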

Alexander
