Message-ID: <20151007084219.GA16480@openwall.com>
Date: Wed, 7 Oct 2015 11:42:19 +0300
From: Solar Designer <solar@...nwall.com>
To: Massimo Del Zotto <massimodz8@...il.com>
Cc: discussions@...sword-hashing.net
Subject: Re: [PHC] yescrypt on GPU
On Wed, Oct 07, 2015 at 09:29:09AM +0200, Massimo Del Zotto wrote:
> I have been told on the AMD CL forum (by 'realhet', who appears very
> proficient and up to date with GCN ASM) that GCN has no instruction
> latencies (i.e. it can consume a result in the instruction immediately
> following), probably a nice implication of the instructions being processed
> in 4-clock 'ticks'.
realhet wrote an assembler/Pascal/IDE targeting GCN, so he got to be very
proficient with GCN. Can you post a URL for that specific forum posting?
In the following old thread, it was said that while 4 wavefronts could
be sufficient for ALU-bound problems, more are needed for memory-bound
problems (not surprisingly), and I think this might include local memory
(but I don't really know):
https://community.amd.com/thread/159171
(In that old thread, I think comments by jeff_golds are authoritative,
while realhet was just learning then-new GCN at the time.)
Anyway, I think this means that if we put the S-boxes into local memory,
we do incur at least those 4-cycle minimum latencies.
Alexander