Open Source and information security mailing list archives
 
Date: Sun, 11 Oct 2015 09:29:45 +0300
From: Solar Designer <solar@...nwall.com>
To: Massimo Del Zotto <massimodz8@...il.com>
Cc: discussions@...sword-hashing.net
Subject: Re: [PHC] yescrypt on GPU

Massimo,

As a trivial change to your existing code, you could try moving the
barrier so that the multiplications are issued before it.  You have:

				barrier(CLK_LOCAL_MEM_FENCE);
			}
			xo = (ulong)(xo >> 32) * (uint)xo;
			xo += gather[0 + 0];
			xo ^= gather[2 + 0]; // do this uint for slightly improved perf?
			xi = (ulong)(xi >> 32) * (uint)xi;
			xi += gather[0 + 1];
			xi ^= gather[2 + 1];

but you could have:

			xo = (ulong)(xo >> 32) * (uint)xo;
			xi = (ulong)(xi >> 32) * (uint)xi;
			barrier(CLK_LOCAL_MEM_FENCE);
			xo += gather[0 + 0];
			xo ^= gather[2 + 0]; // do this uint for slightly improved perf?
			xi += gather[0 + 1];
			xi ^= gather[2 + 1];

Maybe the compiler already does this reordering for you, or maybe not.

The way I designed pwxform, the multiplications and gather loads are
supposed to work in parallel.
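
To illustrate why the reordering should be safe, here is a minimal sketch
in plain C rather than OpenCL.  It is not taken from your kernel: the
gathered_t struct and both function names are hypothetical, and the
struct merely stands in for the two values read from gather[].  The
multiplication reads only the lane's own value, so it can be done before
the point where the gathered values become available.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two values a lane reads from gather[]
 * (gather[0 + n] and gather[2 + n] in the quoted kernel). */
typedef struct {
    uint64_t s0;
    uint64_t s1;
} gathered_t;

/* Order in the existing code: the gathered values are waited for first,
 * and only then is the multiplication done. */
uint64_t lane_load_first(uint64_t x, gathered_t g)
{
    /* the barrier would sit here */
    x = (uint64_t)(x >> 32) * (uint32_t)x;  /* high half times low half */
    x += g.s0;
    x ^= g.s1;
    return x;
}

/* Suggested order: multiply first, because the product depends only on x,
 * then consume the gathered values. */
uint64_t lane_multiply_first(uint64_t x, gathered_t g)
{
    uint64_t product = (uint64_t)(x >> 32) * (uint32_t)x;
    /* the barrier would sit here */
    product += g.s0;
    product ^= g.s1;
    return product;
}
```

Both functions return the same value for any input, so the only thing
that changes is how much work is available to overlap with the wait.
Whether that gives a measurable speedup on a particular GPU is something
only a benchmark can tell.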

Alexander
