Date: Sun, 11 Oct 2015 16:12:54 +0200
From: Massimo Del Zotto <massimodz8@...il.com>
To: Solar Designer <solar@...nwall.com>
Cc: discussions@...sword-hashing.net
Subject: Re: [PHC] yescrypt on GPU

You are spot on: barrier removal is one of the effects
of __attribute__((reqd_work_group_size(x, y, z))) when x*y*z equals the native
wavefront size.
I had put the barrier there mostly for aesthetic reasons.
I have moved it as suggested, since this is an important design decision that
deserves some emphasis.

Massimo

2015-10-11 8:29 GMT+02:00 Solar Designer <solar@...nwall.com>:

> Massimo,
>
> As a trivial change to your existing code, you could try moving the
> barrier.  You have:
>
>                                 barrier(CLK_LOCAL_MEM_FENCE);
>                         }
>                         xo = (ulong)(xo >> 32) * (uint)xo;
>                         xo += gather[0 + 0];
>                         xo ^= gather[2 + 0]; // do this uint for slightly improved perf?
>                         xi = (ulong)(xi >> 32) * (uint)xi;
>                         xi += gather[0 + 1];
>                         xi ^= gather[2 + 1];
>
> but you could have:
>
>                         xo = (ulong)(xo >> 32) * (uint)xo;
>                         xi = (ulong)(xi >> 32) * (uint)xi;
>                         barrier(CLK_LOCAL_MEM_FENCE);
>                         xo += gather[0 + 0];
>                         xo ^= gather[2 + 0]; // do this uint for slightly improved perf?
>                         xi += gather[0 + 1];
>                         xi ^= gather[2 + 1];
>
> Maybe the compiler does this for you anyway, or maybe not.
>
> The way I designed pwxform, the multiplications and gather loads are
> supposed to work in parallel.
>
> Alexander
>
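
The reordering quoted above can be sketched in plain C, as a rough model rather than the actual kernel. The function names, the 4-entry gather array and the lane argument are assumptions made for illustration, and the barrier appears only as a comment because plain C has no work-groups. The sketch shows that the multiplication reads only the register value x, so it can be issued before the gather values are needed without changing the result.

```c
#include <assert.h>
#include <stdint.h>

/* Original order: wait for the gather values, then multiply, add, xor.
 * "gather" stands in for the local-memory array of the OpenCL kernel. */
static uint64_t step_barrier_first(uint64_t x, const uint64_t gather[4], int lane)
{
    /* barrier(CLK_LOCAL_MEM_FENCE) would sit here in the kernel */
    x = (uint64_t)(x >> 32) * (uint32_t)x;  /* 32x32 -> 64-bit multiply */
    x += gather[0 + lane];
    x ^= gather[2 + lane];
    return x;
}

/* Suggested order: the multiply touches registers only, so it can run
 * while the gather loads are still in flight. */
static uint64_t step_multiply_first(uint64_t x, const uint64_t gather[4], int lane)
{
    x = (uint64_t)(x >> 32) * (uint32_t)x;  /* no dependency on gather */
    /* barrier(CLK_LOCAL_MEM_FENCE) would sit here in the kernel */
    x += gather[0 + lane];
    x ^= gather[2 + lane];
    return x;
}

/* Hypothetical self-check: both orders agree for both lanes. */
static void check_orders_agree(uint64_t x, const uint64_t gather[4])
{
    for (int lane = 0; lane < 2; lane++)
        assert(step_barrier_first(x, gather, lane) ==
               step_multiply_first(x, gather, lane));
}
```

In a single-threaded model the two functions are trivially identical; whether the reordering helps on a GPU depends on whether the compiler and hardware actually overlap the multiply with the loads, which this sketch cannot show.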
