Open Source and information security mailing list archives
 
Message-ID: <20140917194109.689ab0f0@lambda>
Date: Wed, 17 Sep 2014 19:41:09 +0000
From: Brandon Enright <bmenrigh@...ndonenright.net>
To: Steve Thomas <steve@...tu.com>
Cc: discussions@...sword-hashing.net, bmenrigh@...ndonenright.net
Subject: Re: [PHC] omegacrypt and timing

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 17 Sep 2014 13:50:01 -0500 (CDT)
Steve Thomas <steve@...tu.com> wrote:

> > To avoid misalignment, if you ran all 4 for round 1, and then
> > selected the right one, then all 4 for round 2, then selected the
> > right one, etc., you'd be doing 4x as many memory operations and
> > you'd need a way of discarding the memory changes made by the 3
> > wrong branches. Is this the attack you're suggesting?
> >  
> 
> No, I'm saying that a GPU will waste clock cycles while not
> calculating the wrong data paths. This is due to its conditional
> execution of instructions. If a thread is not supposed to run an
> instruction it will do a nop (no operation) instead.

Interesting.  So let me make sure I understand what this attack would
look like.

You'd run N instances of OmegaCrypt on the GPU by allocating N ChaCha
states and N large regions of memory.  Then you'd allocate 4 threads
(or maybe 5 if you need a master thread) for each OmegaCrypt
instance, and 3N of the 4N threads would data-dependently disable
(nop) themselves each round.  In this way you'd be able to keep 4N
threads in sync with each other even though only N of them are doing
useful work in any given round.
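
A rough way to picture the scheme above is the toy Python model below.
It is only a sketch, not GPU code: the function name simulate and the
random branch choice (a stand-in for the ChaCha-derived selection) are
assumptions made for illustration.  It counts how many lockstep issue
slots are spent versus how many do useful work when each instance gets
one lane per branch path.

```python
import random

BRANCHES = 4  # branch paths per round, as discussed in this thread


def simulate(instances, rounds, seed=0):
    """Return (useful_slots, issued_slots) for a lockstep execution model.

    Each instance owns BRANCHES lanes. Every round, one lane is selected
    by a data-dependent choice and the others are predicated off (nop),
    but all lanes still occupy an issue slot.
    """
    rng = random.Random(seed)
    useful = 0
    issued = 0
    for _ in range(rounds):
        for _ in range(instances):
            # Hypothetical stand-in for the data-dependent branch selection.
            taken = rng.randrange(BRANCHES)
            for lane in range(BRANCHES):
                issued += 1  # the lane consumes a slot whether or not it works
                if lane == taken:
                    useful += 1  # only the selected lane does real work
    return useful, issued


useful, issued = simulate(instances=8, rounds=1000)
print(useful, issued, useful / issued)
```

In this model the useful fraction is always 1/BRANCHES, which is the
cost the attacker accepts in exchange for keeping all lanes aligned.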

If this is the case, using my current design I'd have to increase the
branch paths a lot, something that seems hacky and I really don't want
to do.

So a question about GPU memory.  If you have a ton of threads each
accessing memory at random, how well does this scale?  It won't exhaust
raw memory bandwidth, but won't even a small number of threads exhaust
the rate at which the memory can serve accesses from "cold"
banks/blocks/regions?
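
To frame that question, here is a toy Python sketch that counts how
many distinct memory segments one group of threads touches in a single
access.  The 128-byte segment size, the group size of 32, the 4-byte
word, and all function names are assumptions chosen for illustration;
the model says nothing about actual latency or throughput on any
particular GPU.

```python
import random

SEGMENT_BYTES = 128  # assumed memory transaction granularity
GROUP_SIZE = 32      # assumed number of threads issued together
WORD_BYTES = 4       # assumed size of each access


def segments_touched(addresses):
    """Number of distinct segments covered by one group-wide access."""
    return len({addr // SEGMENT_BYTES for addr in addresses})


def sequential_addresses(base=0):
    """Each thread reads the word right after its neighbour's."""
    return [base + lane * WORD_BYTES for lane in range(GROUP_SIZE)]


def random_addresses(region_bytes, rng):
    """Each thread reads a word at an independent random offset."""
    words = region_bytes // WORD_BYTES
    return [rng.randrange(words) * WORD_BYTES for _ in range(GROUP_SIZE)]


rng = random.Random(1)
coalesced = segments_touched(sequential_addresses())
scattered = segments_touched(random_addresses(1 << 30, rng))
print(coalesced, scattered)
```

With a large region, nearly every random access lands in its own
segment, so the number of separate memory requests grows with the
thread count rather than with the bytes actually used.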

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlQZ4+AACgkQqaGPzAsl94J0PwCgkK84p89W3q/W+MsbX1q5MJa4
pfUAoMXvcm5wjrGh7s2EoFxWbrewc4uz
=WxpC
-----END PGP SIGNATURE-----
