phc-discussions - Re: [PHC] omegacrypt and timing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <361056288.465690.1410974097650.open-xchange@oxuslxltgw10.lxa.perfora.net>
Date: Wed, 17 Sep 2014 12:14:57 -0500 (CDT)
From: Steve Thomas <steve@...tu.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] omegacrypt and timing

> On September 17, 2014 at 11:14 AM Brandon Enright
> <bmenrigh@...ndonenright.net> wrote:
>
> On Wed, 17 Sep 2014 06:30:36 -0400
> Bill Cox <waywardgeek@...hershed.org> wrote:
>
> > I think OmegaCrypt, POMELO,
> > and Schvrch do too little data-based branching to badly hurt a GPU
> > attack. The GPU can simply execute each case serially, and only write
> > data for the selected case, so with N cases, a GPU will not slow down
> > by more than a factor of N.
>
> This attack doesn't work against OmegaCrypt. Each branch consumes a
> different amount of the ChaCha stream used to guide future branching.
> Also, there is a carry value from the previous round.
>

That's is the exact reason this is easier to attack than the other algorithms.
Each branch using different amounts of ChaCha stream means run-times of
different passwords are going to have a larger run time deviations. Thus if you
know exactly how long it takes to run then without the hash you can run tests
narrowing the number of online password guesses. This assumes you can get
several timings of a valid login and the salt (either by obtaining it but not
the hash [ie encrypted hash] or the timing of several password attempts leaks
the salt). I consider this attack minuscule compared to random memory access.
Since I think that side channel is easier.


> For the very first round of branching the GPU could run all 4 in
> parallel and select the correct one. But, for the second round, the
> GPU would have to run branch 1 for each of the possible previous
> branches, and branch 2 for each of the possible branches, and so
> forth.
>
> The carry value is modified from start to finish through each
> branch. There is no realistic way to perform the attack you describe
> because the previous branch path possibilities grows exponentially.
> After just a few rounds you'd be doing an exponential amount of work.
>

When you run out of the ChaCha block data you conditionally generate another.
There is no exponential cost increase for GPUs.


> Or perhaps you're imagining doing all 4 branches, then pausing to
> figure out which was right, then doing another 4, pausing to figure out
> which was right, and so on? You'd need some sort of copy-on-write
> memory to facilitate "backing out" of the changes made by wrong
> branches. I don't see how you'd actually be avoiding any
> data-dependant branching.
>
> Can you explain more concretely what your suggested attack is? I don't
> see how what you're describing would actually work.
>
> Also for what it's worth, if OmegaCrypt makes it to round 2 I plan on
> adjusting the branches to consume a coprime amount of the ChaCha stream
> to further protect against executing future rounds by trying all the
> combinations of the previous rounds.
>

With GPUs they have a conditional variable that determines whether an
instruction is run or not. x86 SIMD does not have this, yet. It's coming to
AVX-512.