Open Source and information security mailing list archives
 
Date: Wed, 15 Jan 2014 11:12:23 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] A must read...

On Tue, Jan 14, 2014 at 09:45:11PM -0500, Bill Cox wrote:
> On Tue, Jan 14, 2014 at 5:45 PM, Marsh Ray <maray@...rosoft.com> wrote:
> > Since Rand(i) % i doesn't depend on anything specific to the candidate password, couldn't it be precomputed? Or computed and fed to a large number of processors performing the rest of the loop in parallel.
> >
> > It looks like all memory accesses are predictable, therefore a custom external memory subsystem could be constructed which accepts streaming data as fast as you can move it on and off the chip (IO pin count * clock rate * DDR perhaps). But presumably the defender would not be able to utilize such an architecture, so it would represent an advantage to the attacker.
> 
> Yes, the predictable memory addressing presents a problem.
[...]
>     addr = blockLen*(i + (Rand(i) % i))/2
[...]
> You are correct that a sophisticated attacker could prefetch memory
> better than my CPU because of the fixed addressing in the first loop.
> However, with a block size of 4K bytes, I'm not seeing any significant
> memory latency issues.  Hashing 2GB in .4 seconds will still require
> an attacker move 4GB (one read, one write), over more than half of
> that .4 seconds.
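
As a sanity check on the quoted figures (an illustrative sketch, not part of the original message), the implied memory bandwidth works out as follows:

```python
# Hashing 2 GB in 0.4 s with one read and one write per byte means
# 4 GB of memory traffic in 0.4 s -- i.e. a 10 GB/s bandwidth floor
# for any attacker, regardless of compute resources.
mem_bytes = 2 * 1024**3            # 2 GB of memory filled
traffic = 2 * mem_bytes            # one read + one write = 4 GB moved
seconds = 0.4
bandwidth_gb_s = traffic / seconds / 1024**3
print(f"required bandwidth: {bandwidth_gb_s:.0f} GB/s")  # prints 10 GB/s
```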
[...]
> Thanks again for the feedback.  I feel like this multiplication thing
> is the right way to go, but I'm not 100% sure.

I think the multiplication thing is the right way to go along with
secret-dependent indexing.  Having the addresses predictable sort of
defeats the purpose of having the "latency guarantee" of the multiplies.

When the addresses are predictable, you're back to relying only on
memory bandwidth and latency for the time factor in area*time.
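
The contrast can be sketched as follows (a toy illustration with hypothetical helper names, not code from the thread): with predictable addressing the entire access sequence is known before hashing starts, so an attacker can precompute it and stream memory at full bandwidth; with secret-dependent addressing each address is unknown until the previous block has been computed, so every lookup pays full memory latency.

```python
def predictable_index(i, nblocks):
    # Depends only on the loop counter: the whole access sequence can
    # be precomputed and fed to a custom prefetching memory subsystem.
    return i % nblocks

def secret_dependent_index(prev_block_value, nblocks):
    # Depends on data derived from the password: the next address is
    # unknown until the previous block has been computed, so lookups
    # serialize on memory latency and cannot be prefetched.
    return prev_block_value % nblocks
```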

If you want to also have some safety from cache timing attacks, you need
to switch from predictable to unpredictable lookups somewhere during
your hash computation, and you should use the multiplies at least during
the unpredictable lookups phase.  The pseudo-code you posted got this
backwards: you're using the multiplies where they buy you little, and
you stop using them where they'd potentially buy you a lot.
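
The phase ordering described above might be sketched like this (a minimal toy illustration with made-up parameters and constants, not the scheme from the thread): a first phase with password-independent addressing for cache-timing safety, then a second phase where lookups are secret-dependent and a multiplication chain enforces the latency guarantee.

```python
MASK = (1 << 64) - 1

def hash_memory(seed, nblocks=1024):
    mem = [0] * nblocks
    x = seed
    # Phase 1: predictable addressing -- addresses depend only on i,
    # so memory-access timing leaks nothing about the password.
    for i in range(nblocks):
        x = (x * 0x9E3779B97F4A7C15 + i) & MASK   # multiply chain
        mem[i] = x
    # Phase 2: unpredictable addressing -- this is where the multiplies
    # matter most: each address depends on the previous multiply result,
    # so an attacker cannot prefetch ahead and is bound by the
    # sequential multiply latency as well as memory latency.
    for i in range(nblocks):
        j = x % nblocks                   # secret-dependent index
        x = (x * (mem[j] | 1)) & MASK     # multiply by odd operand
        mem[i] = x
    return x
```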

I hope this helps.

Alexander
