Date: Fri, 24 Jan 2014 09:03:51 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] cache timing attacks (Re: [PHC] Initial multiply-compute-hardened
 Catena-3 benchmark)

On Fri, Jan 24, 2014 at 4:46 AM,  <Stefan.Lucks@...-weimar.de> wrote:
> On Fri, 24 Jan 2014, Solar Designer wrote:
>
>> If this corresponds to a substantial portion of the full hash
>> computation, then that attacker hasn't gained all that much - only a
>> speedup of their offline attack by a certain factor, which we may try to
>> make reasonably small.
>
>
> Agreed, the speed-up by knowing the memory-access pattern is low, if you
> just count the clock cycles. For scrypt, the speed-up would be about two.
> But you seem to overlook a crucial point.

The scrypt author did an amazing job, IMO, but one thing he seemed not
to understand very well is the complexity of Salsa20/8.  In custom
hardware, I think the inner loop should run at about 1ns per iteration,
producing 64 bytes each time, or 64GB/second from a single tiny core.
That is much more than 2X what we can do on a single CPU, and it is one
reason scrypt has fallen to GPU attacks to some extent.  Just counting
clock cycles is a lot harder than it used to be!  I'm guessing you cut
your teeth on assembly code in the 80's or 90's, when counting clocks
still worked.  I miss those days.  Personally, I'm counting on the
latency of sequential multiply operations to thwart an attacker from
speeding up hashing by more than 2X.  Shift, XOR, and ADD operations
don't do that job.
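
To make that concrete, here is a minimal C sketch of the kind of
sequential multiply chain I mean (hypothetical code, not any particular
PHC entry): each iteration's result feeds the next, so the multiply
latency (a few cycles on a CPU, a few nanoseconds in an ASIC) bounds
the serial speedup no matter how much silicon the attacker throws at it.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical sketch: chain 32x32->64 multiplies so each step
     * needs the previous product.  Shift/XOR/ADD rounds flatten to a
     * handful of gate delays in custom hardware, but this serial chain
     * cannot be shortened below n multiply latencies. */
    uint64_t multiply_chain(uint64_t state, const uint32_t *mem, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint32_t a = (uint32_t)state;    /* depends on previous product */
            uint32_t b = mem[i] | 1;         /* keep the multiplier odd */
            state = (state >> 32) ^ ((uint64_t)a * b);
        }
        return state;
    }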

> Once you know the memory access pattern, one can sort out wrong password
> candidates almost *memoryless*. Thus, the attacker can run the attack on a
> memory-constrained massively parallel hardware (e.g., run on a GPU with
> thousands of cores, using only the L1 caches) -- completely defeating the
> entire purpose of using a memory-intense password-hash function!

Yep.  This is the entire reason for my interest in hard-to-pebble
hashing algorithms.  It remains a conundrum.  We can certainly do a
better job of making an attacker use more memory with random,
password-dependent reads/writes.  How much speed/memory are we willing
to give up to be more cache-timing resistant?
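
To spell out the attack Stefan describes, here is a minimal C sketch
(the helper name is an assumed placeholder, not scrypt's actual API):
given a cache-line trace observed through the timing side channel, a
wrong guess can be rejected after recomputing only the first few
data-dependent indices, using essentially no memory.

    #include <stddef.h>
    #include <stdint.h>

    /* Placeholder for however the hash derives its next data-dependent
     * block index from the password-derived state; assumed here. */
    extern uint32_t derive_index(const char *password, uint32_t step);

    /* Reject a candidate using only the observed cache-line trace: no
     * memory array is filled, so this runs happily out of L1 on a GPU. */
    int candidate_matches(const char *guess,
                          const uint32_t *observed_lines, size_t n_observed)
    {
        for (size_t i = 0; i < n_observed; i++) {
            if (derive_index(guess, (uint32_t)i) != observed_lines[i])
                return 0;           /* wrong guess, rejected memorylessly */
        }
        return 1;                   /* survivor: confirm with the full hash */
    }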

> The only difficulty is to gather the required information about the memory
> access pattern ("which cache lines may have been read in the first few
> iterations of scrypt").
>
> Stefan

Scrypt does no password-dependent memory access while filling memory.
With a more compute-intensive (in an ASIC) hashing function, it would
be compute-time-hardened.  But the attacker still gets to run in L1
cache on his GPU cores, increasing his attack speed 1000X-ish -- I hate
trying to be specific when Alexander is here to give me the real
numbers :-)  How many scrypt instances can a GPU run in parallel if it
does not try to save any of the computed data?  How fast do they run?
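
For reference, scrypt's two-phase SMix structure that makes this
possible, sketched in C with the block primitives left as assumed
placeholders (blockmix, integerify, xor_block stand in for BlockMix
and Integerify): the first loop writes memory at fixed, sequential
addresses, so only the second loop leaks anything through the cache,
and only the second loop forces the attacker to keep V around.

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_BYTES 128   /* two 64-byte Salsa20/8 blocks for r = 1 */

    /* Placeholders for BlockMix_Salsa20/8, Integerify, and a block XOR. */
    extern void blockmix(uint8_t *X);
    extern uint32_t integerify(const uint8_t *X);
    extern void xor_block(uint8_t *X, const uint8_t *V, size_t len);

    void smix_sketch(uint8_t *X, uint8_t (*V)[BLOCK_BYTES], uint32_t N)
    {
        for (uint32_t i = 0; i < N; i++) {   /* fill phase: fixed addresses */
            memcpy(V[i], X, BLOCK_BYTES);
            blockmix(X);
        }
        for (uint32_t i = 0; i < N; i++) {   /* mix phase: data-dependent reads */
            uint32_t j = integerify(X) % N;  /* index depends on the password */
            xor_block(X, V[j], BLOCK_BYTES);
            blockmix(X);
        }
    }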

Bill
