Date: Sat, 4 Jan 2014 11:31:49 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Reworked KDF available on github for feedback: NOELKDF

On Sat, Jan 4, 2014 at 1:31 AM, Solar Designer <solar@...nwall.com> wrote:

> Two more comments on it:
>
> This appears to select the random page index based on the first
> uint64_t of a page:
>
>         // Select a random 'from' page
>         fromPage = mem + PAGE_LENGTH*(*prevPage % i);
>
> and you appear to be computing uint64_t's of a page sequentially, in
> increasing order.  Thus, the next random page index becomes known
> almost as soon as you've started processing a page.  This may be
> intentional (e.g., EARWORM deliberately allows for one-ahead prefetch,
> but it targets memory bandwidth and doesn't try to be sequential
> memory-hard), but probably it is not (it provides extra parallelism and
> allows for much higher latency memory to be used efficiently, which
> you're not making use of - at least not yet - so it benefits attackers).
> scrypt uses the last (not the first) element of a block to determine the
> random index.
>

I had the same thought when I wrote that, but then I realized that I can
compute an index from the last word of a page just as quickly as from
the first, since the words within a page do not depend on each other.
Unless I take your advice and reduce parallelism by some tunable
parameter, there's no way to force a page to be fully computed before
the next address is known.
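
Here's a rough sketch of what I mean.  This is not NOELKDF code:
hash_word(), PAGE_LENGTH's value, and the per-word recurrence are all
placeholders I made up for illustration.  If each word of a page depends
only on the corresponding word of the previous page, the last word of a
page is known just as early as the first:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_LENGTH 1024        /* words per page; placeholder value */

/* Placeholder mixing function standing in for the real hash. */
static uint64_t hash_word(uint64_t x)
{
    return x * 0x9E3779B97F4A7C15ULL ^ (x >> 31);
}

/* Fill page i from page i-1.  No word depends on another word of the
 * same page, so the words can be computed in any order, in parallel. */
static void fill_page(uint64_t *mem, size_t i)
{
    uint64_t *page = mem + PAGE_LENGTH * i;
    const uint64_t *prev = mem + PAGE_LENGTH * (i - 1);
    for (size_t j = 0; j < PAGE_LENGTH; j++)
        page[j] = hash_word(prev[j]);
}

int main(void)
{
    size_t numPages = 4;
    uint64_t *mem = calloc(numPages * PAGE_LENGTH, sizeof *mem);
    if (mem == NULL)
        return 1;
    mem[PAGE_LENGTH - 1] = 12345;       /* seed the last word of page 0 */
    for (size_t i = 1; i < numPages; i++) {
        /* The last word of page i is computable before page i is
         * written, straight from page i-1: */
        uint64_t early = hash_word(mem[PAGE_LENGTH * i - 1]);
        fill_page(mem, i);
        uint64_t late = mem[PAGE_LENGTH * (i + 1) - 1];
        printf("page %zu: last word known early? %s\n",
               i, early == late ? "yes" : "no");
    }
    free(mem);
    return 0;
}

So moving the index derivation to the last word, by itself, doesn't
serialize page computation unless the words within a page are chained
together.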

I'm not convinced that reducing this sort of parallelism is desirable,
but if it is, it should be tunable as you suggest.  An ASIC-based
attacker is going to be memory-bandwidth limited almost regardless of
our attempts to limit parallelism, IMO.  A custom ASIC probably gives
them a bigger serial-execution-speed advantage than a memory-bandwidth
advantage.  I think limits on parallel execution are more likely to hurt
us than an attacker.  For example, I know of no recent Android phone or
Windows laptop without some graphics acceleration ability.  Attackers
can use GPUs, but so can most users.  Graphics RAM is often faster than
the CPU's main memory, so we could likely get closer to the attacker's
speed by using our own GPUs by default.  With the multi-threading layout
I'm currently using, we could run threads on the GPU to max out graphics
memory bandwidth while simultaneously running SIMD instructions on the
CPU to max out DRAM bandwidth.  GDDR5 has amazing bandwidth, so
well-tuned implementations should use it in the future.  If I max out
both my GDDR5 and DDR3 memory buses, good luck to any attacker trying to
beat my speed without paying as much as I do for RAM.
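
To make that layout concrete, here's a stub sketch; this is not working
GPU code.  launch_gpu_lanes() only marks where a real OpenCL or CUDA
dispatch would go, cpu_lane() stands in for the SIMD memory-hard loop,
and the thread count and lane size are made-up numbers:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_CPU_THREADS 4                 /* made-up thread count */
#define LANE_BYTES (64u << 20)          /* 64 MiB per CPU lane; made up */

/* Stub for the CPU side: stream over a lane to eat DRAM bandwidth.
 * A real build would use SIMD loads/stores here. */
static void *cpu_lane(void *arg)
{
    uint8_t *lane = arg;
    for (int pass = 0; pass < 4; pass++)
        memset(lane, pass, LANE_BYTES);
    return NULL;
}

/* Stub standing in for an asynchronous GPU dispatch (OpenCL/CUDA)
 * that would hammer GDDR5 while the CPU lanes run. */
static void launch_gpu_lanes(void)
{
    /* clEnqueueNDRangeKernel(...) or a CUDA kernel launch goes here. */
}

int main(void)
{
    pthread_t tid[N_CPU_THREADS];
    uint8_t *lanes[N_CPU_THREADS];

    launch_gpu_lanes();  /* GPU starts filling GDDR5 first, asynchronously */

    for (int t = 0; t < N_CPU_THREADS; t++) {
        lanes[t] = malloc(LANE_BYTES);
        if (lanes[t] == NULL ||
            pthread_create(&tid[t], NULL, cpu_lane, lanes[t]) != 0)
            return 1;
    }
    for (int t = 0; t < N_CPU_THREADS; t++) {
        pthread_join(tid[t], NULL);     /* wait for the DRAM lanes */
        free(lanes[t]);
    }
    /* A real implementation would also wait on the GPU queue here. */
    return 0;
}

The point of the layout is that the two memory buses are independent, so
a defender with a GPU makes an attacker beat both bandwidths, not just
one.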

The reference implementation shouldn't touch the GPU, but a speed freak
who loves optimizing low-level parallel execution... I'm not naming
names here, but I could sure use help in this area... such a speed freak
would love to max out both bandwidths at the same time.

Bill
