phc-discussions - Re: [PHC] Scripting memory (not so) high vs Catena in PHP (with optimizations)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140113093128.GA15018@openwall.com>
Date: Mon, 13 Jan 2014 13:31:28 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Scripting memory (not so) high vs Catena in PHP (with optimizations)

On Mon, Jan 13, 2014 at 02:50:22AM -0600, Steve Thomas wrote:
> I attached a faster version of Catena.

Thanks!  Here's how it runs on the i7-4770K:

$ php5 catena-sha512.php 
e73f619ca131d66303ebe195e5114f23f3082c7314922e5d61b3c5af6215002342c85c536d946615f83ac715367d13eb02e99e56814fc0b33eea12e305607c79
0.098096132278442
0.098453998565674
0.098135948181152
0.098548889160156
0.098435163497925
0.098062038421631
0.098290205001831
-----------------
0.098288910729545

a7afcba0ed58b5fe69be986485d8a0ce43ee7abb02f48b1db23cb1d37f5183234dad112a65c72a5a4751b9bf68f1c2b364f014e2cb8e2ebc7812e575b8024ac4
0.19590091705322
0.19565296173096
0.19623398780823
0.19550395011902
0.19607210159302
0.22412419319153
0.23219180107117
-----------------
0.20509713036673

For 8 concurrent instances, the averages are ~0.215 for optimized,
~0.445 for original.

I guess this means, for the optimized version, 10 c/s on one core, 37 c/s
cumulative for 8 concurrent instances.

> I averaged 7 calls and took the average of 3 runs. So your i7-4770K is
> 4.45x faster.

Ouch.  I guess you're using a 32-bit build of PHP.  Right?

> On January 12, 2014 at 9:12 AM Solar Designer <solar@...nwall.com> wrote:
> > On i7-4770K, the speed at 1 MiB is 37 c/s for 1 instance, 136 c/s
> > cumulative for 8 concurrent instances. Can you benchmark it on your
> > system, including against your Catena scripts?
> 
> 1 MiB, 4 concurrent instances:
> Catena: 8.34 c/s
> Catena-original: 4.21 c/s
> smhkdf-v2: 28.1 c/s

Thanks!

Q9300, 4 concurrent instances, 32-bit PHP?:
28.1/8.34 = 3.37x in favor of smhkdf-v2

i7-4770K, 8 concurrent instances, 64-bit PHP:
136/37 = 3.67x in favor of smhkdf-v2

i7-4770K, 1 core in use, 64-bit PHP:
37/10 = 3.7x in favor of smhkdf-v2

> > > Scripting memory (not so) high:
> > > $ja = unpack('V', $x);
> > > vs
> > > $ja = unpack('V', substr($x, -4));
> >
> > I've included this change now. The reason for substr($x, -4) was to
> > prevent an optimized (non-PHP) implementation from prefetching the next
> > block a few steps (3 steps, I think) before completing the SHA-512
> > computation. But if this costs us too much in terms of PHP overhead,
> > let's omit it. Besides, such prefetching may be helpful for defensive
> > native code implementations as well.
> 
> At first that's why I thought you were doing that, but it's the exact opposite
> of what's happening:
> 
> h = g
> g = f
> f = e
> e = d + temp1
> d = c
> c = b
> b = a
> a = temp1 + temp2
> 
> So the best choice is the first 4 bytes. You get the last value 3.25 rounds
> prior
> to finishing. temp1 only depends on e, f, g, h, and w[]. On round 74 (1-80) you
> can calculate the next 4 temp1s then you have the last value. It's about "4"
> rounds.

Oh.  I must have recalled incorrectly.  This sort of detail will need to
be reviewed more carefully before we finalize/standardize a KDF like this.

> > > $v = array();
> >
> > BTW, if we do that, we can also discourage TMTO by random writes in the
> > last loop:
> >
> > http://www.openwall.com/lists/crypt-dev/2013/11/20/1
> >
> > I think the same is not easy to do on a string, where substr() can only
> > be used on the right hand side, right? We're lacking an equivalent of
> > Perl's vec() in PHP, right?
> 
> I believe so, but you could do this with reading and writing characters. Only
> problem is this is slow.

Do you mean treating the string as array of chars?

> > I think PHP arrays are always associative (they use hash tables), which
> > means they're somewhat inefficient for our needs here. Right?
> 
> Correct, I don't know what they use to store arrays but it's order preserving.
> If I had to guess it would be a b-tree with two extra pointers for next and
> previous.
> I was actually impressed with the speed of Catena in PHP. I thought it would
> be much worse because of the use of arrays.

Yes, the only ~3.5x slowdown for Catena vs. a PHP-tuned algorithm is not
that bad.

Speeds aside, I like the secret-dependent indexing better, despite of
the side-channels risk, because it introduces a dependency on memory
latency for the attacker.  With pre-determined indices, the attacker may
prefetch blocks way in advance.

I think the side-channels issue may be mitigated by combining both
approaches: switching to the risky secret-dependent indexing at some
point during hash computation, so that the impact of a potential
early-reject (possible after side-channel monitoring) is fairly low.

Alexander