phc-discussions - Re: [PHC] OMG we have benchmarks

Message-ID: <20150401094015.GA10335@openwall.com>
Date: Wed, 1 Apr 2015 12:40:15 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] OMG we have benchmarks

On Wed, Apr 01, 2015 at 11:01:36AM +0200, Milan Broz wrote:
> Graph for t_min is here (but is somehow strange)
> https://raw.githubusercontent.com/mbroz/PHCtest/master/output/round2_Lenovo_X230_i5_16G/m_cost/memory_time.png

This is the more useful one, and it doesn't look strange to me.  There's
some noise, but that's to be expected if the number of samples was
small.  Thank you!

It's puzzling that the Lyra2 lines don't reach 512 MB and that several
others don't have a final point at 1 GB, though.

When you benchmark -SSE versions, what exactly are those built for?
SSE2, SSSE3, SSE4.1, AVX, AVX2?

(FWIW, yescrypt's pwxform is such that it can be as optimal with SSE2 as
with AVX, but compilers tend to produce better code for it when SSE4.1
or AVX is enabled.  We may introduce a hand-written assembly version in
plain SSE2 eventually, which I expect to run at the same speed as AVX.
Luckily, the performance difference between these different SIMD builds
of yescrypt is small, though.  Only AVX2 and beyond should differ more,
once such intrinsics are added to yescrypt-simd.c.)

While this is not relevant to your use case, I'd like to also see a
request rate capacity vs. memory usage per hash graph for the 128 KB to
128 MB range.  Note that I say "request rate capacity" rather than "run
time", because the relevant use case - password hashing on a server -
will have multiple concurrent requests, so that's what we should
benchmark (and this is what yescrypt's "userom" program simulates).
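
To illustrate the difference between "request rate capacity" and "run
time", here is a rough sketch in Python.  It is not yescrypt's "userom"
program and not part of PHCtest.  It uses hashlib.scrypt purely as a
stand-in workload, and the function names, the thread count, and the
number of hashes per thread are all made up for illustration.  It
assumes that hashlib.scrypt releases the GIL while hashing (as CPython's
OpenSSL-backed implementation is expected to), so that the threads
really do run concurrently.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def scrypt_mem_bytes(n, r):
    """Approximate memory used by one scrypt computation: 128 * N * r bytes."""
    return 128 * n * r


def one_hash(n, r):
    """Compute one stand-in hash; password and salt are fixed dummy values."""
    return hashlib.scrypt(
        b"password", salt=b"saltsalt", n=n, r=r, p=1,
        # Allow a bit more than the main memory area so OpenSSL accepts it.
        maxmem=2 * scrypt_mem_bytes(n, r) + (1 << 20), dklen=32)


def request_rate(n, r, threads, hashes_per_thread):
    """Return hashes completed per second with `threads` concurrent workers."""
    def worker(_):
        for _ in range(hashes_per_thread):
            one_hash(n, r)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, range(threads)))  # wait for all workers
    elapsed = time.perf_counter() - start
    return threads * hashes_per_thread / elapsed


if __name__ == "__main__":
    r = 8
    n = 128  # 128 * 128 * 8 bytes = 128 KB per hash
    while scrypt_mem_bytes(n, r) <= 128 << 20:  # stop after 128 MB
        rate = request_rate(n, r, threads=4, hashes_per_thread=4)
        print(f"{scrypt_mem_bytes(n, r) >> 10:>7} KB  {rate:10.1f} hashes/s")
        n *= 2
```

The figure to plot is hashes per second with all workers busy, against
memory per hash.  That is not the same as the inverse of a single hash's
run time, since concurrent hashes compete for cache and memory
bandwidth.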

BTW, 128 KB is what UFC-crypt in glibc has been using/wasting for ages.

Alexander