phc-discussions - Panel: Please require the finalists to help with benchmarks

Message-ID: <CAOLP8p4kru=WqhdDenh001kpafWjSHOFTSAoJ4o8Ohgq2JW_6w@mail.gmail.com>
Date: Thu, 2 Apr 2015 09:20:00 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Panel: Please require the finalists to help with benchmarks

How this should be done is obviously up to the panel, but here are my
thoughts anyway:

Milan has the most complete and up to date benchmark suite here:

https://github.com/mbroz/PHCtest

I think we should ask Milan to document how to run the tests in a README,
and how to generate tables of results, and then require the finalists to
verify the code and results.  However, I would prefer that the panel also
require:

- Adding a C-linkable benchmarking PHS interface
- Enhancing their code to conform to this interface
- Providing parameters to this interface for each “official” benchmark

I think all the authors should be required to fork Milan's github repo and
submit pull requests for these enhancements.

There are several use cases, and they should be run with different
parameters to show each algorithm in its best light. Off the top of my
head, they include for the memory-hard entries:

- L1 cache bound - 8KiB?
- L3 cache bound - 4MiB?
- DRAM bound - 1GiB?
- Huge hash - 4GiB on multi-core CPU?
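
As a rough sketch only, the tentative sizes above map onto powers of two as 8 KiB = 2^13, 4 MiB = 2^22, 1 GiB = 2^30 and 4 GiB = 2^32. The table and the helper `use_case_bytes` are hypothetical names for illustration; the sizes are the question-marked guesses from the list, not agreed settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Tentative memory-hard use cases. m_exp means "hash 2^m_exp bytes".
 * Values mirror the proposal above and are placeholders. */
struct use_case {
    const char *name;
    int m_exp;
};

static const struct use_case use_cases[] = {
    { "L1 cache bound", 13 }, /* 8 KiB */
    { "L3 cache bound", 22 }, /* 4 MiB */
    { "DRAM bound",     30 }, /* 1 GiB */
    { "Huge hash",      32 }, /* 4 GiB, multi-core CPU */
};

static const size_t n_use_cases = sizeof use_cases / sizeof use_cases[0];

/* Memory size in bytes for use case i; uint64_t so 2^32 does not
 * overflow on platforms with a 32-bit size_t. */
static uint64_t use_case_bytes(size_t i) {
    return (uint64_t)1 << use_cases[i].m_exp;
}
```

Expressing sizes as an exponent keeps every benchmark point on a power-of-two grid, which is also what the `m_exp` parameter proposed later in this message assumes.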

I have never cared for the t_cost parameter.  Its use for TMTO defense is
clever, but I don't trust users to select this parameter based on any valid
insights.  Is there any reason to benchmark with t_cost other than the
defaults recommended by the authors?

We should separately benchmark 1 – N threads, where N is the number of
cores on the machine doing the benchmark.
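
A minimal sketch of how a harness might enumerate the thread counts 1 through N, assuming a Unix-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is on Linux, the BSDs and macOS, but it is an extension rather than part of base POSIX). The function names are hypothetical.

```c
#include <unistd.h>

/* Number of online CPU cores; falls back to 1 if sysconf fails. */
static int online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Fill out[] with the thread counts 1, 2, ..., N, where N is the number
 * of online cores, capped at max_out entries. Returns the count written.
 * A harness would run one benchmark pass per entry. */
static int thread_sweep(int *out, int max_out) {
    int n = online_cores();
    if (n > max_out)
        n = max_out;
    for (int i = 0; i < n; i++)
        out[i] = i + 1;
    return n;
}
```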

There are also target platforms:

- AMD 32/64 bit
- Intel 32/64 bit
- ARM (Raspberry Pi?)

GPU targets would be helpful, but I would not want benchmarks based on the
authors' own attack code.  It should be an independent GPU expert who writes
the attack code.  Is there any way to get a GPU developer involved for this
purpose?  It's a ton of work, so some company would probably have to
support this effort.  Maybe Nvidia could be talked into it?

Similarly, we should have an independent ASIC expert involved for a
realistic analysis of the estimated cost and performance of the various
proposed attack ASICs.  My own experience is too outdated.  Could we get
someone from AMD, IBM, or Intel?

I would like to see three compiled versions for each platform:

- Reference
- Optimized, but no SIMD
- SIMD optimized

This is mostly already done.  Milan did a great job getting this all to
work in a unified framework.  I do not think it is worthwhile benchmarking
“reference” implementations, but it is useful for verification.

To support all these use cases, I think the benchmarking PHS interface
should take only:

- use_case: an enum identifying the plot that this benchmark data goes on
- m_exp: the memory hashed should be 2^m_exp bytes
- max_cores: the max number of CPU cores the algorithm is allowed to use
- iterations: the number of times the hash function should be run in a loop

The authors would be responsible for picking the best parameters for each
use.  If a use case or memory size is not supported, the function should
return an error code rather than crash, so that it won't be included in that
plot.
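
One possible shape for such an interface, as a sketch only: `phs_bench`, `phs_use_case` and the `PHS_ERR_*` codes below are hypothetical names, not anything already in the PHCtest repo. The `extern "C"` guard keeps the declaration linkable from C++ harnesses. The stub models an imaginary entry that only supports the two cache-bound cases up to 4 MiB and reports everything else as an error instead of crashing.

```c
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical use-case ids, one per benchmark plot. */
typedef enum {
    PHS_USE_L1_CACHE = 0,
    PHS_USE_L3_CACHE,
    PHS_USE_DRAM,
    PHS_USE_HUGE
} phs_use_case;

/* Hypothetical return codes; any nonzero value means "skip this point". */
enum {
    PHS_OK = 0,
    PHS_ERR_UNSUPPORTED_USE_CASE = -1,
    PHS_ERR_UNSUPPORTED_MEMORY = -2,
    PHS_ERR_BAD_ARGUMENT = -3
};

/* Hash 2^m_exp bytes using at most max_cores cores, repeated
 * `iterations` times. The author chooses every other parameter
 * (t_cost, lanes, ...) for the given use case. */
int phs_bench(phs_use_case use_case, int m_exp, int max_cores, int iterations);

#ifdef __cplusplus
}
#endif

/* Example author-side stub for an imaginary entry. */
int phs_bench(phs_use_case use_case, int m_exp, int max_cores, int iterations) {
    if (max_cores < 1 || iterations < 1 || m_exp < 0)
        return PHS_ERR_BAD_ARGUMENT;
    if (use_case != PHS_USE_L1_CACHE && use_case != PHS_USE_L3_CACHE)
        return PHS_ERR_UNSUPPORTED_USE_CASE;
    if (m_exp > 22) /* this entry supports at most 4 MiB */
        return PHS_ERR_UNSUPPORTED_MEMORY;
    /* ... run the entry's hash `iterations` times here ... */
    return PHS_OK;
}
```

The harness would simply treat any nonzero return as "no data point" for that plot.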

The number of iterations parameter could allow algorithms to allocate and
initialize memory once and reuse it in each iteration.  This can help us
focus on the difference in algorithm runtimes and reduce noise from
different memory allocation techniques.  It would be very helpful for
estimating how well an algorithm performs in the authentication server use
case, and allow us to more accurately measure the runtime of those L1 cache
bound hashes.
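
To illustrate the allocate-once idea, here is a sketch with a deliberately trivial stand-in for the hash. `toy_hash_pass` is NOT a password hashing scheme, and `bench_reuse` is a hypothetical name. The buffer is allocated before timing starts and reused on every iteration, and the loop total is divided by the iteration count, which helps with very short L1-bound runs. It uses `clock()`, which reports processor time, so it only suits this single-threaded case.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Toy stand-in: one sequential write pass, then one data-dependent
 * read pass over the buffer. Not a real password hash. */
static uint64_t toy_hash_pass(uint64_t *mem, size_t words, uint64_t seed) {
    uint64_t x = seed;
    for (size_t i = 0; i < words; i++) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        mem[i] ^= x;
    }
    for (size_t i = 0; i < words; i++)
        x += mem[x % words];
    return x;
}

/* Allocate 2^m_exp bytes once, run the toy hash `iterations` times on
 * the same buffer, and return mean processor seconds per hash.
 * Returns a negative value on bad arguments or allocation failure.
 * Allocation and freeing are outside the timed region. */
static double bench_reuse(int m_exp, int iterations) {
    if (m_exp < 3 || m_exp >= (int)(sizeof(size_t) * 8) || iterations < 1)
        return -1.0;
    size_t words = ((size_t)1 << m_exp) / sizeof(uint64_t);
    uint64_t *mem = calloc(words, sizeof *mem);
    if (mem == NULL)
        return -2.0;

    volatile uint64_t sink = 0; /* keeps the work from being optimized away */
    clock_t c0 = clock();
    for (int it = 0; it < iterations; it++)
        sink ^= toy_hash_pass(mem, words, (uint64_t)it);
    clock_t c1 = clock();
    free(mem);
    (void)sink;

    if (c0 == (clock_t)-1 || c1 == (clock_t)-1)
        return -3.0;
    return ((double)(c1 - c0) / CLOCKS_PER_SEC) / iterations;
}
```

For example, `bench_reuse(13, 100000)` would time an 8 KiB toy hash averaged over 100000 runs without any allocator cost in the measurement.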

Lyra2 would need to be modified to have nPARALLEL as a runtime parameter.
Yescrypt would have to be modified to make the number of rounds a
parameter.  IMO, it is not useful to claim that a compile-time parameter is
a “tunable”.  Most users want to run just one compiled version.  That would
enforce this.

Benchmark plots:

We should have plots with axes of runtime and memory used for ranges near
the selected use case.  I don't think they should be log-log plots, since that
makes the differences too hard to see.  There should be separate plots for
optimized versions and SIMD versions, and also separate plots for
single-thread and multi-thread.  This may just be due to my color blindness,
but I do not like charts that have a ton of lines mixing various use cases.

This is a ton of work for the authors, but it's certainly not fair to ask
Milan to do it all!  It is also not fair to ask analysts to figure out the
right parameters for every use case.

Bill