phc-discussions - Re: [PHC] Panel: Please require the finalists to help with benchmarks

Date: Thu, 2 Apr 2015 22:00:53 +0200
From: Dmitry Khovratovich <khovratovich@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Panel: Please require the finalists to help with benchmarks

I almost fully support this approach. The only thing I am concerned about is the number of parameters. I am sure that users do not like too many parameters and would prefer default values as much as possible. Moreover, as an analyst I do not like parameters that affect the security unpredictably; those that do not can be fixed.

Thus I would insist on hardcoding the number of rounds, hash function type, flags, etc. into the PHS() call. The designers should make a choice.
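
A minimal sketch of what such hardcoding could look like, assuming the PHS() prototype from the PHC call for submissions. The tunable_hash() core, its toy mixing, and the FIXED_* constants are hypothetical placeholders for illustration only; they are not taken from any candidate and are not secure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical designer-chosen values, fixed at design time. */
enum { FIXED_ROUNDS = 2, FIXED_FLAGS = 0 };

/* Hypothetical internal core that exposes every knob.
 * Toy mixing only: a stand-in for a real candidate, NOT a secure hash. */
static int tunable_hash(uint8_t *out, size_t outlen,
                        const uint8_t *in, size_t inlen,
                        const uint8_t *salt, size_t saltlen,
                        unsigned t_cost, unsigned m_cost,
                        unsigned rounds, unsigned flags)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ flags ^ ((uint64_t)m_cost << 32);

    if (out == NULL || outlen == 0 || rounds == 0)
        return -1;
    for (unsigned r = 0; r < rounds * (t_cost + 1); r++) {
        for (size_t i = 0; i < inlen; i++)
            h = (h ^ in[i]) * 0x100000001b3ULL;
        for (size_t i = 0; i < saltlen; i++)
            h = (h ^ salt[i]) * 0x100000001b3ULL;
    }
    for (size_t i = 0; i < outlen; i++) {
        h = (h ^ i) * 0x100000001b3ULL;
        out[i] = (uint8_t)(h >> 56);
    }
    return 0;
}

/* Public entry point: only t_cost and m_cost remain visible to the
 * caller; rounds and flags are the designers' choice. */
int PHS(void *out, size_t outlen, const void *in, size_t inlen,
        const void *salt, size_t saltlen,
        unsigned int t_cost, unsigned int m_cost)
{
    return tunable_hash(out, outlen, in, inlen, salt, saltlen,
                        t_cost, m_cost, FIXED_ROUNDS, FIXED_FLAGS);
}
```

With this shape, a user who calls PHS() cannot pick an untested combination of rounds or flags by accident.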

But apart from that, I thank Bill again for this proposal.

Dmitry

Sent from my iPhone

> On 2 Apr 2015, at 18:20, Bill Cox <waywardgeek@...il.com> wrote:
> 
> How this should be done is obviously up to the panel, but here's my thoughts anyway:
> 
> Milan has the most complete and up to date benchmark suite here:
> 
> https://github.com/mbroz/PHCtest
> 
> I think we should ask Milan to document how to run the tests in a README, and how to generate tables of results, and then require the finalists to verify the code and results.  However, I would prefer that the panel also require:
> 
> - Add a C-linkable benchmarking PHS interface
> - Enhance their code to conform to this interface
> - Provide parameters to this interface for each “official” benchmark
> 
> I think all the authors should be required to fork Milan's github repo and submit pull requests for these enhancements.
> 
> There are several use cases, and they should be run with different parameters to show each algorithm in its best light. Off the top of my head, they include for the memory-hard entries:
> 
> - L1 cache bound - 8KiB?
> - L3 cache bound - 4MiB?
> - DRAM bound - 1GiB?
> - Huge hash - 4GiB on multi-core CPU?
> 
> I have never cared for the t_cost parameter.  Its use for TMTO defense is clever, but I don't trust users to select this parameter based on any valid insights.  Is there any reason to benchmark with t_cost other than the defaults recommended by the authors?
> 
> We should separately benchmark 1 – N threads, where N is the number of cores on the machine doing the benchmark.
> 
> There are also target platforms:
> 
> - AMD 32/64 bit
> - Intel 32/64 bit
> - ARM (Raspberry Pi?)
> 
> GPU targets would be helpful, but I would not want benchmarks based on the authors' own attack code.  It should be an independent GPU expert who writes the attack code.  Is there any way to get a GPU developer involved for this purpose?  It's a ton of work, so some company would probably have to support this effort.  Maybe Nvidia could be talked into it?
> 
> Similarly, we should have an independent ASIC expert involved for a realistic analysis for the estimated cost and performance of the various proposed attack ASICs.  My own experience is too outdated.  Could we get someone from AMD, IBM, or Intel?
> 
> I would like to see three compiled versions for each platform:
> 
> - Reference
> - Optimized, but no SIMD
> - SIMD optimized
> 
> This is mostly already done.  Milan did a great job getting this all to work in a unified framework.  I do not think it is worthwhile benchmarking “reference” implementations, but it is useful for verification.
> 
> To support all these use cases, I think the benchmarking PHS interface should take only:
> 
> - use_case: an enum identifying the plot that this benchmark data goes on
> - m_exp:  The memory hashed should be 2^m_exp in bytes
> - max_cores: the max number of CPU cores the algorithm is allowed to use
> - iterations: the number of times the hash function should be run in a loop
> 
> The authors would be responsible for picking the best parameters for each use.  If a use case or memory size is not supported, the function should return an error code rather than crash, so that it won't be included in that plot.
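
One way the four-argument interface described above might look in C. This is only a sketch: the names PHS_bench, bench_use_case, candidate_params and pick_params, the error codes, and every parameter choice inside pick_params are assumptions made for illustration and do not come from PHCtest or from any candidate.

```c
#include <stddef.h>

/* One value per benchmark plot (names are hypothetical). */
enum bench_use_case {
    BENCH_L1_CACHE,   /* e.g. 8 KiB  = 2^13 bytes */
    BENCH_L3_CACHE,   /* e.g. 4 MiB  = 2^22 bytes */
    BENCH_DRAM,       /* e.g. 1 GiB  = 2^30 bytes */
    BENCH_HUGE        /* e.g. 4 GiB  = 2^32 bytes */
};

enum { BENCH_OK = 0, BENCH_UNSUPPORTED = -1 };

/* Internal parameters the authors, not the caller, are responsible for. */
struct candidate_params {
    unsigned t_cost;
    unsigned threads;
    size_t mem_bytes;
};

/* Hypothetical author-supplied policy for an imaginary candidate. */
static int pick_params(enum bench_use_case use_case, unsigned m_exp,
                       unsigned max_cores, struct candidate_params *p)
{
    if (max_cores == 0 || m_exp < 10 || m_exp >= sizeof(size_t) * 8)
        return BENCH_UNSUPPORTED;

    switch (use_case) {
    case BENCH_L1_CACHE:
        if (m_exp > 15)
            return BENCH_UNSUPPORTED;
        p->t_cost = 3;
        p->threads = 1;
        break;
    case BENCH_L3_CACHE:
        if (m_exp > 24)
            return BENCH_UNSUPPORTED;
        p->t_cost = 2;
        p->threads = 1;
        break;
    case BENCH_DRAM:
        p->t_cost = 1;
        p->threads = max_cores;
        break;
    case BENCH_HUGE:
    default:
        /* This imaginary candidate opts out of the huge-hash plot. */
        return BENCH_UNSUPPORTED;
    }
    p->mem_bytes = (size_t)1 << m_exp;
    return BENCH_OK;
}

/* C-linkable entry point: returns an error code instead of crashing. */
int PHS_bench(enum bench_use_case use_case, unsigned m_exp,
              unsigned max_cores, unsigned iterations)
{
    struct candidate_params p;
    int rc;

    if (iterations == 0)
        return BENCH_UNSUPPORTED;
    rc = pick_params(use_case, m_exp, max_cores, &p);
    if (rc != BENCH_OK)
        return rc;

    /* A real candidate would run its hash `iterations` times here,
     * using p.t_cost, p.threads and p.mem_bytes. Omitted in this sketch. */
    (void)p;
    return BENCH_OK;
}
```

A harness could then skip any (use_case, m_exp) pair for which PHS_bench returns BENCH_UNSUPPORTED and simply leave that point off the plot.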
> 
> The number of iterations parameter could allow algorithms to allocate and initialize memory once and reuse it in each iteration.  This can help us focus on the difference in algorithm runtimes and reduce noise from different memory allocation techniques.  It would be very helpful for estimating how well an algorithm performs in the authentication server use case, and allow us to more accurately measure the runtime of those L1 cache bound hashes.
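
The allocate-once, hash-many-times idea can be sketched as follows. The function bench_reuse and the toy_fill pass are hypothetical stand-ins, not code from any candidate, and clock() reports CPU time rather than wall-clock time, so a real harness might prefer a different timer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Toy memory-filling pass (xorshift), a stand-in for a real candidate. */
static uint64_t toy_fill(uint64_t *mem, size_t words, uint64_t seed)
{
    uint64_t x = seed | 1;   /* xorshift state must be non-zero */

    for (size_t i = 0; i < words; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        mem[i] = x;
    }
    return mem[words - 1];
}

/* Returns the average CPU seconds per hash, or a negative value on error.
 * The buffer of 2^m_exp bytes is allocated once, outside the timed loop. */
double bench_reuse(unsigned m_exp, unsigned iterations)
{
    size_t words;
    uint64_t *mem;
    volatile uint64_t sink = 0;   /* keeps the loop from being optimized out */
    clock_t start, end;

    if (m_exp < 3 || m_exp > 30 || iterations == 0)
        return -1.0;

    words = ((size_t)1 << m_exp) / sizeof(uint64_t);
    mem = malloc(words * sizeof *mem);
    if (mem == NULL)
        return -1.0;

    start = clock();
    for (unsigned i = 0; i < iterations; i++)
        sink ^= toy_fill(mem, words, (uint64_t)i + 1);
    end = clock();

    free(mem);
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Because malloc() and free() sit outside the timed region, the measured average reflects the hashing work alone, which matters most for the small L1-bound sizes where allocation could otherwise dominate.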
> 
> Lyra2 would need to be modified to have nPARALLEL as a runtime parameter.  Yescrypt would have to be modified to make the number of rounds a parameter.  IMO, it is not useful to claim that a compile-time parameter is a “tunable”.  Most users want to run just one compiled version.  The interface would enforce this.
> 
> Benchmark plots:
> 
> We should have plots with axes of runtime and memory used for ranges near the selected use case.  I don't think they should be log-log plots – that makes the differences too hard to see.  There should be separate plots for optimized versions and SIMD versions, and also separate plots for single-thread and multi-thread.  This may just be due to my color blindness – I do not like charts that have a ton of lines mixing various use cases.
> 
> This is a ton of work for the authors, but it's certainly not fair to ask Milan to do it all!  It is also not fair to ask analysts to figure out the right parameters for every use case.
> 
> Bill
