phc-discussions - Re: [PHC] A review per day

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 08 Sep 2014 20:28:31 -0400
From: Bill Cox <waywardgeek@...hershed.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] A review per day - Lyra2

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks for the detailed reply!  You have an excellent team, and I
really enjoy the discussion.

On 09/08/2014 11:00 AM, Marcos Simplicio wrote:
>> Comparison ----------
>> 
>> Yescrypt wins at GPU defense, due to it's rapid unpredictable
>> small memory reads from L1 cache.  Like bcrypt, Yescrypt issues
>> multiple unpredictable small memory reads in parallel, to get to
>> about 1/2 the number that bcrypt does when single-threaded.
>> TwoCats only issues one unpredictable read at a time, and
>> performs them at about 1/3 the number as bcrypt when single
>> threaded.  However, Yescrypt is designed to max out everything at
>> the same time, including CPU cores, while TwoCats runs into
>> memory bottlenecks on my test machine using only half the CPU
>> cores.  This gives Yescrypt a 3X-ish advantage over TwoCats in
>> unpredictable read rate, which likely translates into better GPU
>> defense.  Lyra2 does small-ish unpredictable reads, but only once
>> per Blake2b reduced round, which is several clock cycles, and
>> several times slower than either Yescrypt or TwoCats.  This 
>> translates into weaker GPU defense, though I am not qualified to
>> say by how much.
> 
> Did you actually benchmarked all three algorithms on a GPU? Because
> we did so with Lyra2 and, even without the additional small random
> reads prescribed in section 6.2, the performance on our GPU was
> ~200 times worse than in our CPU (we testes for memory usages
> between 96 MB and 800 MB). That was the main reason why we did not
> include this extension in the core of Lyra2, but left it there for
> further analysis in a more powerful GPU.

No, I did not do any GPU testing.  My GPU defense guesses are a total
WAG based on what I've read in posts here, and nothing more.

Scrypt achieves parity with GPUs at about 4MiB hashes, according to
Alexander.  Finding the parity point would be an interesting test.

Frankly, such small memory hashes are only warranted when servers are
doing a ton of authentications per second.  Alexander likes to say
100's to 1000's.

Small memory hashes all the way down to just a few KiB apparently are
commonly needed, but I honestly do not know what for!  Hashing just a
few KiB seems entirely counter to "memory-hard" to me.  However, this
is where small unpredictable memory apparently provide good GPU defense.

> Even if TwoCats and Yescrypt (and Lyra2 with the above extension) 
> provide a 1000x slow down on GPUs, what is the point? If it is
> much slower, that will probably be enough to make GPUs out of the
> list of preferred attack platforms... Well, that unless the
> attacker can use a TMTO attack to bring down the memory usage to a
> level that fits a GPU and then use its multiple cores more
> efficiently, and that is why we focused on TMTO in our design.

I think the parity point is the most interesting number.  Scrypt has a
4MiB parity, which people seem to feel is too high.  Bcrypt does it in
4KiB, so clearly that number can be lowered, and small unpredictable
reads seems to be the trick.

>> 
>> ASIC defense for large memory hashing is excellent for all
>> three, assuming Lyra2 becomes multi-threaded.  An ASIC attacker
>> will quickly become memory bandwidth limited.  Practical ASIC
>> attacks have at most about a 32 times higher memory bandwidth
>> than a modern high-end CPU. This enables an attacker to run up to
>> 32 times faster, but the expense per ASIC of such a system is
>> likely similar to the cost of a desktop PC.  TwoCats and Yescrypt
>> have multiplication chain compute-time hardening, which in some
>> cases will limit an attacker's ability to hash at the full speed
>> supported by his memory bandwidth.  In that case, Yescrypt and
>> TwoCats will force an attacker to either make fewer guesses per
>> second, or buy more memory so that more attacks can be 
>> interleaved, filling up memory bandwidth.  Yescrypt and TwoCats
>> win over Lyra2 because of this.  TwoCat's multiplication chains
>> are tunable, allowing it to tune the multiplication chain lengths
>> to the hashing speed of the machine.  This allows TwoCats to be
>> more multiplication chain hardened than Yescrypt in some cases.
>> The extra area on the ASIC needed by Lyra2 and Yescrypt
>> computation logic vs TwoCats is tiny, as there would only be 32
>> copies at most.  TwoCats has a slight win here over Yescrypt.
> 
> Multiplication hardening does seem to be an interesting thing, and
> that I see as one of the main contributions of Yescrypt and
> TwoCats. We are still looking for something like that to suggest as
> part of the sponge's underlying "f" permutation, but I agree you
> and Alexander have the lead on the matter.
> 
> OTOH, I would like to see an analysis of the strength of the
> proposed algorithms in terms of confusion and diffusion
> capabilities. For example, taking the multiplication chain of
> TwoCats, the "a ^= b*c >> 32" operation is more likely to provide
> 0s than 1s to be XORed with a. Is that critical in terms of
> security? Maybe not, but without any security claims about what
> that multiplication chain should achieve and why, it is hard to
> tell.

My writing does leave a lot to be desired.  I certainly tried to
explain what the multiplication chains were for, and how they defend
against ASIC attacks.  However, I believe you when you say you did not
get that from my paper.

It's really pretty simple.  They just force an attacker to spend a
minimum amount of time doing sequential multiplies before he can begin
hashing the next block.  They are not important in any way for the
quality of the final hash.  All that matters is that an attacker has
to do the same as the defender and do the multiplications and XORs
sequentially.

The value of register "a" is simply hashed into the state values with
Blake2s after hashing each 16KiB block.  With alternating multiplies
and XORs, all of it reversible, I believe that an attacker will have
to duplicate these sequential operations.  TwoCats relies entirely on
the underlying cryptographic hash to provide irreversibly and solid
final hashes.

> Just to be clear: no FUD meant here! My point is that, as someone 
> mentioned in another thread (I could not find who), it is standard
> in the cryptography community to analyze your own design and make
> security claims about it so other people can (1) check the design
> against those claims and (2) agree/disagree about whether these
> properties are enough.

I agree.  I certainly tried to make clear security claims, but been a
noob in this field, I doubt I provided what was expected, at least not
in the expected format.  I did analyze each weakness I could think of
in the algorithm in more detail than I see in most papers.

I've been having a lot of fun attacking code lately.  I wish someone
would put in a good effort to attack TwoCat's code.  I think I've
gotten pretty good at seeing possible attacks, but when you write the
code, you simply can't think about it the way another attacker would.
 Steve Thomas had the last really good attack, which forced me to
switch to more standard ARX hashing, and introduce a real
cryptographic hash between blocks.

>> 
>> Against 1-8MiB on-chip memory hashing ASIC attacks, Yescrypt and
>> Lyra2 make increased use of the SIMD unit which will cause the
>> AISC to be slightly bigger than when attacking TwoCats.  All
>> three algorithms will max out the large memory access speed, but
>> Yescrypt does two unpredictable reads in parallel, which should
>> slow down the ASIC memory slightly.  On the other hand,
>> multiplication chains might into play here.  The ASIC memory can
>> be optimized for the algorithm in use, allowing much lower
>> average latencies, and the right number of read/write ports.
>> Only the multiplication chains will keep an attacker from making
>> full use of his improved memory speed.  I think Yescrypt and
>> TwoCats tie here because TwoCat's better multiplication hardening
>> and Yescrypt's slightly larger area may cause roughly equal grief
>> to the attack.
>> 
>> TwoCats and Yescrypt fair better than Lyra2 against and ASIC
>> attack when hashing only 8KiB of memory.  With multiplication
>> chains, an ASIC attack might still speed up TwoCats by 2X, and
>> Yescrypt maybe 3X (a bit of a WAG here).  However the Blake2b
>> hash was practically designed for an ASIC attack, and will run
>> many times faster.  The lack of compute time hardening in Lyra2
>> will allow maybe a 10X speed increase for the attacker.  The area
>> of the multipliers in TwoCats and Yescrypt also become a factor,
>> being around the same size as the memory in Yescrypt's case, and
>> about half that in TwoCat's case.  Because of the extra area
>> required for Yescrypt hashing, I cannot predict which has the
>> better ASIC defense.  I would call it a tie again between
>> Yescrypt and TwoCats, with Lyra2 trailing by some distance.
> 
> If you consider Blake2b as the underlying permutation, than I
> cannot disagree. And that is why we are analyzing other
> possibilities for our (reduced-round) permutation.

I would not recommend switching away from the hashing code you derived
from Samuel Neves work in Blake2b.  Using that code puts you on par
with data throughput with TwoCats and Yescrypt, something few of the
other entries achieve.  The main difference between Lyra2 and TwoCats
is that TwoCats only writes once, and reads once, on average per
memory location, while Lyra2 with minimum t_cost setting does 9
read/write operations.  That does result in much better TMTO
resistance, but at a high cost in terms of memory*time defnse.  Your
hashing function and sponge construction look really good to me.

Since the hashing runs on the SIMD unit, leaving the scalar unit
mostly free, I found I could get a series of XOR and multiply
operations to run there without slowing down things too much.  Feel
free to use any of my implementation for inspiration, especially
running multiplication chains in the scalar unit.  With my simpler
hashing loop and unpredictable L1 reads, TwoCats doesn't benefit from
the multiplication chains as much as I feel Lyra2 could.

>> 
>> It may be common for some groups of users to run
>> single-threaded, though I recommend against it in all cases
>> except for single-core CPUs.  In that case, TwoCats wins in ASIC
>> defense simply because it will fill more memory than either Lyra2
>> or Yescrypt.  A single TwoCats thread fills external memory over
>> twice the speed of either Yescrypt or Lyra2.  When no SIMD unit
>> is available, that ratio might increases by a factor of about 4,
>> corresponding to the pipeline depth used in the SIMD units by
>> Yescrypt and Lyra2.
> 
> Just to be clear: those benchmarks were made with Lyra2, TwoCats
> and Yescrypt with the minimal parameters? I ask because the
> minimum recommended value in Lyra2 is T=1 (i.e., R iterations of
> the Wandering phase), while TwoCats and Yescrypt seem to propose
> something equivalent to "T=1/3" as minimum parameter (i.e., R/3
> iterations of the Wandering phase).

Yes, I used minimum t_cost settings for all three, which is the main
reason Lyra2 is shown significantly behind Yescrypt and TwoCats in
memory*time defense.  With even two threads, I think Lyra2 will fill
my test machine's memory bandwidth, but it is doing the equivalent of
3X more memory operations.

If I set t_cost 3X higher in the current broken form in TwoCats, it
would run almost as fast, but with my new code, it would continue
hashing memory in a manner more similar to Lyra2 and Yescrypt, though
I do feel both of your algorithms "win" in using t_cost for TMTO
defense more effectively and elegantly.

> Without some kind of normalization (e.g., executing only 1/3 of
> Lyra2's Wandering phase or using a higher T for TwoCats and
> Yescrypt, if we are analyzing only hashing speed), these
> comparisons do not sound very fair.

I claim TwoCats is less TMTO resistant than either Lyra2 and Yescrypt,
yet it's TMTO resistance is high enough to make it impractical for
attackers to gain significantly on time*memory cost, and with > 2X
TMTO, they see increasing time*memory cost.  This is much better than
Scrypt in TMTO resistance, which sees up to a 4X reduction in
memory*time defense with a TMTO attack.  There may be some cases where
an attacker bothers with a TMTO against TwoCats, but I think this will
be due to desperation to fit into his available memory rather than an
attempt to hash faster.

If I count right, Yescrypt, I need to use t_cost == 4 for a similar
total bandwith for Yescrypt, and my modified (and not yet submitted)
version of TwoCats that uses t_cost in a more similar manner to Lyra2
needs t_cost == 14.

Here's the TwoCat's running as close as the same as I can get it to
run as Lyra2, but with two threads:

twocats> time ./twocats -m20 -t14 -M0 -B16384 -l8
hash:blake2s memCost:20 timeCost:14 multiplies:0 lanes:8 parallelism:2
algorithm:twocats-extended password:password salt:salt blockSize:16384
subBlockSize:16384

17 79 ff 16 64 a0 f8 42
d6 e8 1c aa 8a af 95 a8
f0 be f7 73 25 9f 62 ca
11 d1 8f d7 ed b4 17 62      32 (octets)


real	0m0.790s
user	0m1.478s
sys	0m0.096s

Here's Lyra2 on one thread:

PHC> time ./phs-lyra 1 174762
Allocated 1073737728 bytes
copying 32 bytes

ba 90 88 28 74 69 ca 30
8f 00 88 d7 68 c7 f8 36
5f 3f 78 99 99 49 60 91
e2 cb 14 93 6c 49 c9 37      32 (octets)


real	0m0.711s
user	0m0.633s
sys	0m0.076s

Lyra2 is a bit faster at doing 9 memory passes than TwoCats.  I think
this is because Lyra2 writes back to locations just read, and does all
read-writes within a block sequentially.

Yescrypt is similar to TwoCats with 4 threads:

PHC> time ./phs-yescrypt 4 17
Starting
Hashing
Allocating local, need of 1073786880
freeing
finished

0b 31 37 91 93 95 f8 d9
93 60 52 4a 33 5c 8c a7
7c 09 70 66 a6 e1 e3 ce
a6 00 de ff 21 f0 55 e1      32 (octets)


real	0m0.805s
user	0m2.963s
sys	0m0.143s

All three are basically memory bandwidth limited, which kicks in hard
around 12GiB/s on my test machine.  Yescrypt is around 11GiB/s,
TwoCats is around 11.4GiB/s, and Lyra2 is around 12.6GiB/s.

The memory*time peak in this case is close for all three, with Lyra2
edging out the win.  However, both TwoCats and Yescrypt filled memory
in the first 1/4th of the run, while Lyra2 took closer to 1/2 the run,
and the average memory*time is a bit lower than for Yescrypt and TwoCats.

Honestly, I get pretty excited about speed, but even I have trouble
getting excited about these differences!  They all basically tie when
bandwidth limited, with minor differences for various reasons.
However, nice job on that speed!

I think the most important mode for both TwoCats and Yescrypt, IMO, is
with t_cost == 0, which provides the strongest time*memory security
for both.  I stand behind my assertion that Lyra2 is erring on the
side of too much required TMTO defense, and that users would benefit
from a higher time*memory cost that Lyra2 could achieve with fewer
reads and writes per pass.

> Anyhow, to be really fair Lyra2 would also have to provide a
> parallel implementation, which is almost done and should be
> publicly released soon. I do not doubt TwoCats and/or Yescrypt will
> be faster, but

I was wrong about your bandwidth not being high enough on one thread.
 9GiB in 0.71s!  That's awesome.  You should let your multi-threading
guy know that I expect him to have a difficult time improving overall
runtime given that you already max out memory with one thread.

That said, Solar Designer's multi-CPU high-end servers with insane
memory bandwidth will require multiple threads of Lyra2 to max them out.

>> 
>> By the way, TwoCats and Yescrypt also beat bcrypt in tiny memory
>> ASIC attacks.  Bcrypt may make 4 parallel unpredictable reads per
>> cycle, but it does so to 4 separate super-tiny SRAMs, making them
>> even faster, and it has zero compute-time hardening.
>> 
>> For defense against cache-timing attacks, TwoCats wins.  A 
>> cache-timing upgrade is planned for Yescrypt, and Yescrypt may
>> come out on top when this happens, but without an algorithm to
>> test, it is impossible to say.  Lyra2 has a cache-timing
>> resistant first loop, like TwoCats, but it uses a weak memory
>> access pattern that leaves it vulnerable to a very high TMTO
>> attack when cache-timing data is available.  Lyra2, IMO, should
>> be upgraded to use something like the Gambit memory access
>> pattern, which defended against my auto-pebbler attack far
>> better.
> 
> I can't argue with that either: as I said the first time you
> mentioned this issue, we were experimenting with different pebbling
> strategies trying to find something both simple and strong,
> especially after our tests with bit-reversal failed. We did find a
> better approach (which also facilitates analysis!) which combines
> "simple reverse" and "variable step". The approach itself is
> described in the documentation available at http://lyra-kdf.net/,
> but the security analysis is not there yet (it will be as soon as I
> handle the LaTeX warnings and errors in the document...)

Sweet.  The cache-timing defense in Lyra2 is one of the things I like
best about it, though I am hoping you will wind up with a better
memory access pattern, and I bet you will.  Let me know if you would
like me to test pebbling a new memory pattern.

>> 
>> One of the most important applications for a password hashing 
>> algorithm of the strength that can be provided by these
>> algorithms is user authentication on dedicated authentication
>> servers.  Yescrypt clearly wins in this category, and in fact the
>> algorithm seems to have been designed with this application as
>> the primary target.  The extra security provided to small memory
>> hashes of a few MiB with large ROM hashing puts it in a class
>> only shared by EARWORM.  EARWORM was my other favorite for this
>> category until I realized it has a weakness against distributed
>> ROM attacks due to it's very low RAM usage.  As things stand,
>> Yescrypt is easily the best algorithm for authentication servers
>> in the competition.  It's weaknesses, such as not yet having any
>> cache-timing defense, don't matter much in this application, and 
>> it's strengths really come to bear.  Yescrypt will keep 8 CPUs
>> very busy in parallel while hammering memory at insane speeds,
>> providing thousands of authentications per second, strengthened
>> by ROM hashing in addition to megabytes of memory.  Even if the
>> password database AND ROM leak to an attacker, a botnet will not
>> be able to work on password cracking if the ROM is large enough.
>> There are many features besides this in Yescrypt for
>> authentication servers.  In comparison, both TwoCats and Lyra2
>> opted for a bit more simplicity, and provide no authentication
>> server specific features, such as ROM support.
> 
> I have to agree that Yescrypt's ROM idea against botnets is very
> clever and that its cache-oriented design is ingenious. It may be
> on the best interest of all PHC candidates to at least consider
> including ROM support to their designs, which I believe will not be
> very difficult for most (if not all) entries. Making better use of
> caching, on the other hand, will require someone with expertise on
> computers architecture...

You and I never had a chance to beat Alexander in this category :-)  I
thought about adding ROM support just so I could check that box, but I
don't know how best to do that.  I'd mess it up, and it would look
dorky.  If Yescrypt does not wind up as a winner, perhaps Alexander
can help the winning entry add decent ROM support.

>> 
>> While all three have no requirement for special hardware, all
>> three are tuned for modern x86 SIMD units.  TwoCats I think wins
>> in adaptability to basic CPUs.  This is because it can run with a
>> single 32-bit width with far fewer instructions per byte of
>> memory hashed. Yescrypt also has variable width SIMD loops, but
>> it's pipeline depth is not currently variable, making it slow
>> down somewhat on a basic CPU.  TwoCats wins here because it uses
>> less pipelined parallelism. This is basically a duplicate of the
>> single-threaded comments above :-)
> 
> Well, that depends on what you adopt as underlying "f" in Lyra2 :)

True, but it's tricky to swap out f and not mess things up.  There's
no other cryptographic hash in use, so it has to be done carefully.
I'm sure your guys can do it easily, but I would not want any random
programmer out there to attempt it.

>> 
>> All three algorithms have variable hashing block sizes, enabling
>> them to tune to a machine's DRAM cache miss penalty.  The longer
>> the penalty, the more memory you want to read in one shot.  All
>> three tie on this point, but win vs most of the other entries.
>> Every entry without a capability to read large blocks of memory
>> at a time cannot supplant Scrypt.
> 
> Just one remark here: the original idea of doing so came from
> scrypt itself.

Which is why it's amazing how few entries have this!  Several have no
equivalent feature on purpose (such as PufferFish), and I respect
that, but others seem to still want to replace Scrypt, yet have no
chance.  Good job getting it right :-)

>> 
>> Lyra2 easily wins at TMTO defense.  Both Lyra2 and Yescrypt use 
>> increasing t_cost to improve TMTO defense, but Lyra2's decision
>> to continue overwriting memory while reading I think makes it
>> more TMTO resistant.  TwoCats as submitted uses t_cost poorly.  A
>> revised version exists that will be submitted if allowed, which
>> does increase TMTO resistance as t_cost increases, but this
>> feature works better in Lyra2 and Yescrypt.  However, because
>> TwoCats was designed as a single pass algorithm, it has the best
>> TMTO resistance when t_cost turned off, meaning only one memory
>> pass.  This is not a mode Lyra2 allows, but Yescrypt does, and
>> both Yescrypt and TwoCats achieve higher memory*time defense as a
>> result.
> 
> I'm not sure I understood what you mean, but it seems to be a
> matter of including an "if" in the code...

I was trying to say Lyra2 and Yescrypt get t_cost usage right, and
that Lyra2 has no setting that allows for fewer than 9 read/write
operations per memory location, when TwoCats does 2 by default, and
Yescrypt runs with 2.25 with t_cost == 0 (which I prefer).

Scrypt also did 1 write and 1 read, just like TwoCats.  It also has a
higher potential memory*time defense than any entry in the
competition, since it fills memory first, before ever reading, but
that made it more vulnerable to TMTO attacks.

>> 
>> All three have a TMTO defense that improves over Scrypt, but
>> Lyra2 sacrifices speed and thus time*memory defense in order to
>> improve it's TMTO defense.  This lower time*memory defense may be
>> the main reason I prefer TwoCats and Yescrypt.  TMTO attacks gain
>> little against TwoCats in any configuration (maybe 10-15%
>> time*memory benefit), and even less against Yescryt and Lyra2.
>> This is why I feel TMTO defense is not a very important factor
>> when comparing these three algorithms.  If TMTO defense were
>> paramount, I would choose Lyra2 as my favorite.  As it is, TMTO
>> defense is good enough in all three.  A final note about TMTO:
>> Yescryt and TwoCats both have techniques for mixing data between 
>> threads, making it difficult to benefit from running each thread 
>> sequentially.
> 
> I'm not so sure about the importance of TMTO resistance in a
> design, especially considering that most of scrypt's article is
> focused exactly on this issue...
> 
> Anyhow, the main interest of Lyra2 over the original Lyra is
> exactly the TMTO resistance, while the extension discussed in
> section 6.3 does suggest that this TMTO resistance in terms of
> number of reads/writes could be parameterized. That is something we
> have been playing with lately, although it seems that lowering the
> number of reads/writes (i.e., making something similar to Lyra)
> makes sense only in a multi-thread scenario, in which the multiple
> threads can compensate as suggested by the analysis in section
> 6.1.
> 
> BTW, the parallel version suggested in Lyra2 does mix data between
> threads.

All excellent news!  Keep up the excellent work.

...
>> 
>> For pure memory hashing speed per thread, TwoCats is the
>> fastest, followed by Lyra2, and then Yescrypt.  However, this is
>> a deliberate design choice by Alexander, and not significant
>> weakness in Yescrypt, as he demonstrated by slightly tweaking
>> Yescrypt to be virtually as fast per thread as TwoCats during a
>> benchmarking fest.  The idea is that the best defense occurs when
>> all parts of the CPU and memory are in use at the same time.
>> Yescrypt maxes out memory bandwidth at about a CPU's full
>> parallelism, which is when all cores are used to hash at the same
>> time.  TwoCats maxes out memory bandwidth with about 1/4 as many
>> threads, and cannot employ all CPU cores at the same time like 
>> Yescrypt does.  The decision to max out CPUs vs trying to do the
>> most possible with one thread is based on the importance of
>> different scenarios.  For a low-powered single core processor,
>> TwoCats will do better than either Yescrypt or Lyra2, because
>> multiple threads will not help in that situation.  For high-end
>> authentication servers protecting passwords for major
>> corporations, Yescrypt wins by providing better computational
>> hardness - an attacker will need many SIMD unit's worth of
>> processing power to run as fast as the authentication server.
>> Once Lyra2 is upgraded with multi-threading, it should do better
>> than TwoCats, but worse than Yescrypt at computational hardness
>> (which is different than runtime hardness). IMO, computational
>> hardness matters little, since even multipliers are tiny on an
>> advanced ASIC.  Memory really is all that counts in terms of
>> silicon expense.
> 
> Our benchmarks with a parallel version do show that it is hard to
> beat Yescrypt's internal function in terms of cache usage. If the
> security of the algorithm proves to be strong, I have to give
> thumbs up for Yescrypt on this matter.

Yeah.  He out-coded me in this area.  IMO, not many people generally
succeed at that, but we've got a couple of really good SIMD guys here.

>> 
>> As for basic security, Yescrypt and TwoCats run well accepted 
>> cryptographic hash functions on all inputs at the beginning to
>> create a derived key, and that key is hashed with the result of
>> memory hashing again with the cryptographic hash function at the
>> end. Regardless of any weakness in TwoCats or Yescrypt memory
>> hashing, this scheme will preserve entropy and will always hash
>> at least as securely as two applications of the cryptographic
>> hash.
> 
> I agree with the phrase, but that basically implies that it may
> also be just as strong as those two applications of the
> cryptographic hash...
> 
> This is actually my main concern with TwoCats and Yescrypt: so far,
> I have seen no analysis that shows that "there is no loss of
> entropy" in the internal process. The fear that this may not be the
> case seems to be among the reasons why many entries, some of them
> created by people experienced in the area of security, prefer to
> use a full hash in the algorithm's design. This conservative
> approach at least gives them a security margin that allows them to
> build security claims more easily. Again, no FUD intended here:
> what I would like to see are exactly those security claims...

I certainly tried to show this.  In particular, both the
multiplication loop and hashing loop in TwoCats are reversible
permutations, and therefore loose no entropy.  Blake2s will be the
only source of entropy loss, which I do not think is an issue.  Also,
since early memory is constantly being re-hashed into later memory,
any Blake2b entropy loss is likely to get recovered.  The final output
from memory hashing has no need for much entropy in any case, because
the original high-entropy derived key is hashed with the memory
hashing result.

This is actually something I would love to see in Lyra2.  If there is
any concern about entropy loss, this should keep Lyra2 secure.  It
makes it easier, IMO, to prove the algorithm is secure.  I am biased
here, but after reading all three sets of code, Lyra2, IMO, wins at
simplicity and elegance, but left me a be worried about its reliance
on a reduced-round Blake2b to not lose entropy.  I believe it's not a
problem, but why not just hash in the original derived key before
returning the result?

Think of it another way.  Being able to replace the hash function is
supposed to be a Lyra2 strength.  Who would you trust with deciding
how to replace it?  If they get it wrong, Lyra2 security looks like it
would be in trouble to me.  Mixing in the pre-memory-hashing key at
the end is a great way to help save fools from themselves.  I happen
to be quite the fool fairly often.

>> Lyra2 relies on the reduced Blake2b function's application over
>> and over many times while hashing to cryptographically scramble
>> it's sponge's state, and it relies on this slightly modified
>> Blake2b function to not lose entropy. The final output is
>> squeezed from the sponge.  Because of this construction, Lyra2
>> hashing security needs careful analysis by hashing experts to
>> verify it has no significant weakness.  I personally do not feel
>> qualified to do it.  However, I do believe that they most likely 
>> got it right.  However, I would say TwoCats and Yescrypt win
>> here, in being trivial to prove basically secure.
> 
> I believe the exact opposite: it is non-trivial to say whether or
> not everything made between the two calls of the hash function in
> TwoCats and Yescrypt is secure or not. Remember your criticism
> about Schvrch, for example: at first sight, and without reading the
> analysis in the mentioned paper, it is hard to say whether or not
> the evolve and revolve functions would actually prevent shortcuts
> in the computation. Why is it "trivial" for a multiplication
> chain?

First, a full Blake2s Hash round is called after each block is hashed,
cryptographically scrambling the state often.  This is the only place
memory hashing in TwoCats can lose entropy.  In contrast, Lyra2
lightly scrambles the state in every operation, with possible entropy
loss in memory (but not the sponge).

Memory hashing in TwoCats and Yescrypt does not have to be
cryptographically secure.  Our memory hashing output can be highly
biased with no loss of security.  All that is required is that the
hashing cannot be reversed and does not loose so much entropy that an
attacker can skip memory hashing and try to guess it's output
correctly.  Guessing the output correctly brute-force for any given
password guess would have to be faster than the memory hashing, which
is not likely.  The state kept by TwoCats is parallelism times the
width of the underlying hash function.  It is cryptographically hashed
with non-entropy-losing hash data once for every block hashed.  Trying
to guess this state at the end would not work out for an attacker
trying to improve on 1 second  for a 4GiB memory hash.

In reality, I'm sure it would be fine to just return the state.  This
would be more similar to Lyra2.  However, just for added paranoia, I
prefer to re-hash in the derived key at the end.  I don't think this
is a "real" issue.  It's more like why we prefer to choose "safe"
primes when hard-coding primes for Duffie-Helman key exchange.  It's
just one less thing for the public to worry about.

> In Lyra2, we are semi-conservative: we combine a sponge
> construction, for which there many analysis stating that it may be
> used as long as the underlying "f" is a good permutation, and a
> function whose analysis indicate that it is a good permutation
> (originally, Blake2b, but it could be anything else). When we take
> the reduced round "f", we are actually relying on the fact that, in
> the long run (i.e., after a few rounds), this reduced-round
> operation becomes as secure as the full-round version, as
> prescribed by the ALRED construction. The first and last operations
> are done with the full sponge (again, following ALRED), so the
> whole design should be at least as secure as "2+R \times 
> (rounds/rounds_max)" applications of the hash function.

I agree, unless for some crazy reason the reduced Blake2b round is
spilling lots of entropy.  I checked your memory patterns with
Dieharder, and if you had a bug causing massive entropy loss, it most
likely would have shown up there.

>> 
>> TwoCats and Yescrypt win vs Lyra2 in terms of being ready for 
>> deployment.  Lyra2 still uses memset to clear buffers, just as a 
>> placeholder for the real function, and has no byte swapping to
>> deal with big-endian machines.  There is no provision for locking
>> memory from swap (mmap vs malloc).  It also still needs a thread
>> upgrade, while TwoCats and Yescrypt are in theory ready, other
>> than needing more review, for deployment now.  There are planned
>> upgrades to Yescrypt, but nothing critical, while TwoCats has a
>> modified t_cost loop that has not yet been submitted.  Lyra2 does
>> not yet have a parameter estimation function, while Yescrypt and
>> TwoCats do.  In reality, simply because Alexander makes so few
>> mistakes, Yescrypt is the closest entry of the three to being
>> ready for deployment, edging out a close win here vs TwoCats.
> 
> I cannot argue with that either, except maybe to say that we would 
> probably have a different AES today if the reference implementation
> was all that counted in that contest :)

I did list it last on my spreadsheet, and listed the units as "yawns".
 I'm slightly embarrassed to have pointed this out, but I'm even going
for banana attacks in my reviews :-)

>> 
>> Of the three, Lyra2 has the simplest and most elegant design.
>> The sponge construction was a pleasure to read, and cleverly
>> done. TwoCats probably comes in second here, simply because it
>> inherits no baggage from Scrypt, but complexity is a significant
>> weakness of the full TwoCats algorithm, which is why I also wrote
>> the very simple SinnyCat subset algorithm.  Starting with the
>> complexity of Scrypt and building on it made Yescrypt more
>> complex than Lyra2 or TwoCats.  I consider this to be the most
>> significant weakness of Yescrypt, and the most significant
>> advantage of Lyra2.
> 
> While simplicity was indeed a goal in the design, probably that 
> impression comes from the fact that we tried to make a core and
> leave extensions out of it. If you put everything suggested as
> "extension" on the same (pseudo)code, things get much more
> complicated. I believe the inverse is also valid for Yescrypt and
> TwoCats: take out the "ifs" and you get a simpler design (well,
> that ignoring the internal hash functions, which are more complex
> to understand and analyze...).

Absolutely true.  This is why I felt the need to create my SkinnyCat
sub-set hashing algorithm.  I would prefer that people compare some of
the simpler KISS algorithms in the competition to SkinnyCat than the
full blown TwoCats.

However, your sponge is cool.  It made my day when I read it.

When all the bells and whistles get added, and if I can brow-beat you
into offering fewer read/writes per memory location, I am confident
any of the three would be a real winner.

>> 
>> Of the three, only TwoCats has a pluggable hash function, meaning
>> that it can easily run with different secure cryptographic
>> hashes.  The default is Blake2b for it's speed, but TwoCats comes
>> with SHA256 and SHA512 enabled as well.  This is only a minor
>> weakness in Yescrypt, because there is nothing in Yescrypt
>> preventing this feature from being added.  However, in Lyra2 it
>> is fairly difficult to change the underlying hash function.  Any
>> change means that the algorithm needs to be reviewed again
>> carefully for any cryptographic weakness by experts.  In
>> contrast, any C programmer will find it trivial to change the
>> hash in TwoCats.
> 
> Hum... that is a good point, but only if you take "reduced-round
> hash function" literally (and that was our intention!). However, if
> you can find any good permutation to use in the middle of the
> algorithm (i.e., the "reduced-round function" that can be proven to
> be secure in the long run, then I do not see any reason why the
> first absorb and the last squeeze can be done with another
> permutation. Nonetheless, I would not put the security of a system
> on something that may or may not be secure...

With a few full cryptographic hashes as you go (maybe one per row?),
and a post-hash that includes the original derived key, I think you
could let the public twiddle with f.

>> 
>> Yescrypt wins in another imporant category: ease of adoption.
>> While remaining compatible with Scrypt has made Yescrypt more
>> complex than needed, it also makes it easier to deploy.  Anyone
>> out there with Scrypt already can simply replace it with Escrypt,
>> and enjoy the benefit of the higher speed in Alexander's
>> implementation.  They can easily switch to stronger hashing
>> techniques provided in Yescrypt at any time.  In contrast, users
>> will have to learn about a whole new hashing scheme to adopt
>> TwoCats or Lyra2.
> 
> Agreed.

Yep.  No one seems to think Alexander is a fool.  His algorithm is
still my preferred choice, but with the upgrades you're already
making, Lyra2 is right up there, IMO.  It is very nice work.

Bill
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJUDkmrAAoJEAcQZQdOpZUZnlcP/j6k0Tm/Icj2ZYpT9mJqhlow
806gNI18NxF6zUzxgbS8aB3phgTgWX40tojFgSwMChCHDct1bjnAU9+51EkFBZxF
ckKb+FvEPpI++PHI8uVOPG8OhxcWOP7WDaxEKjUgSqT0ArPcX/WwOPkBKzb5w8+4
y6VNzefE4qcC4c8WjG67OBvXANeiR4WEZOkdZRszDn5QPvkQiX7mXYstcXucuk/o
L3qUkMv1TRdN72s5xvO5yWCi65z+EWiWVbMcM5IRa7K99QVe5o7PCuGo0B2yeuL0
srIp6UwrajfwG9x8IQF4h3XpRVZpI419jiEhKGYxq3CEccv0ohGoHtEHsliZosJq
b9iN51iFEShWyU0LzGzsJNqb2lj0INUvBhZQ8QHGYSh36Za+zpwIhMzxIcCuMRuv
9nT+XpxqAKwc+dkx37qLGueyvqQjHwo8sLLuwCRbH+cZfmizoH8VWew9Vw8f+gjN
jMyeQK25PdIZt7CH43EIy1VJUQiroZouh/b6hLflJFaUi+w8y68QjZ3AO6vHKpTy
fAEtZJWaTrfzih4ihuiW4AAZ3+Fr13JYBBL3eEEhAcxxXTddVITVe5koKUwwDH/o
NdJkN5yMHwOCzqbnnAwzny0PBNu9mnLtrOi6nwX1ATD6tlab5RTuFKkz7iCLmmGT
Zfz3SfR7EeM/1L1ZGTlx
=QLkk
-----END PGP SIGNATURE-----