phc-discussions - Re: [PHC] A review per day

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5411D546.9000907@larc.usp.br>
Date: Thu, 11 Sep 2014 14:00:54 -0300
From: Marcos Simplicio <mjunior@...c.usp.br>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] A review per day - Lyra2


On 08-Sep-14 21:28, Bill Cox wrote:
> Thanks for the detailed reply!  You have an excellent team, and I
> really enjoy the discussion.

Well, the problem with detailed replies is that they take time,
something that became scarce now that the classes have begun :/

I will try to keep it up, but clearly I'm getting late by a few days
already. :)

> 
> On 09/08/2014 11:00 AM, Marcos Simplicio wrote:
>>> Comparison ----------
>>>
>>> Yescrypt wins at GPU defense, due to it's rapid unpredictable
>>> small memory reads from L1 cache.  Like bcrypt, Yescrypt issues
>>> multiple unpredictable small memory reads in parallel, to get to
>>> about 1/2 the number that bcrypt does when single-threaded.
>>> TwoCats only issues one unpredictable read at a time, and
>>> performs them at about 1/3 the number as bcrypt when single
>>> threaded.  However, Yescrypt is designed to max out everything at
>>> the same time, including CPU cores, while TwoCats runs into
>>> memory bottlenecks on my test machine using only half the CPU
>>> cores.  This gives Yescrypt a 3X-ish advantage over TwoCats in
>>> unpredictable read rate, which likely translates into better GPU
>>> defense.  Lyra2 does small-ish unpredictable reads, but only once
>>> per Blake2b reduced round, which is several clock cycles, and
>>> several times slower than either Yescrypt or TwoCats.  This 
>>> translates into weaker GPU defense, though I am not qualified to
>>> say by how much.
> 
>> Did you actually benchmarked all three algorithms on a GPU? Because
>> we did so with Lyra2 and, even without the additional small random
>> reads prescribed in section 6.2, the performance on our GPU was
>> ~200 times worse than in our CPU (we testes for memory usages
>> between 96 MB and 800 MB). That was the main reason why we did not
>> include this extension in the core of Lyra2, but left it there for
>> further analysis in a more powerful GPU.
> 
> No, I did not do any GPU testing.  My GPU defense guesses are a total
> WAG based on what I've read in posts here, and nothing more.
> 
> Scrypt achieves parity with GPUs at about 4MiB hashes, according to
> Alexander.  Finding the parity point would be an interesting test.
> 
> Frankly, such small memory hashes are only warranted when servers are
> doing a ton of authentications per second.  Alexander likes to say
> 100's to 1000's.
> 
> Small memory hashes all the way down to just a few KiB apparently are
> commonly needed, but I honestly do not know what for!  Hashing just a
> few KiB seems entirely counter to "memory-hard" to me.  However, this
> is where small unpredictable memory apparently provide good GPU defense.

On 08-Sep-14 23:18, Solar Designer wrote:>> On 09/08/2014 11:00 AM,
Marcos Simplicio wrote:
>>> Did you actually benchmarked all three algorithms on a GPU? Because
>>> we did so with Lyra2
>
> I did not.  It's great that you did.  I was wondering what the CUDA code
> included with Lyra2 was for - whether it was to show how inefficient
> Lyra2 is on GPUs, or to show that it's usable defensively even on GPUs.
> Now it's clear that your intent was the former.
>
>>> and, even without the additional small random
>>> reads prescribed in section 6.2, the performance on our GPU was
>>> ~200 times worse than in our CPU (we testes for memory usages
>>> between 96 MB and 800 MB).
>
> This range is currently almost irrelevant for testing of GPU attack
> resistance.  It is clear that any sane scrypt-like algorithm will be very
> slow on current GPUs, as compared to current CPUs, for per-hash memory
> usage in this range.  This would be true for scrypt itself as well, but
> we're trying to do better than scrypt.
>
> A currently relevant per-hash memory cost setting is 2 MB for yescrypt,
> and probably much less than that for Lyra2 (since it's slower at same
> memory usage).  This is what you'd realistically achieve with mass user
> authentication.  96 MB is currently unrealistic for that use case.
> scrypt's recommended minimum of 16 MB is difficult to achieve as well -
> sometimes it's possible, sometimes not (depending on the required
> request rate capacity and exact choice of server hardware).


OK, I think I see the point made by both of you. We will try to perform
~1MB tests soon to see what happens. Then we will be able to tell if the
variant suggested in section 6.2 should make it to the core design.

Anyhow, I'm not sure whether Lyra2 is actually slower than
Yescrypt/TwoCats when the three do a full execution of the Wandering
phase (or "unpredictable loop", to use TwoCats's naming convention),
since this differs for the minimum parameters of each algorithm, as
mentioned by Bill below.

I think we will be able to get better normalized comparisons (e.g., by
bandwidth usage, by TMTO, by total number of hash operations, by
resistance to ASICs, etc.) after we have a parallel implementation of
Lyra2. OTOH, the absence of such implementation is totally our fault, so
I cannot complain about non-normalized comparisons, but just add a
"caveat" for the moment...


>> Even if TwoCats and Yescrypt (and Lyra2 with the above extension) 
>> provide a 1000x slow down on GPUs, what is the point? If it is
>> much slower, that will probably be enough to make GPUs out of the
>> list of preferred attack platforms... Well, that unless the
>> attacker can use a TMTO attack to bring down the memory usage to a
>> level that fits a GPU and then use its multiple cores more
>> efficiently, and that is why we focused on TMTO in our design.
> 
> I think the parity point is the most interesting number.  Scrypt has a
> 4MiB parity, which people seem to feel is too high.  Bcrypt does it in
> 4KiB, so clearly that number can be lowered, and small unpredictable
> reads seems to be the trick.
> 
>>>
>>> ASIC defense for large memory hashing is excellent for all
>>> three, assuming Lyra2 becomes multi-threaded.  An ASIC attacker
>>> will quickly become memory bandwidth limited.  Practical ASIC
>>> attacks have at most about a 32 times higher memory bandwidth
>>> than a modern high-end CPU. This enables an attacker to run up to
>>> 32 times faster, but the expense per ASIC of such a system is
>>> likely similar to the cost of a desktop PC.  TwoCats and Yescrypt
>>> have multiplication chain compute-time hardening, which in some
>>> cases will limit an attacker's ability to hash at the full speed
>>> supported by his memory bandwidth.  In that case, Yescrypt and
>>> TwoCats will force an attacker to either make fewer guesses per
>>> second, or buy more memory so that more attacks can be 
>>> interleaved, filling up memory bandwidth.  Yescrypt and TwoCats
>>> win over Lyra2 because of this.  TwoCat's multiplication chains
>>> are tunable, allowing it to tune the multiplication chain lengths
>>> to the hashing speed of the machine.  This allows TwoCats to be
>>> more multiplication chain hardened than Yescrypt in some cases.
>>> The extra area on the ASIC needed by Lyra2 and Yescrypt
>>> computation logic vs TwoCats is tiny, as there would only be 32
>>> copies at most.  TwoCats has a slight win here over Yescrypt.
> 
>> Multiplication hardening does seem to be an interesting thing, and
>> that I see as one of the main contributions of Yescrypt and
>> TwoCats. We are still looking for something like that to suggest as
>> part of the sponge's underlying "f" permutation, but I agree you
>> and Alexander have the lead on the matter.
> 
>> OTOH, I would like to see an analysis of the strength of the
>> proposed algorithms in terms of confusion and diffusion
>> capabilities. For example, taking the multiplication chain of
>> TwoCats, the "a ^= b*c >> 32" operation is more likely to provide
>> 0s than 1s to be XORed with a. Is that critical in terms of
>> security? Maybe not, but without any security claims about what
>> that multiplication chain should achieve and why, it is hard to
>> tell.
> 
> My writing does leave a lot to be desired.  I certainly tried to
> explain what the multiplication chains were for, and how they defend
> against ASIC attacks.  However, I believe you when you say you did not
> get that from my paper.
> 
> It's really pretty simple.  They just force an attacker to spend a
> minimum amount of time doing sequential multiplies before he can begin
> hashing the next block.  They are not important in any way for the
> quality of the final hash.  All that matters is that an attacker has
> to do the same as the defender and do the multiplications and XORs
> sequentially.
> 
> The value of register "a" is simply hashed into the state values with
> Blake2s after hashing each 16KiB block.  With alternating multiplies
> and XORs, all of it reversible, I believe that an attacker will have
> to duplicate these sequential operations.  TwoCats relies entirely on
> the underlying cryptographic hash to provide irreversibly and solid
> final hashes.

I really can't blame your writing as much as my reading, so let's be
gentlemen and say we are even :)

Anyhow, I'm not sure you are achieving the goal of forcing the attacker
to perform the multiplies, at least not always. My point is: since you
get the most significant portion of the multiplication (that is what ">>
32" does, right?), you get a biased distribution for the result. Hence,
there may be some shortcuts here, such as a simple (?) test that
eliminates the need of doing multiplications because you already now the
result by looking at (a part of) the inputs.

For the sake of the discussion, take a look at the attached excel file:
it shows what happens for multiplications between numbers of 4 bits. In
this scenario, about 30%  of the multiplications will give 0 as a result
(namely, when both operands are small enough), and there is a reasonably
high probability that only a few bits on the right will be 1. The same
file also shows that the least significant portion is less biased, while
the distribution when one at least one of the operands is odd is almost
unbiased (the latter is actually a result from number theory: when one
of the operands is co-prime to the modulus, you get all possible values
between 0 and the modulus-1).

With 32 bits, you will get 0 out of your multiplications much less often
than 30% of the time, but the bias toward small numbers (including 0)
will still be there. Is it worthy exploring such bias to avoid those
multiplications (e.g., using a look-up circuit)? I'm not sure, but if
that is the case, there will be passwords that can be tested with fewer
multiplications than usual.

I apologize if you already have considered this attack venue and deemed
it unpractical. However, in that case it may be worthy to add this
discussion to your security analysis to avoid anyone else thinking it
may be worthy exploring it.

BTW, is there a good reason why you are using the most significant
rather than the least significant bits? The latter seems better
considering that it leads to less predictable outputs...



> 
>> Just to be clear: no FUD meant here! My point is that, as someone 
>> mentioned in another thread (I could not find who), it is standard
>> in the cryptography community to analyze your own design and make
>> security claims about it so other people can (1) check the design
>> against those claims and (2) agree/disagree about whether these
>> properties are enough.
> 
> I agree.  I certainly tried to make clear security claims, but been a
> noob in this field, I doubt I provided what was expected, at least not
> in the expected format.  I did analyze each weakness I could think of
> in the algorithm in more detail than I see in most papers.
> 
> I've been having a lot of fun attacking code lately.  I wish someone
> would put in a good effort to attack TwoCat's code.  I think I've
> gotten pretty good at seeing possible attacks, but when you write the
> code, you simply can't think about it the way another attacker would.
>  Steve Thomas had the last really good attack, which forced me to
> switch to more standard ARX hashing, and introduce a real
> cryptographic hash between blocks.

This does help against the above attack idea (if it is really an attack
idea...), because then you get approximately random values for your
operands from time to time.

Side note: AFAIU, one of the good properties about ARX is that each of
its operations are unbiased (all outputs are equally distributed if you
combine every possible inputs)...

> 
>>>
>>> Against 1-8MiB on-chip memory hashing ASIC attacks, Yescrypt and
>>> Lyra2 make increased use of the SIMD unit which will cause the
>>> AISC to be slightly bigger than when attacking TwoCats.  All
>>> three algorithms will max out the large memory access speed, but
>>> Yescrypt does two unpredictable reads in parallel, which should
>>> slow down the ASIC memory slightly.  On the other hand,
>>> multiplication chains might into play here.  The ASIC memory can
>>> be optimized for the algorithm in use, allowing much lower
>>> average latencies, and the right number of read/write ports.
>>> Only the multiplication chains will keep an attacker from making
>>> full use of his improved memory speed.  I think Yescrypt and
>>> TwoCats tie here because TwoCat's better multiplication hardening
>>> and Yescrypt's slightly larger area may cause roughly equal grief
>>> to the attack.
>>>
>>> TwoCats and Yescrypt fair better than Lyra2 against and ASIC
>>> attack when hashing only 8KiB of memory.  With multiplication
>>> chains, an ASIC attack might still speed up TwoCats by 2X, and
>>> Yescrypt maybe 3X (a bit of a WAG here).  However the Blake2b
>>> hash was practically designed for an ASIC attack, and will run
>>> many times faster.  The lack of compute time hardening in Lyra2
>>> will allow maybe a 10X speed increase for the attacker.  The area
>>> of the multipliers in TwoCats and Yescrypt also become a factor,
>>> being around the same size as the memory in Yescrypt's case, and
>>> about half that in TwoCat's case.  Because of the extra area
>>> required for Yescrypt hashing, I cannot predict which has the
>>> better ASIC defense.  I would call it a tie again between
>>> Yescrypt and TwoCats, with Lyra2 trailing by some distance.
> 
>> If you consider Blake2b as the underlying permutation, than I
>> cannot disagree. And that is why we are analyzing other
>> possibilities for our (reduced-round) permutation.
> 
> I would not recommend switching away from the hashing code you derived
> from Samuel Neves work in Blake2b.  Using that code puts you on par
> with data throughput with TwoCats and Yescrypt, something few of the
> other entries achieve.  The main difference between Lyra2 and TwoCats
> is that TwoCats only writes once, and reads once, on average per
> memory location, while Lyra2 with minimum t_cost setting does 9
> read/write operations.  That does result in much better TMTO
> resistance, but at a high cost in terms of memory*time defnse.  Your
> hashing function and sponge construction look really good to me.
> 
> Since the hashing runs on the SIMD unit, leaving the scalar unit
> mostly free, I found I could get a series of XOR and multiply
> operations to run there without slowing down things too much.  Feel
> free to use any of my implementation for inspiration, especially
> running multiplication chains in the scalar unit.  With my simpler
> hashing loop and unpredictable L1 reads, TwoCats doesn't benefit from
> the multiplication chains as much as I feel Lyra2 could.


Well, we have been taking a look at the literature but only a few
algorithms do employ multiplications in their design and even fewer are
not considered broken or weakened...

Considering the great performance provided by Blake2 (since it can beat
MD5 and remain secure, I do not expect to find anything nearly as
good!), we are actually considering adding multiplications to it and see
what happens in terms of security and performance. It would at least be
one alternative to plain Blake2 (we are not planning to force any choice
for "f" anyway). I hope Samuel Neves and the other authors of Blake2 do not

I would gladly use your multiplication chain for the sake of testing
too, but I would not recommend it until further analysis (please do not
take this personally, but I'm paid to be paranoid :) )


> 
>>>
>>> It may be common for some groups of users to run
>>> single-threaded, though I recommend against it in all cases
>>> except for single-core CPUs.  In that case, TwoCats wins in ASIC
>>> defense simply because it will fill more memory than either Lyra2
>>> or Yescrypt.  A single TwoCats thread fills external memory over
>>> twice the speed of either Yescrypt or Lyra2.  When no SIMD unit
>>> is available, that ratio might increases by a factor of about 4,
>>> corresponding to the pipeline depth used in the SIMD units by
>>> Yescrypt and Lyra2.
> 
>> Just to be clear: those benchmarks were made with Lyra2, TwoCats
>> and Yescrypt with the minimal parameters? I ask because the
>> minimum recommended value in Lyra2 is T=1 (i.e., R iterations of
>> the Wandering phase), while TwoCats and Yescrypt seem to propose
>> something equivalent to "T=1/3" as minimum parameter (i.e., R/3
>> iterations of the Wandering phase).
> 
> Yes, I used minimum t_cost settings for all three, which is the main
> reason Lyra2 is shown significantly behind Yescrypt and TwoCats in
> memory*time defense.  With even two threads, I think Lyra2 will fill
> my test machine's memory bandwidth, but it is doing the equivalent of
> 3X more memory operations.
> 
> If I set t_cost 3X higher in the current broken form in TwoCats, it
> would run almost as fast, but with my new code, it would continue
> hashing memory in a manner more similar to Lyra2 and Yescrypt, though
> I do feel both of your algorithms "win" in using t_cost for TMTO
> defense more effectively and elegantly.
> 
>> Without some kind of normalization (e.g., executing only 1/3 of
>> Lyra2's Wandering phase or using a higher T for TwoCats and
>> Yescrypt, if we are analyzing only hashing speed), these
>> comparisons do not sound very fair.
> 
> I claim TwoCats is less TMTO resistant than either Lyra2 and Yescrypt,
> yet it's TMTO resistance is high enough to make it impractical for
> attackers to gain significantly on time*memory cost, and with > 2X
> TMTO, they see increasing time*memory cost.  This is much better than
> Scrypt in TMTO resistance, which sees up to a 4X reduction in
> memory*time defense with a TMTO attack.  There may be some cases where
> an attacker bothers with a TMTO against TwoCats, but I think this will
> be due to desperation to fit into his available memory rather than an
> attempt to hash faster.
> 
> If I count right, Yescrypt, I need to use t_cost == 4 for a similar
> total bandwith for Yescrypt, and my modified (and not yet submitted)
> version of TwoCats that uses t_cost in a more similar manner to Lyra2
> needs t_cost == 14.
> 
> Here's the TwoCat's running as close as the same as I can get it to
> run as Lyra2, but with two threads:
> 
> twocats> time ./twocats -m20 -t14 -M0 -B16384 -l8
> hash:blake2s memCost:20 timeCost:14 multiplies:0 lanes:8 parallelism:2
> algorithm:twocats-extended password:password salt:salt blockSize:16384
> subBlockSize:16384
> 
> 17 79 ff 16 64 a0 f8 42
> d6 e8 1c aa 8a af 95 a8
> f0 be f7 73 25 9f 62 ca
> 11 d1 8f d7 ed b4 17 62      32 (octets)
> 
> 
> real	0m0.790s
> user	0m1.478s
> sys	0m0.096s
> 
> Here's Lyra2 on one thread:
> 
> PHC> time ./phs-lyra 1 174762
> Allocated 1073737728 bytes
> copying 32 bytes
> 
> ba 90 88 28 74 69 ca 30
> 8f 00 88 d7 68 c7 f8 36
> 5f 3f 78 99 99 49 60 91
> e2 cb 14 93 6c 49 c9 37      32 (octets)
> 
> 
> real	0m0.711s
> user	0m0.633s
> sys	0m0.076s
> 
> Lyra2 is a bit faster at doing 9 memory passes than TwoCats.  I think
> this is because Lyra2 writes back to locations just read, and does all
> read-writes within a block sequentially.
> 
> Yescrypt is similar to TwoCats with 4 threads:
> 
> PHC> time ./phs-yescrypt 4 17
> Starting
> Hashing
> Allocating local, need of 1073786880
> freeing
> finished
> 
> 0b 31 37 91 93 95 f8 d9
> 93 60 52 4a 33 5c 8c a7
> 7c 09 70 66 a6 e1 e3 ce
> a6 00 de ff 21 f0 55 e1      32 (octets)
> 
> 
> real	0m0.805s
> user	0m2.963s
> sys	0m0.143s
> 
> All three are basically memory bandwidth limited, which kicks in hard
> around 12GiB/s on my test machine.  Yescrypt is around 11GiB/s,
> TwoCats is around 11.4GiB/s, and Lyra2 is around 12.6GiB/s.
> 
> The memory*time peak in this case is close for all three, with Lyra2
> edging out the win.  However, both TwoCats and Yescrypt filled memory
> in the first 1/4th of the run, while Lyra2 took closer to 1/2 the run,
> and the average memory*time is a bit lower than for Yescrypt and TwoCats.

Hum... that is an interesting point: our Setup/"resistant loop" is
indeed likely to take longer to run due to the higher number of memory
operations... We did not have that in mind when we thought about the
extension suggested in section 6.3 (for fine tuning the amount of
bandwidth usage), but it should affect this aspect too. We will have to
check.


> 
> Honestly, I get pretty excited about speed, but even I have trouble
> getting excited about these differences!  They all basically tie when
> bandwidth limited, with minor differences for various reasons.
> However, nice job on that speed!
> 
> I think the most important mode for both TwoCats and Yescrypt, IMO, is
> with t_cost == 0, which provides the strongest time*memory security
> for both.  I stand behind my assertion that Lyra2 is erring on the
> side of too much required TMTO defense, and that users would benefit
> from a higher time*memory cost that Lyra2 could achieve with fewer
> reads and writes per pass.

Well, I remember saying that TMTO resistance was our preferred
"alternative winning condition" a long time ago :)

But seriously, I don't disagree with you: TMTO resistance should be a
user-defined parameter, which is the idea behind the extension that
allows fine tuning the amount of bandwidth usage. Then the user can
decide what is the preferred approach (with the help of a "suggested
parameters" table and/or algorithm, of course)


> 
>> Anyhow, to be really fair Lyra2 would also have to provide a
>> parallel implementation, which is almost done and should be
>> publicly released soon. I do not doubt TwoCats and/or Yescrypt will
>> be faster, but
> 
> I was wrong about your bandwidth not being high enough on one thread.
>  9GiB in 0.71s!  That's awesome.  You should let your multi-threading
> guy know that I expect him to have a difficult time improving overall
> runtime given that you already max out memory with one thread.

He already realized that and was more than willing to work on making
bandwidth usage parameterizable. I don't know why he was so eager, but
it probably had something to do with a "otherwise, you won't get your
PhD" :p

> 
> That said, Solar Designer's multi-CPU high-end servers with insane
> memory bandwidth will require multiple threads of Lyra2 to max them out.
> 
>>>
>>> By the way, TwoCats and Yescrypt also beat bcrypt in tiny memory
>>> ASIC attacks.  Bcrypt may make 4 parallel unpredictable reads per
>>> cycle, but it does so to 4 separate super-tiny SRAMs, making them
>>> even faster, and it has zero compute-time hardening.
>>>
>>> For defense against cache-timing attacks, TwoCats wins.  A 
>>> cache-timing upgrade is planned for Yescrypt, and Yescrypt may
>>> come out on top when this happens, but without an algorithm to
>>> test, it is impossible to say.  Lyra2 has a cache-timing
>>> resistant first loop, like TwoCats, but it uses a weak memory
>>> access pattern that leaves it vulnerable to a very high TMTO
>>> attack when cache-timing data is available.  Lyra2, IMO, should
>>> be upgraded to use something like the Gambit memory access
>>> pattern, which defended against my auto-pebbler attack far
>>> better.
> 
>> I can't argue with that either: as I said the first time you
>> mentioned this issue, we were experimenting with different pebbling
>> strategies trying to find something both simple and strong,
>> especially after our tests with bit-reversal failed. We did find a
>> better approach (which also facilitates analysis!) which combines
>> "simple reverse" and "variable step". The approach itself is
>> described in the documentation available at http://lyra-kdf.net/,
>> but the security analysis is not there yet (it will be as soon as I
>> handle the LaTeX warnings and errors in the document...)
> 
> Sweet.  The cache-timing defense in Lyra2 is one of the things I like
> best about it, though I am hoping you will wind up with a better
> memory access pattern, and I bet you will.  Let me know if you would
> like me to test pebbling a new memory pattern.

Please feel free to do so if you wish. The description of the pebbling
strategy is on http://lyra-kdf.net/Lyra2ReferenceGuide.pdf (table 2 has
some examples). The idea is to use a step is approximately half the
window containing all rows that have already been initialized. More
precisely, for a window of wnd, we have step=wnd/2+(-1)^{lg(wnd)}, where
wnd is a power of 2 as in the original submission (but there the step
was always -1).

Among many alternatives we tested, this led to the best results when you
consider that we write on the rows we revisit. I'm not sure we would
beat bit-reversal if we were only reading rows, though (that is a merit
of Catena that TwoCats seems to have employed quite well).


> 
>>>
>>> One of the most important applications for a password hashing 
>>> algorithm of the strength that can be provided by these
>>> algorithms is user authentication on dedicated authentication
>>> servers.  Yescrypt clearly wins in this category, and in fact the
>>> algorithm seems to have been designed with this application as
>>> the primary target.  The extra security provided to small memory
>>> hashes of a few MiB with large ROM hashing puts it in a class
>>> only shared by EARWORM.  EARWORM was my other favorite for this
>>> category until I realized it has a weakness against distributed
>>> ROM attacks due to it's very low RAM usage.  As things stand,
>>> Yescrypt is easily the best algorithm for authentication servers
>>> in the competition.  It's weaknesses, such as not yet having any
>>> cache-timing defense, don't matter much in this application, and 
>>> it's strengths really come to bear.  Yescrypt will keep 8 CPUs
>>> very busy in parallel while hammering memory at insane speeds,
>>> providing thousands of authentications per second, strengthened
>>> by ROM hashing in addition to megabytes of memory.  Even if the
>>> password database AND ROM leak to an attacker, a botnet will not
>>> be able to work on password cracking if the ROM is large enough.
>>> There are many features besides this in Yescrypt for
>>> authentication servers.  In comparison, both TwoCats and Lyra2
>>> opted for a bit more simplicity, and provide no authentication
>>> server specific features, such as ROM support.
> 
>> I have to agree that Yescrypt's ROM idea against botnets is very
>> clever and that its cache-oriented design is ingenious. It may be
>> on the best interest of all PHC candidates to at least consider
>> including ROM support to their designs, which I believe will not be
>> very difficult for most (if not all) entries. Making better use of
>> caching, on the other hand, will require someone with expertise on
>> computers architecture...
> 
> You and I never had a chance to beat Alexander in this category :-)  I
> thought about adding ROM support just so I could check that box, but I
> don't know how best to do that.  I'd mess it up, and it would look
> dorky.  If Yescrypt does not wind up as a winner, perhaps Alexander
> can help the winning entry add decent ROM support.

I'm not sure (Alexander certainly grasps the details much better than I
do), but it seems to me that the same idea that allows more
pseudorandomly picked rows to be visited in Lyra2 (again, section 6.3)
would allow one or more of these rows to come from ROM rather than RAM.
Probably the same idea could be included in other entries too. But
again, this is an educated guess...

> 
>>>
>>> While all three have no requirement for special hardware, all
>>> three are tuned for modern x86 SIMD units.  TwoCats I think wins
>>> in adaptability to basic CPUs.  This is because it can run with a
>>> single 32-bit width with far fewer instructions per byte of
>>> memory hashed. Yescrypt also has variable width SIMD loops, but
>>> it's pipeline depth is not currently variable, making it slow
>>> down somewhat on a basic CPU.  TwoCats wins here because it uses
>>> less pipelined parallelism. This is basically a duplicate of the
>>> single-threaded comments above :-)
> 
>> Well, that depends on what you adopt as underlying "f" in Lyra2 :)
> 
> True, but it's tricky to swap out f and not mess things up.  There's
> no other cryptographic hash in use, so it has to be done carefully.
> I'm sure your guys can do it easily, but I would not want any random
> programmer out there to attempt it.

Hum... that makes me think that maybe we did not modularize the
(pseudo)code well enough... One more task to our TODO list :)

> 
> 
>>>
>>> Lyra2 easily wins at TMTO defense.  Both Lyra2 and Yescrypt use 
>>> increasing t_cost to improve TMTO defense, but Lyra2's decision
>>> to continue overwriting memory while reading I think makes it
>>> more TMTO resistant.  TwoCats as submitted uses t_cost poorly.  A
>>> revised version exists that will be submitted if allowed, which
>>> does increase TMTO resistance as t_cost increases, but this
>>> feature works better in Lyra2 and Yescrypt.  However, because
>>> TwoCats was designed as a single pass algorithm, it has the best
>>> TMTO resistance when t_cost turned off, meaning only one memory
>>> pass.  This is not a mode Lyra2 allows, but Yescrypt does, and
>>> both Yescrypt and TwoCats achieve higher memory*time defense as a
>>> result.
> 
>> I'm not sure I understood what you mean, but it seems to be a
>> matter of including an "if" in the code...
> 
> I was trying to say Lyra2 and Yescrypt get t_cost usage right, and
> that Lyra2 has no setting that allows for fewer than 9 read/write
> operations per memory location, when TwoCats does 2 by default, and
> Yescrypt runs with 2.25 with t_cost == 0 (which I prefer).
> 
> Scrypt also did 1 write and 1 read, just like TwoCats.  It also has a
> higher potential memory*time defense than any entry in the
> competition, since it fills memory first, before ever reading, but
> that made it more vulnerable to TMTO attacks.

OK, the fixed parameterization was inherited from the original Lyra, but
there we had few reads/writes and now we have many more. I get more and
more inclined to assume that the extension from section 6.3 is indeed
needed...

> 
>>>
>>> As for basic security, Yescrypt and TwoCats run well accepted 
>>> cryptographic hash functions on all inputs at the beginning to
>>> create a derived key, and that key is hashed with the result of
>>> memory hashing again with the cryptographic hash function at the
>>> end. Regardless of any weakness in TwoCats or Yescrypt memory
>>> hashing, this scheme will preserve entropy and will always hash
>>> at least as securely as two applications of the cryptographic
>>> hash.
> 
>> I agree with the phrase, but that basically implies that it may
>> also be just as strong as those two applications of the
>> cryptographic hash...
> 
>> This is actually my main concern with TwoCats and Yescrypt: so far,
>> I have seen no analysis that shows that "there is no loss of
>> entropy" in the internal process. The fear that this may not be the
>> case seems to be among the reasons why many entries, some of them
>> created by people experienced in the area of security, prefer to
>> use a full hash in the algorithm's design. This conservative
>> approach at least gives them a security margin that allows them to
>> build security claims more easily. Again, no FUD intended here:
>> what I would like to see are exactly those security claims...
> 
> I certainly tried to show this.  In particular, both the
> multiplication loop and hashing loop in TwoCats are reversible
> permutations, and therefore loose no entropy.  Blake2s will be the
> only source of entropy loss, which I do not think is an issue.  Also,
> since early memory is constantly being re-hashed into later memory,
> any Blake2b entropy loss is likely to get recovered.  The final output
> from memory hashing has no need for much entropy in any case, because
> the original high-entropy derived key is hashed with the memory
> hashing result.
> 
> This is actually something I would love to see in Lyra2.  If there is
> any concern about entropy loss, this should keep Lyra2 secure.  It
> makes it easier, IMO, to prove the algorithm is secure.  I am biased
> here, but after reading all three sets of code, Lyra2, IMO, wins at
> simplicity and elegance, but left me a be worried about its reliance
> on a reduced-round Blake2b to not lose entropy.  I believe it's not a
> problem, but why not just hash in the original derived key before
> returning the result?

The last squeeze operation is not round-reduced: it does employ the full
Blake2b to produce the keys. We do not fell comfortable about doing only
a reduced-round operation here either! Actually, I sincerely doubt that
two consecutive squeezes at this point would result in unrelated outputs.

> 
> Think of it another way.  Being able to replace the hash function is
> supposed to be a Lyra2 strength.  Who would you trust with deciding
> how to replace it?  If they get it wrong, Lyra2 security looks like it
> would be in trouble to me.  Mixing in the pre-memory-hashing key at
> the end is a great way to help save fools from themselves.  I happen
> to be quite the fool fairly often.

I fully agree with you. Actually, the ALRED idea that we borrowed to
design Lyra2 is "use a full hash at the beginning and at the end, and
possibly something less strong in the middle". If someone ignores the
part of sentence that precedes the comma, then there is no security
assurance.

> 
>>> Lyra2 relies on the reduced Blake2b function's application over
>>> and over many times while hashing to cryptographically scramble
>>> it's sponge's state, and it relies on this slightly modified
>>> Blake2b function to not lose entropy. The final output is
>>> squeezed from the sponge.  Because of this construction, Lyra2
>>> hashing security needs careful analysis by hashing experts to
>>> verify it has no significant weakness.  I personally do not feel
>>> qualified to do it.  However, I do believe that they most likely 
>>> got it right.  However, I would say TwoCats and Yescrypt win
>>> here, in being trivial to prove basically secure.
> 
>> I believe the exact opposite: it is non-trivial to say whether or
>> not everything made between the two calls of the hash function in
>> TwoCats and Yescrypt is secure or not. Remember your criticism
>> about Schvrch, for example: at first sight, and without reading the
>> analysis in the mentioned paper, it is hard to say whether or not
>> the evolve and revolve functions would actually prevent shortcuts
>> in the computation. Why is it "trivial" for a multiplication
>> chain?
> 
> First, a full Blake2s Hash round is called after each block is hashed,
> cryptographically scrambling the state often.  This is the only place
> memory hashing in TwoCats can lose entropy.  

I'm not sure we agree with the concept of "entropy loss". Entropy is
measured by how much "disorganized" things are. From that perspective,
any biased operation leads to entropy loss. To give extreme case
examples: a logical AND with zeros just wipes out every entropy of the
system, because it is fully biased toward zero; on the other hand, a
random oracle preserves entropy, since for each entry there will be an
unpredictable result (and that is why cryptographer like to model hash
functions as random oracles when analyzing the security of a system).

Then, I would say that the part in which you do a cryptographic hash is
the only place in TwoCats that you more are likely to _not_ lose
entropy. Was that I typo or I misunderstood something?

> In contrast, Lyra2
> lightly scrambles the state in every operation, with possible entropy
> loss in memory (but not the sponge).

I'm not sure I get it. If one round of the underlying function loses
entropy, then it is unlikely that the whole function will not lose
entropy. I mean, each round should conserve entropy (and that is the
idea behind unbiased operations of the ARX design) to ensure that the
whole function does so too.

If you mean that "Lyra2 leaves traits of the hash function in memory",
then I fully agree with you. However, I do not see how this can be
explored to reverse a round or to predict an output (it that was easy
enough for one round, then it should still be reasonably easy for 10
rounds). But this is basically a conjecture backed up by the ALRED idea.

> 
> Memory hashing in TwoCats and Yescrypt does not have to be
> cryptographically secure.  Our memory hashing output can be highly
> biased with no loss of security.  All that is required is that the
> hashing cannot be reversed and does not loose so much entropy that an
> attacker can skip memory hashing and try to guess it's output
> correctly.  Guessing the output correctly brute-force for any given
> password guess would have to be faster than the memory hashing, which
> is not likely.  The state kept by TwoCats is parallelism times the
> width of the underlying hash function.  It is cryptographically hashed
> with non-entropy-losing hash data once for every block hashed.  Trying
> to guess this state at the end would not work out for an attacker
> trying to improve on 1 second  for a 4GiB memory hash.
>
> In reality, I'm sure it would be fine to just return the state.  This
> would be more similar to Lyra2.  However, just for added paranoia, I
> prefer to re-hash in the derived key at the end.  I don't think this
> is a "real" issue.  It's more like why we prefer to choose "safe"
> primes when hard-coding primes for Duffie-Helman key exchange.  It's
> just one less thing for the public to worry about.


I do agree with this view that the middle of the memory hashing between
the start and end of the algorithm does not need to be cryptographically
secure in the sense of avoiding collisions and even second pre-image.
Resistance to first pre-image (i.e., no reversibility) and "having no
shortcuts" (e.g., cycles) do seem to be the main requirements here. I'm
just not fully convinced that these two requirements are met with
something that does not try to be unbiased and, thus, in the long run
create something that would look like a hash.

That does not mean at all that TwoCats or Yescrypt are insecure. Your
choice of "multiplies <=8", for example, does give me a certain
confidence to say that the simple operations employed in TwoCats are
unlikely to allow anyone to skip the whole multiplication chain too
often, because if the attacker gets even one bit wrong that will mess up
the full hash computation that comes right afterward. All I'm saying is
that they need a more careful analysis to see how often one can run into
trouble.



> When all the bells and whistles get added, and if I can brow-beat you
> into offering fewer read/writes per memory location, I am confident
> any of the three would be a real winner.

Well, most of the credit goes to our "parallelism guy" (Paulo Carlos, to
be exact), but at least you have the right to say "I told you so" :)


Marcos.

Download attachment "Multiplications.xls" of type "application/vnd.ms-excel" (64000 bytes)