lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 16 Sep 2014 23:51:59 +0000
From: Brandon Enright <bmenrigh@...ndonenright.net>
To: Bill Cox <waywardgeek@...hershed.org>
Cc: discussions@...sword-hashing.net, bmenrigh@...ndonenright.net
Subject: Re: [PHC] A review per day - OmegaCrypt

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Bill, thank you for the thorough review of OmegaCrypt.  I've trimmed
your review and responded to some of your points inline and written a
small summary at the end.


On Wed, 03 Sep 2014 14:11:07 -0400
Bill Cox <waywardgeek@...hershed.org> wrote:
> 
> OmegaCrypt uses CubeHash for it's cryptographic strength security,
> [...]  I will not attempt to analyze these choices, or check their
> implementation, since we could substitute CubeHash with Blake2b or
> whatever if we wanted.

Indeed.  I used CubeHash for its simplicity and adjustable
speed/security tradeoffs.  Although I used ChaCha8 I see no reason why
it couldn't be reduced to 4 or 6 rounds.

> I think that the OmegaCrypt architecture can
> easily be seen to produce secure hashes, [...]  I have no security
> concerns about OmegaCrypt, other than the concerns I have about all
> the Catena-like algoritms.

Thanks, I tried to make the cryptographic choices and design as
uncontroversial as possible.

> 
> OmegaCrypt does small unpredictable reads, based on the password, like
> bcrypt.  However, in OmegaCrypt's case, I believe it should not.  What
> "Practical" algorithns like bcrypt and Scrypt do rapid small
> unpredictable addressing to foil both GPU and ASIC attacks.
> OmegaCrypt dribbles them out so slowly as to not provide enough
> defense to be worthwhile.

Yes, in the current proposal OmegaCrypt does tiny reads from anywhere
in the entire allocated memory region.  Many other proposals try to
select a smaller windows to work in so that better use of the multiple
levels of CPU cache can be used.

OmegaCrypt spends almost all of time doing random memory accesses
rather than saturating memory bandwidth or exhausting computing
resources.

It seems like there are a few strategies proposals seem to use:

* Try to exhaust computing resources by making a guess at the SIMD
  width per SIMD instruction and guess at the number of parallel
  threads / execution units and trying to keep them all maximally busy.

* Try to exhaust memory bandiwdth by performing large transfers to and
  from main memory as rapidly as possible.

* Attacking all but large computers by trying to use huge amounts of
  memory.

* Attacking certain types of computational models like stream
  processing so that parallelism, especially extra-wide parallelism in
  GPUs can't be used.


OmegaCrypt attacks stream processing by making heavy use of
data-dependant branching and all other platforms by trying to use a
large amount of memory.

The trouble I have with trying to exhaust computational resources is
that it is hard to predict how parallel to make the algorithm.  If
you guess too small and the bad guys can build very parallel machines.
It's also hard to predict how computing will evolve 10+ years from now.

The trouble I have with trying to exhaust memory bandwidth is that
strategies to do so try to guess how the levels of caching work and the
size of the caches.  Choices made today might be poor several years in
the future.  Also, I think ASICS provide a lot of freedom to
designers.  Instead of taking the data to the computation (memory
transfer) you can take the computational logic to the data.  Also,
there is no real reason why an ASIC can't have the same multiple memory
levels seen on modern computers.

In general, I think any design meant to be extremely efficient for one
type of computational model will inadvertently end up being efficient
on others.

> I recommend that his ChaCha state be
> initialized with a constant, rather than the input data.

This would defeat the data-dependent branching that is at the core of
the anti-GPU design of OmegaCrypt.

I am considering other options though.  For example I could truncate
the ChaCha key to just 20 bits which would produce about 1 million
different branching paths but put much tighter limits on how effective
cache timing attacks would be.

> I also
> recommend that case 2 and 3 be modified to not do memory dependent
> addressing.  With both of these changes, OmegaCrypt would become a
> competitor in the Catena-like category of entries concerned more about
> what I call "mathematical" defense rather than "practical" defense.

If OmegaCrypt makes it to round 2 I will completely redesign each
branch.  I will document the design criteria too.

> In terms of competing against more practical algorithms like
> PufferFish, OmegaCrypt gets crushed in speed.  It fairs better against
> Catena and Gambit.

OmegaCrypt is supposed to be inefficient on all computers.  Your use of
the word 'speed' only makes sense if you want to talk about operations
per second because you can adjust the time parameter to make OmegaCrypt
run in however much time you want.

> Like the other bcrypt-like and Catena-like entries, OmegaCrypt is
> essentially a cache-bound algorithm, suffering from excessive delay
> when accessing external DRAM.

Actually I believe OmegaCrypt is memory latency bound.  Its memory
access pattern makes caching almost useless.

> However, because of it's calls to
> ChaCha for the pseudo-random addresses and data, it runs closer to the
> speed of the Catena-like algorithms, though easily 10X slower than the
> bcrypt-like algorithms.  The choice to use only output from strong
> cryptographic hashes is shared among the Catena-like entries, which is
> why they run so slowly.

I don't think either ChaCha or CubeHash are the bottleneck here.  The
final CubeHash has been severely weakened to favor speed over
security.  Both CubeHash and ChaCha can use enough parallelism in
modern CPUs that with an efficient implementation, only a negligible
amount of computational time will be spent in either ChaCha or CubeHash.

> I think this is the group of people who feel
> that mathematical security is paramount, rather than what I consider
> to be "practical" security.  This is also why they all decided to
> focus on purely cache-timing resistant algorithms, except for
> OmegaCrypt.

I think we disagree about the word practical.  I did my best to make
OmegaCrypt practical.  I think that shows through in the simple,
easy-to-implement design and the tuning of CubeHash for speed rather
than strong security.  I don't think there is any mathematical or
theoretical basis for OmegaCrypt, just a practical one.

> 
> Because of OmegaCrypt's choice to focus on mathematical security at
> the expense of bcrypt-like speed, while not being cache-timing
> resistant, I feel it does not have an area where it is competitive.

I think the method of defeating stream processing (GPUs) is novel and
makes it competitive.

> [...]
> 
> One thing to note is that there is no need for XORing ChaCha data with
> constants to generate pseudo-random addresses.  This does not make it
> any more random, but does slow down the defender.

This was done so that implementations couldn't pre-fetch some of the
memory addresses.  As I've said before, I will be revisiting the
branching so this might be adjusted.

> Like several entries, OmegaCrypt XORs and ADDs over memory, but
> probably should use the read memory first, since it's read anyway.
> This would make it more TMTO resistant with little computation
> penalty.  Lyra2 does this, for example.

The memory initialization with XOR is meant to be a negligible part of
the computation but I'll take this under consideration.

> ASIC Hardness
> - -------------
> 
> A significant problem that the author may not realize is that
> OmegaCrypt does not benefit from it's unpredictable addressing nearly
> as much as bcrypt-like entries.
> 
> The data from the ChaCha stream *is* predictable in the sense that I
> can pre-compute as much of it as I want at about 32 bytes per ns in my
> high-end ASIC.  This allows me to pre-compute which cases will execute
> in each loop, as well as most addresses that will accessed, so that
> they can be put in the pipeline before their data is needed.  The only
> effectively unpredictable memory accesses are in case 2 and 3, one
> each.  The read from memory indexed by carry in case 2 depends on the
> value of carry computed in the previous loop iteration.  The
> unpredictable memory access in case 3 is the read from an address
> loaded from a predictable memory location.  Unfortunately, there is no
> sequential dependence, and the case 3 addresses can call be
> pre-computed in advance as well.

All of the cases will be re-done if OmegaCrypt makes it to round 2.  I
know the data-dependent branching doesn't affect an ASIC at all but I
want precomputing the memory to be accessed to be just as costly as
simply doing the computations.

I think OmegaCrypt's ASIC resistance comes down to three things:

* The ASIC can't (efficiently) make use of a streaming architecture

* A high memory cost parameter forces the ASIC to have lots of RAM

* The random memory accesses mean most time is wasted in memory latency
  rather than making use of custom computational logic.


The big assumptions I'm making here are that you can't large quantities
of significantly faster, lower latency RAM on the ASIC than you can on
a general purpose computer.

> [...]

> TMTO Resistance
> - ---------------
> 
> [...]
> 
> OmegaCrypt should also be modified to make use of the data read when
> doing the += and ^= operations to memory.  This creates more edges in
> the data dependency graph, making it harder to pebble.  Lyra2 benefits
> from this.

I will consider it.

> 
> Benchmarks
> - ----------
>
> [...]
>
> 1GiB hash:
> 
> PHC> time ./phs-omegacrypt 0 10
> 
> dc 99 9a e6 b6 0b 65 82
> bf 3b 83 8a 68 f9 39 80
> 3b 25 ed d6 20 06 a3 40
> 1d bb c5 b3 b2 2b 41 8d      32 (octets)
> 
> 
> real	0m13.910s
> user	0m13.846s
> sys	0m0.068s
> 
> [...] Also, I ran this with t_cost == 0, which did only 2^17
> loop iterations, potentially creating a TMTO vulnerability, since most
> data in memory remains simply the ChaCha output to the end.

A lot of the terrible speed here is in the unoptimized ChaCha and
CubeHash implementations.  You've adjusted the parameter here to
benchmark ChaCha and CubeHash almost exclusively.

> 
> OmegaCrypt is simply too slow and has too weak ASIC resistance to be a
> competitive algorithm in the field of "practical" bcrypt-like and
> Scrypt-like entries.  It would do better if it were tweaked to compete
> in the Catena-like category instead.
>
> [...]
>
> Bill


Thanks again for your review.  I think I can fairly summarize your
main complaint as "OmegaCrypt spends too much time waiting on memory
latency instead of performing lots of operations on the memory that is
accessed".

You're right, OmegaCrypt is memory latency bound rather than memory
capacity, memory bandwidth, computational capacity, or
computational speed bound.

OmegaCrypt started out as an "everything and the kitchen sink" idea
similar to some of the other PHC proposals.  I started out with the
goal of defeating GPUs and built a bunch of other things around that.
I realized before submission though that my design was too complex and
it was too hard to evaluate the design.  Worse, I felt the core GPU
resistant design was obscured by al of the other cruft.

So I stripped out all of the cruft and made a minimal anti-GPU design.
I think my approach to data-dependent branching is novel and I hope
that's what the judges look at when voting.

I'm no expert on ASIC designs or what you call "Catena-like" designs.
Sure I could make OmegaCrypt work in smaller windows of memory at at
time so that it efficiently uses caches and does a lot more
computations or uses a lot more memory in the same period of time.  I
doubt I could do it as well as others have though so instead I kept my
initial proposal simple.  If OmegaCrypt is selected to move on I will
perform "hardening" where appropriate which will address some of your
concerns about performance.


Regards,

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlQYzSUACgkQqaGPzAsl94Ik/wCgnesbRe1eiFo4C82BKAYubmKs
gKMAn1LQXKl0apQd+EjeIWHNKxVsh5UU
=d0sJ
-----END PGP SIGNATURE-----

Powered by blists - more mailing lists