phc-discussions - A review per day - EARWORM (and a request to the judges)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <541422BC.2060803@ciphershed.org>
Date: Sat, 13 Sep 2014 06:55:56 -0400
From: Bill Cox <waywardgeek@...hershed.org>
To: discussions@...sword-hashing.net
Subject: A review per day - EARWORM (and a request to the judges)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

EARWORM is designed specifically for one application: authentication
servers.  The idea is not to win at everything, just this one very
important case.  It *almost* does, and still might if tweaked.

I have already carefully reviewed EARWORM and do not need to look at
the code again.  It is excellent work on every measure.  It's core
idea is outstanding, one of the best in the competition.  It is
simple, well written, and has, IMO, almost no flaws.  It does have one
major flaw which I believe is simple to fix, and which only reluctance
by the author not to copy Alexander's work too closely has kept him
from doing.  Obviously, I do not have such reluctance.  I feel the
EARWORM author should simply acknowledge that he is making his design
more like Yescrypt, and go ahead and make the change.  I hope the
judges will judge EARWORM as if it had already been fixed.

The core idea that makes EARWORM different is to embrace the
limitations that come with using hardware AES-NI instructions.
EARWORM would be slower and less elegant using Blake2b.  To be an
excellent authentication server algorithm, EARWORM makes heavy use of
ROM, an idea promoted by Alexander.  It is the only other algorithm in
the competition besides Yescrypt to work really well as an
authentication server algorithm without requiring significant upgrades.

Having such a narrow focus enables EARWORM to have KISS simplicity
without leaving out a ton of important features.  It is missing client
independent update, server relief, and garlic, which are ideas I
copied from Catena.  Like the rest of the entries, it does need those
at some point, but without them, the algorithm is extremely simple.

EARWORM is not a memory-hard algorithm.  Instead, it's ROM-hard.  The
memory used per password is only 512 bits (4 AES-128 lanes).  An
attacker will not have to use much regular memory, but an attacker
will have to have a ton of RAM to support efficient access the huge
ROM.  If an attacker were to build a server similar to a regular
high-end Core i7 based server, he would gain zero in efficiency vs the
defender.  Because EARWORM relies on hardware AES, even an ASIC
attacker will gain little in speed attacking EARWORM.  The high-end
servers supporting EARWORM will have memory bandwidth comparable to
the best high-end ASIC attack.  This would limit even a high-end ASIC
attack to close to parity with the server in performance, though it
may be cheaper than a high-end server.

EARWORM makes optimal use of the available parallelism in the SIMD
units of modern Intel processors.  It also has a simple strategy for
nearly eliminating cache miss penalties.  EARWORM has the *highest*
memory bandwidth I have measured, higher even than second-place
winner, TwoCats.  I was *really* hoping to win that, but EARWORM is
faster.

This entry is fast.  It is simple.  It is well focused on
authentication servers.  It has outstanding GPU defense, and *almost*
outstanding ASIC defense.  There are virtues for authentication
servers in Yescrypt that improve over EARWORM, but the reverse is also
true.  It is not clear to me which algorithm wins in this space.

Except for one flaw.

Distributed ROM Attack
- ----------------------

Because EARWORM uses only 512 bits per password hash, it is simple to
mount a distributed ROM attack.  This is where an attacker takes a
1TiB ROM (as an example) and divides it into many smaller ROMS (say
4GiB each).  There are many ways to mount this attack.  It might even
be doable on a bot-net.  In a bot-net example, each node would hash
against 2GiB.  They would have a queue of maybe several MiB of
partially hashed passwords.  After hashing them all, they would be
sent to the next node over the Internet.  In his case, the Internet
bandwidth dominates, but at 256KiB/s, that's still 4096 guesses per
second, which isn't bad for a bot-net node.

Accelerating this attack initially is a matter of increasing the
bandwidth between nodes.  At 1GiB/s, there is enough bandwidth for
16.7 million guesses per second, which means we can use commodity
router hardware for this attack.  At that point, it is simply a matter
of how fast each node can hash passwords against ROM data.  This turns
out to be very fast with regular CPUs, which means each node is as
fast as the defender's high-end server, yet only a few GiB of ROM data
per node is required.  This is what makes EARWORM fail to be the
equivalent of memory-hard.  An attacker with a distributed ROM attack
pays for the ROM once, but uses it many times in parallel.

This attack gets much worse when a custom ASIC is used.  In that case,
the ASIC can have very many AES cores, with many times more compute
AES compute power than the defender.  Memory bandwidth would be a
limiting factor, except that the attacker is hashing many guesses in
parallel.  He can simply sort passwords based on which 1MiB of ROM
they need next, ohis AES cores hash in parallel at 1MiB cache
bandwidth speeds.  A custom cache with many ports could be built to
support far higher bandwidth than the defender's L3 cache.  There are
important details about how to do the password sorting fast enough,
but it all works out.  An ASIC attack would be pretty devastating,
eliminating most of the defender's ROM cost per node, and also
eliminating most of the defender's compute time per guess.  The
combined impact on memory*time cost is devastating.

The Fix
- -------

Fixing EARWORM is simple.  There is simply no reason to keep the
password hashing state at 512 bits.  At 1MiB, a distributed ROM attack
will become massively limited by bandwidth between nodes.  For
example, 1GiB/s commodity routers could only sustain 1024 states per
second between nodes.  A password update rate of 1024 per second is
pathetic, especially when you have to do many updates to each password
state to make one guess.

For such a system, we could design custom routing networks to increase
the bandwidth, but since the destination is unpredictable, it's
basically a full crossbar switch.  I'm sure Cisco could help us design
a 1TiB network, but we still have the bandwidth limitation into and
out of a custom ASIC node.  At maybe 384GiB/s, even just reading and
writing password states would limit an ASIC to 192*1024 password
updates per second, which is pathetic.

IMO, adding 1MiB-ish of password hashing state completely defeats
distributed ROM attacks.

Making this change to EARWORM is trivial.  It only lightly complicates
the reference version.

Summary
- -------

I have an unusual request for the judges in regards to EARWORM:

Please force the author to fix EARWORM, by increasing the password
hashing state size.

I would be disappointed to see EARWORM not make it into the second
round.  You shouldn't drop the leader in any major category at this
point, IMO.  However, the author is resisting fixing EARWORM due to
his respect for Alexander.  With the fix, he may have the best sever
authentication algorithm in the competition.  Without the fix, it is
dangerously insecure and needs no further consideration.

So, please just make him do the right thing :-)

Bill
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJUFCK5AAoJEAcQZQdOpZUZoOQP/2v41B1DFRRvZNELGuy0Ke+a
HE25+sLn/Ctr2L+cmAXT1YR1qQTE8FrwstUNtfldiuy8jUhM8fTHl8WQWjpYuumu
Ocsv8HzQoo24mWnB7mYPQreSAFNNbM4u/jQg7pVYpVSK2JCXhBlPUdX6eAtqQWpC
z8FlP+WtwGFpjwCfGca6iP8m7xZEL3JbzuRpjaipnjMrPVNKLeWrkbSQTaWVeIrH
HYg9v95U0hOa5YKJdd0wmsyQJ5ocS7kBSqYyAVa3UWbxDevbONLvNc+sQkXJENv0
r/vHceRjuoarY4otS0urlbpg+jmwkRWXZkHS24K7bcp8R7sF06/OcrosSXAY75WZ
r9YsOxXxYbSIflc8c1v+mRFipBmqhP+2njkb9LAPY9p5wbbIObzUQWn1vKtr7uEi
QPB/WBw1hP6OTXa8F2L75zInqLQV0XGb4Gam87tt8Ap719DCy+dtIAAQl+M+Hu30
lytlu8cJDUiNpePyqCRZ2t4bV1mqyL8btpYGrt1di8xN1uPL2oO0ornJZPG+u9Ye
4ctfxysUl2oMw+OdD2bcDfoDfkz1tzVRX31rwnR27GhmQDTiJr2xkeLU8lhNss/t
AZJkH/L4uugeyAmq7us3ok5kztas9TotS4/wg+eIbmktwzBryQtEBc8y2wzp2i2h
W2oRRCGbRvNU94tUXH7V
=i6M4
-----END PGP SIGNATURE-----