phc-discussions - Re: [PHC] Best use of ROM in password hashing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOLP8p5Zo1xV65dEVSJA9OOVFAwZobHM96dV2UuP4vyvdbzxuA@mail.gmail.com>
Date: Mon, 21 Apr 2014 06:50:46 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Best use of ROM in password hashing

On Mon, Apr 21, 2014 at 5:07 AM, Solar Designer <solar@...nwall.com> wrote:

> On Sun, Apr 20, 2014 at 11:22:59AM -0400, Bill Cox wrote:
> > Using large ROMs for authentication servers is brilliant.  It is pretty
> > hard to steal a TiB or more without being noticed, and if an attacker
> does
> > not have a copy of the ROM, he's stopped cold.
>
> This is only one of the aspects of the original ROM idea.  Besides
> stealing the ROM, which may happen, there's the issue of otherwise
> suitable attack nodes possibly not having enough of fast enough storage
> locally to them to use their own copies of the ROM and the issue of
> distributing the ROM to attack nodes, or the issue of having attack
> nodes access a remote ROM.
>
> yescrypt's use of RAM+ROM at once is meant in part to increase cost of
> accessing a higher latency remote ROM, by tying up RAM (or some of
> whatever local read-write storage the attack nodes have) for almost each
> pending request to the remote ROM.  (I say "almost" because the first
> ROM lookup per hash computation happens before there's much RAM usage,
> unless ROM lookups in SMix1 are disabled by setting the mask that way.)
>
> ROM-in-RAM (such as the 112 GiB I was testing with) can be viewed as an
> anti-botnet feature.  It's not that large that it won't be stolen, but
> it can be large enough to buy a few years of botnet resistance.  It's
> practical to keep authentication servers' current ROM-in-RAM (for newly
> set/changed passwords) 16+ times higher than a typical botnet node's RAM
> size - e.g., it can be 128 GiB in server vs. 16 GiB or less in typical
> botnet nodes now.  (I don't know what their actual RAM sizes are now.
> I guess typical is actually below 16 GiB at this time, although newly
> added computers probably do have 16 or 32 GiB of RAM now.)
>
> ROM-on-SSD can be similar, or its size can be expanded much further in
> attempt to also make it hard to steal.  Jeremy uses 16 TB, which might
> currently work in terms of detection of attempts to steal it over the
> network, if bandwidth usage monitoring is in place.  Stealing a copy of
> a ROM physically is another matter.


Got it.  With my scheme above, a bot-net could still be used because a
central server could do the quick ROM hash and distribute the memory-hard
hashing to remote nodes.  Doing constant access throughout hashing forces
it to be one one machine, or a cluster with very high bandwidth connections.

> The problem with large ROMs is that an attacker can split them into many
> > small ROMs.  If the data routing problem this creates is cheap enough in
> > time and cost, this can be a win for an attacker.
>
> Sure.


I think people designing ROM hashing into their algorithms should keep this
in mind.  I just re-checked smix in yescrypt, and you're doing it right for
defending against attackers splitting the ROM.  You want to do password
based pseudo-random reads (predicable reads invite cheap parallel attacks)
from ROM on the same small blocks you use for hashing RAM, and you want to
hash the ROM data into a state that is as large as the ROM blocks you're
reading.  If you hash a lot of ROM data into a small state before moving on
to another portion of the ROM, that benefits an attacker who splits the ROM
into multiple nodes, because he only has to route the state between nodes
to update it from ROM data.  By hashing ROM data into a state the same size
as the ROM block read, you insure that an attacker splitting the ROM has a
major data routing problem, one that dominates over the ROM access time
problem.  Even so, it may benefit an attacker to split the ROM, if the ROM
is a dominant cost in the system.  If I split it into 256 nodes, and the
routing problem slows me down by 16X per node, that can still be a big win.

Of the algorithms that support ROM, so far I only see yescrypt getting
these details right.  Had I added ROM support to TwoCats, I would have
probably goofed it up.


> > Because of this, I
> > prefer bandwidth runtime hardening with RAM, where we fill the RAM and
> read
> > from it pseudo-randomly, based on the password.
>
> While I also prefer simultaneous use of a RAM and a ROM, I don't see how
> it's closely related to possible splitting of the ROM by attackers.
>
> My reasons to also use a RAM:
>
> 1. Have scrypt-like lower bound on area-time.  Don't depend solely on
> ROM bandwidth for it (which is not clearly translated to die area).
>
> 2. Have area-time cost increase along with ROM access latency, via tying
> up more RAM instances.  This discourages remote/networked ROMs.
>
> Were you possibly referring to an instance of #2, where an attack node
> wouldn't have its own copy of the full ROM and would have to incur
> latency of accessing another node's portion of the ROM?


Not quite.  The attack I was referring to routes the password hashing state
between nodes, rather than the ROM data.  The way you're hashing ROM data
in yescrypt makes it cheaper to route the ROM data between nodes rather
than the state data, so that's how I would do it if attacking yescrypt.  I
might use an Achronix FPGA for routing the ROM data.  If I'm not mistaken,
they can do something like 10 giga-bit per pin, and while they have many
clock cycles of latency, the throughput is just sick.


> > Still, the benefits of a large ROM are undeniable.  What if we wrapped a
> > memory-hard PHS with something like:
> >
> > 1) Start reading random location on N TiB hard-disk based on PRK =
> > H(passwordLen || password || saltLen || salt)?
> > 2) In parallel, do some bandwidth limited memory-hard hashing to produce
> > PRK2 = MemoryHash(PRK)
> > 3) return H(PRK2 || ROM[PRK % ROMLEN])
> >
> > This still requires that an attacker have the ROM, and allows us to use
> > cheap hard-disks for storage of the ROM data.  Perhaps an attacker could
> > buy a big SSD and squash the access time, and share the SSD among many
> > guessing cores, but who cares?  The algorithm is still memory bandwidth
> > hardened, which seems stronger than ROM bandwidth hardening.
>
> Moreover, it can also be computation time hardened, tying up the RAM
> even if it's higher bandwidth RAM than the defender had used.  Moving
> from defender's HDDs to attacker's SSDs is still a win for the attacker,
> because they'd be able to speed things up by providing more computation
> resources, RAM size, and RAM bandwidth accordingly.  While staying with
> HDDs, they would not be able to put those extra resources to use.


I think there may be value in adding ROM to algorithms that don't natively
support it, but it's not as solid of a defense as what yescrypt does for
the reasons you listed.


> > How would that compare against a PHS that intermixes hashing ROM data and
> > RAM data?
>
> yescrypt lets you set the mask such that the ROM access is one-shot, and
> that one access happens early on (or you can have it happen half way
> through computation, with a different mask setting).  However, further
> computation in yescrypt depends on the looked up value, so it stalls
> until the ROM access is complete.  Compared to your algorithm above,
> this means that the defender's latency per hash computation may be up to
> twice higher, and the defender will need more concurrent instances to
> compensate for the stalls.  In terms of throughput, which is what
> matters most, it'd be the same as yours.
>
> I think it's better to include multiple ROM lookups per hash computation
> if at all possible.  This discourages attack nodes that have only a
> portion of the ROM available to them locally.  The defender's cost is in
> that the defender will need to use SSDs or RAM, rather than HDDs.  As to
> stalls of defenders' hash computations, I think they're acceptable,
> since the typical target for throughput on this sort of authentication
> servers is high enough that the latency is low in terms of user
> perception even with latency of multiple ROM-on-SSD lookups included.
> (The number of lookups should be adjusted such that the target
> throughput is achieved.  In yescrypt, it's controlled via the mask.)
> The latency is somewhat unfortunate in terms of request queue size
> growth on other servers, but even with full overlap of ROM lookups and
> computation inside each hash computation the latency would be at best
> twice lower, so there's not that much room for improvement here.
>
> Alexander


Got it.  Thanks for the explanations.

Bill

Content of type "text/html" skipped