phc-discussions - What is "basic cryptography" ?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141012142443.GA26315@bolet.org>
Date: Sun, 12 Oct 2014 16:24:43 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: What is "basic cryptography" ?

Hello everybody,

in his survey, Dmitry Khovratovich includes a property called "basic
cryptography" and defined succinctly as:

  "Basic cryptography — collision/preimage resistance, unpredictability."

While I can intuitively have a rough idea of what this means, I struggle
to write it down as a mathematically precise definition. My goal is to
have sufficiently clean definitions so that I could set out to "prove"
security of some functions (in my case Makwa, but this may be applicable
to other candidates as well) in the sense that any successful attack
would have to either break a known underlying hard problem, _or_ to be a
computational shortcut such as a TMTO or something like that. In
informal terms, I want to separate the "cryptography" from the
"engineering".


For collision resistance, I can have something satisfactory: for a
password hashing function h(), that takes as inputs a salt s and
password p, collision resistance is about the computational
infeasibility of finding (s,p,p') where p <> p' and h(s, p) = h(s, p').
Note that I am talking about infeasibility, regardless of the space of
plausible or potential passwords. While collisions can help an attacker
only within rather esoteric usage scenarios involving active misuse of
the function by the defender, it is still a nice thing to have.

In particular, collision resistance implies second-preimage resistance,
and THAT is good for security proofs, in the following sense: if the
attacker is given s and h(s, p), and is challenged with finding a
password p' such that h(s, p') = h(s, p) (that is the normal setup when
considering an authentication server), then second-preimage resistance
means that if the attacker succeeds, then he found p' = p; otherwise,
that would contradict second-preimage resistance. So far so good: if
second-preimage resistance can be proven (and proving collision
resistance implies proving second-preimage resistance), then we have
shown that if an attacker finds a matching password, then he found _the_
password, meaning that no password entropy was lost in the process: the
space of potential passwords, already quite small (that's the problem
with passwords), is not unduly shrunk through some misbehaving of the
function.


It is with preimages (not second-preimages) that I have a problem. The
usual definition for a cryptographic hash function is the following:

   Given a hash function f() with inputs in space X and outputs in space
   Y, f() is said to be preimage resistant if it is not feasible, given
   y in Y, to find x in X such that f(x) = y with a computational
   average cost lower than #Y times the cost of evaluating f().

It is important to note that preimages and second-preimages are very
different things. In the second-preimage setup, there is no secret; in
the preimage setup, the target output is a given and there is no
information, indeed no condition on how it was generated.

With normal hash functions such as SHA-1, the output space is "all
possible 160-bit blocks" and, crucially, the space of possible inputs is
much larger (even if we restrict ourselves to inputs that imply only
minimal evaluation cost, we have 2^448-1 possible inputs, i.e. all
sequences of 0 to 447 bits). Under these conditions, it is rather easy
to define preimage resistance with a target output taken randomly and
uniformly in that well-defined output space.

With password hashing, things are not as easy. The space of possible
inputs will be a lot smaller, and quite smaller than the apparent size
of the output space. For instance, a password hashing function may
produce a 128-bit value, but since there are much fewer than 2^128
possible passwords (realistically), then not all 128-bit outputs are
actually possible. In any case, brute force on the input is assumed
to work (the input is a password). So I could try something like this:

   Password-hashing function h() will be said to be preimage resistant
   if it is not feasible, given s and h(s, p) for an unknown p chosen
   randomly and uniformly in space P, to find p' such that
   h(s, p) = h(s, p') with average cost less than #P/2 the cost of
   evaluating h().

I am not satisfied with that definition for the two following reasons:

 - This depends on the space of potential passwords. For the definition
   to be useful, it must hold for all spaces of a given size, and not
   just "on average". That is, the "average cost of #P/2 evaluations"
   must be understood as "over all p in P for a single, given P". I
   fear that such a definition may be defeated by anomalous spaces of
   passwords that are not realistic in any way.

 - This definition includes the cost of evaluating h(), i.e. it
   postulates the lack of any shortcut. Thus, it fails at separating
   cryptography from engineering.

So far my best attempt at producing a clean, mathematical definition of
preimage resistance for a password hashing function is the following:

   For a given space P of inputs, the password-hashing function h() will
   be said to be preimage-resistant over P if, given s and h(s, p), the
   average cost (for p chosen randomly and uniformly in P) of finding p'
   such that h(s, p) = h(s, p') is not less than the computational cost
   of evaluating h(s, t) for all t in T, for any subset T of P such that
   #T >= #P/2.

   h() will be said to be preimage-resistant up to "k bits" if it is
   preimage-resistant over all possible spaces P of size less than 2^k.

The intuition behind this definition is that h() must be such that
cracking a password must involve obtaining the hashes for at least half
of the potential passwords. If that can be proven, then it will imply
that any successful attack on the function MUST be a "computational
shortcut" that allows computing, say, 30 hash values for the cost of 25,
or something like that.

Now that is quite clunky and I don't really like it.

Does anybody here has an idea on the subject ? Or maybe a reference to
some existing published work that already covers these questions ?


	--Thomas Pornin