phc-discussions - Re: [PHC] Design Rationale and Security Analysis of PHC candidates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141002120202.GA13311@bolet.org>
Date: Thu, 2 Oct 2014 14:02:02 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Design Rationale and Security Analysis of PHC candidates

On Thu, Oct 02, 2014 at 12:50:31PM +0200, Dmitry Khovratovich wrote:
> The value "Explored" stands for the case when the designers actually try to
> mount a collision/preimage attack on their design and show why it fails. It
> is not the best possible option: ideally, one would provide some sort of
> security proof that reduces the security of the mode of operation (which
> the hashing scheme is) to the security of the underlying primitive, and
> thus get the grade "Verified" or "Proven".

In that case, I'll claim that "proven" status for Makwa.

Without pre-hashing or post-hashing:

  The Makwa core really is Rabin encryption, followed by many squarings.
  Since the Rabin encryption output is a quadratic residue, and squaring
  is a permutation of quadratic residues (when working modulo a Blum
  integer), then it follows that any preimage attack against Makwa is
  also a preimage attack against Rabin encryption (see note 2 below).
  Rabin encryption is the "underlying primitive", still unbroken after
  more than three decades of exposure in the field.

  As for collisions: since the squarings are a permutation, only the
  very first squaring may induce a collision. However, if there is a
  collision, then you get two values x and y such that x^2 = y^2 modulo
  N. Moreover, since the most significant bit of both values are zero,
  it cannot be that x = -y. Then it follows that gcd(x+y,N) reveals a
  non-trivial factor of N. Thus, finding a collision for Makwa (without
  post-hashing) implies factoring N, which is known to be hard.

  Note that the arguments above don't rely on any property of the salt.
  It could be entirely under attacker's control; it would not change
  anything here.

With pre-hashing or post-hashing:

  Makwa is a sequence of three elements: the pre-hash, the squarings,
  and the post-hash. When considering a combination of successive
  functions, it is rather obvious that:
   - any preimage attack on the combination implies a preimage attack
     on the last step;
   - any collision attack on the combination implies a collision
     atack on at least one of the steps.
  We saw above that the middle step of Makwa (the squarings) is
  resistant to both preimages and collisions (regardless of the
  attacker's control on the salt). Thus, Makwa, in all generality, is
  resistant to both preimages and collisions as long as the "hash
  function" used for the pre-hashing and post-hashing step is also
  resistant to collisions (note that the salt is not used for
  pre-hashing and post-hashing).

  However, that hash function is HMAC_DRBG, a well-known primitive which
  has been proven to be "perfect" (indistinguishable from a random
  oracle) as long as the underlying HMAC lives up to its promises (the
  references are in the Makwa spec).

One design principle of Makwa was to rely on existing, "proven"
primitives (at least proven through years of exposure, though any kind
of reduction proof is also nice to have).



Note 1: you can still botch things by using post-hashing to get a
deliberately short output; e.g. with 80 bits of output, resistance to
preimages cannot exceed 2^80, and 2^40 for collisions. But mind the
details:

 - This "2^80" is counted in a number of evaluations of the underlying
   function, which is designed to be slow. That does not translate to
   the usual "2^80 simple operations" or "2^80 invocations of MD5".

 - If an attacker is allowed to run 2^80 operations, then he can guess
   the password, because we are hashing passwords, and it would not be
   wise to claim that passwords offer 80 bits of entropy.

 - Collisions are an issue for password hashing only in the context of
   rather esoteric scenarios involving some underspecified extra bug. I
   never found that convincing in any way; for me, resistance to
   collisions is irrelevant to password hashing. However, if you worry
   about collisions, then you can just ask for a twice longer output:
   HMAC_DRBG can produce as many bytes as you wish.

 - This point about output length is completely generic: it applies to
   _all_ password hashing functions.


Note 2: there is still one delicate point about Rabin encryption.
Contrary to RSA, it was never specified with a RFC or a similar
standard. In particular, the "padding" step was not specified. It is
also known that while extracting square roots modulo N is _in general_
equivalent to factoring N, square root extraction for some values can be
easy: namely if x^2 < N (as integers), then the square root of x^2
modulo N is also the square root of the natural integer x^2, i.e. x.
This is a well-known property which also applies to RSA (with cubic
roots for RSA with e=3). This is the reason why the PKCS#1 padding,
defined for RSA, ensures that there are some non-zero bits in the most
significant positions. The padding used in Makwa is very similar to the
PKCS#1 padding, for pretty much the same reasons. In that sense, Makwa
is as strong as Rabin encryption would be, if Rabin encryption was
specified with PKCS#1 padding.

To be more formal, I _define_ Rabin encryption to consist in the following:

 - Pad input message m to a sequence x of the same length of N:

    x = 0x00 || rrrrrr || m || len(m)

   where len(m) is the length of m in bytes (expressed over a single byte),
   and "rrrrrr" is a sequence of random bytes of suitable length (so that
   the length of x equals that of N). There are at least 30 random bytes.

 - The encrypted message is x^2 mod N (x is interpreted as an integer with
   the big-endian convention).

I claim that this instanciation of Rabin encryption is a faithful
practical incarnation of "the" Rabin encryption scheme which has been
studied at length by cryptographers. This is my underlying primitive.

Makwa replaces the "random rrrrrr" with the deterministically generated
bytes produced by HMAC_DRBG, but this is equivalent (from an attacker's
point of view) because HMAC_DRBG is indistinguishable from a random
oracle (see above).


(I agree that this analysis is not well explained in the Makwa
specification.)


	--Thomas Pornin