phc-discussions - Specification of a modular crypt format (2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150928111244.GA27416@bolet.org>
Date: Mon, 28 Sep 2015 13:12:44 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Specification of a modular crypt format (2)

Hello all,

here are my revised specification for the modular crypt format, and
example code. Here are the changes since the last version:

** While producers must still follow the unique, deterministic encoding,
   consumers are officially allowed not to verify that the string is
   correct. This makes decoding simpler (e.g. no need to check for extra
   leading zeros in decimal integers) but increases the risk of
   dissemination of local variants. This also has some consequences
   which are worth mentioning in the Unix crypt() API; I have added a
   section about it.

** I switched back to "normal" Base64, albeit without the '=' padding
   signs. For C implementations, the choice between standard Base64, and
   the crypt() variant, should be neutral since the standard library does
   not offer any implementation of either; in any case, it is easily
   reimplemented, as the example code demonstrates. For other languages
   (e.g. C# or PHP), using standard Base64 may allow reusing the
   facilities of the language framework.
  
   I still mandate suppression of the '=' padding signs, because
   otherwise they may make some encodings ambiguous in the presence of
   optional parameters. For most languages, removing the '=' signs upon
   encoding is a simple one-liner (e.g. '.Replace("=", "")' in C#).

** Functions are now supposed to specify the expected order of parameters
   instead of following a single, universal rule. (It turned out that
   nobody likes lexicographic ordering.)

** Parameters can be optional. The function specification must tell which
   parameters are optional and which are not; for optional parameters,
   they MUST specify the default value, and an optional parameter MUST
   be omitted if its value is equal to the default.

   These rules preserve determinism while allowing optional features
   without overly increasing the hash string length. In Argon2i, I
   define the m, t and p parameters to be mandatory, and the keyid and
   data parameters to be optional.

** Output length can vary. The function must define a default length, to
   be used unless a very good reason not to do so applies. A sane
   minimum is still enforced. The output length is NOT made a parameter
   since this would be redundant with the output itself, and also
   because it really should not be parameterized in normal conditions.

** The parameters for Argon2i have been expanded with the keyid
   (identifier for the key, optional) and the data (associated data,
   optional).

** The example code has been aligned to the new specification. I also
   added an explicit license (actually, public domain dedication under
   Creative Commons "CC0" -- this seems to be the closest to public
   domain I can achieve without dying).

-----------------------------------------------------------------------

I have read all the comments made on this list, and I thank everybody
for such comments; I tried to include most of them, but, unavoidably,
choices had to be made, because it seems to be a mathematical
impossibility to comply to all these conflicting requirements.

Notably, Alexander argued for a much more compact format, at the expense
of readability (depending on who is doing the reading, of course). In
the specification draft, I chose to concentrate on readability rather
than compactness. This does not mean that a compact string is not a
worthwhile goal; only that I cannot achieve readability and compactness
with a single specification. Maybe another, extra specification should
be written for the "compactness" case.


An important point to be made is that the most extreme compactness can
be achieved through function-specific rules. For instance, consider
encoding of a "time cost". This is an integer, and Alexander has done
some nice research on how to generically encode an integer into a
sequence of characters. However, a time cost does not need to be able to
be any integer from 1 to 2^32-1. Consider bcrypt: its number of
iterations is intrinsically restricted to be equal to 2^x for some
integer x. This is very artificial (there is no structural requirement
for the iterations to be a power of two; this is a simple loop counter)
but it allows for encoding the time cost with a single, short integer.

Forcing the time cost to be a power of 2 may be a bit inflexible (though
I do not hear people clamouring about it when they use bcrypt). A
slightly more configurable option is to define the time cost to be
equal to a*2^b where a = 2 or 3, and b is an integer. If we want to
encode such a cost over _one_ character, convert the character to a
6-bit value x, and say that: t = (2 + (x & 1)) << (x >> 1)
This allows 64 possible time costs, scaled from 2 to 3*2^31.

The time cost here is just an example. My point is that very compact
encodings can be obtained by looking at the semantics of the parameters,
which are, by definition, specific to each function. Generic rules on
integer encoding, however smart they are, won't yield the maximum
possible compactness.


With the rules in the specification draft here enclosed, a "typical"
Argon2i hash string would look something like this:

   $argon2i$m=1024,t=50000,p=4$t003K73k/Bomtg8iHN/K4w$KH64roXLeU8kXGNeZXchGBcpmrJXT6NB1fw82bMs5Pk

i.e. a total of 94 characters in that example. Most notably, systems
that support the "SHA-512 crypt" variant (from glibc) must accept
strings of length up to 106 characters ("$6$" header, salt up to 16
characters, an extra "$", then 86 characters of Base64);
application-allocated buffers should thus already be up to the task. If
we want Argon2i strings to be still smaller, then I would be tempted to
reduce the hash output length (128 bits would already be a lot of
overkill for password authentication; 256 bits seem just wasteful).


	--Thomas Pornin

View attachment "phc-mcf-spec.txt" of type "text/plain" (14342 bytes)

View attachment "phc-mcf-parse.c" of type "text/x-csrc" (15246 bytes)