phc-discussions - Re: [PHC] Interest in specification of modular crypt format

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150925013201.GA2518@openwall.com>
Date: Fri, 25 Sep 2015 04:32:01 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Interest in specification of modular crypt format

On Mon, Sep 07, 2015 at 01:55:30PM +0300, Solar Designer wrote:
> If we continue to build upon the traditional crypt(3) alphabet and
> encoding (like the above specification for scrypt does), then I'd also
> like to include a way to represent the numeric parameters in a more
> compact form: let the little-endian encodings be truncated when the
> values are small.  So e.g. instead of:
> 
> $7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
> 
> it would be valid to use:
> 
> $7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
> 
> or if we want to keep the number of '$' characters fixed, then either
> require the '$' delimiters to always be present or maybe introduce '+'
> or '=' for truncation (a character already used in the other base64
> encoding's alphabet, so likely safe to use):
> 
> $7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
> 
> The savings for classic scrypt are small, but with yescrypt's additional
> parameters they are more significant.

BTW, an even more compact encoding of numeric parameters is possible,
through having the first B64 character of every number encode whether
and how many additional characters are expected for that number.  Small
numbers e.g. up to 31 or 47 would be encoded with just one B64 character
and no other character needed to indicate this short encoding length.
This would be similar to how UTF-8 works, with 7-bit ASCII only needing
one octet per character.  And we would get similar tradeoffs, too. ;-(
We would need to decide whether overlong encodings would be possible
(and their use would then be forbidden in the spec, much like Thomas'
current spec forbids leading zeroes in decimal numbers) or whether there
would be only one way to encode a given number without us having to
explicitly forbid anything.  We can do the latter by starting each of
the larger sub-ranges right where the previous sub-range ended:

For example, by value of first B64 char:

 0 to 47 - 1 char, range 0 to 47
48 to 55 - 2 chars, range 48 to 559 (using 3+6 bits of the two chars)
56 to 59 - 3 chars, range 560 to 16943 (2+12 bits)
60 to 61 - 4 chars, range 16944 to 541231 (1+18 bits)
62       - 5 chars, range 541232 to 17318447 (0+24 bits)
63       - 6 chars, range 17318448 to 1091060271 (0+30 bits)

This unnecessarily(?) allows going 1.6% higher than 2^30, which would
have been the desired maximum for (ye)scrypt's numeric parameters, but
on the bright side the non-overlapping ranges result in there being only
1 way to encode a given number, and some encodings are shorter (e.g. for
512 to 559, which would otherwise require 3 chars).  If a wider range
needs to be encoded, we can limit the 1-char range e.g. to 0 to 31 (and
proceed to up to 7 chars and 36+ bits), but I actually wanted to avoid
the risk of ever hitting 2^31 here (would be risky if implemented with
signed int somewhere, e.g. in a scripting language that uses signed
32-bit int behind the scenes).

Is this unjustified complexity or is it a good way to achieve several
goals at once (very compact encoding, no two ways to encode a number,
use of only the B64 alphabet with no need for an "end of number" char,
no way to hit 2^31)?

Is there an existing encoding like this that we could reuse, or would we
have to invent our own like I have above?

With this encoding, my example above could become simply:

$7X$C6/SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

because both '6' (for r=8) and '/' (for p=1) are small enough that they
fall into the 0 to 47 range needing only 1 character.  (I put an "X"
here after the "$7" to indicate that we'd need to somehow alter the
prefix if we implement this for scrypt in particular, since it wouldn't
otherwise be detectable.  And we'd like change other things at once.
By "X" I mean eXperimental - just a placeholder I use until we've
decided on an encoding.)

Of course, we can also introduce a '$' before the actual salt value
artificially if we want to, such as to allow a trivial parser (without
decoding of the numeric parameters) to separate the parameters from the
salt value.  Then we'd have something like:

$7X$C6/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

Also, if we like, even the initial log2(N), which was 1 char already,
could be treated as encoded using this scheme, for consistency.  For
practical purposes, it'd remain 1 char anyway (since 2^47 is a lot), and
values 48 or higher would be invalid anyway (2^48 is way too high) no
matter which encoding.

Alexander