phc-discussions - Re: [PHC] Interest in specification of modular crypt format

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20151011012815.GA8938@openwall.com>
Date: Sun, 11 Oct 2015 04:28:15 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Interest in specification of modular crypt format

On Sun, Sep 27, 2015 at 03:55:52PM +0300, Alexander Cherepanov wrote:
> On 2015-09-25 04:32, Solar Designer wrote:
> >Is this unjustified complexity or is it a good way to achieve several
> >goals at once (very compact encoding, no two ways to encode a number,
> >use of only the B64 alphabet with no need for an "end of number" char,
> >no way to hit 2^31)?
> 
> The problem with this approach is that the encoding is closely tied to 
> the range of allowed values. Argon2 spec says that the range for p is 
> 1..255 while m and t can go upto 2^32-1. Should they use the same 
> encoding? Should it be the same as for (ye)scrypt?

Yeah, having the encoding tied to the range of allowed values is both
good and bad.  As an option, Argon2 specifically could choose to limit
its parameters to 2^30.  (No, I am not suggesting this currently.)

> Relatively widely used encoding for similar cases is Variable-length 
> quantity[1]. There is even "Base64 VLQ" in Source Map[2] but it encodes 
> a sign in a least significant bit which could inconvenient if we want to 
> encode unsigned integers only.
> 
> [1] https://en.wikipedia.org/wiki/Variable-length_quantity
> [2] 
> https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k

Thank you!  Besides the sign bit issue, this existing "Base 64 VLQ"
appears to allow for trailing chars with all-zero data bytes, which
wouldn't alter the value (since it's a little-endian encoding).
So it does not force deterministic encoding.

"Base 64 VLQ
        The VLQ is a Base64 value, where the most significant bit (the
6th bit) is used as the continuation bit, and the "digits" are encoded
into the string least significant first, and where the least significant
bit of the first digit is used as the sign bit.  Note: The values that
can be represent by the VLQ Base64 encoded are limited to 32 bit
quantities until some use case for larger values is presented."

I just found this:

"BASE64 VLQ CODEC (COder/DECoder) AND SOURCEMAP V3 MAPPINGS PARSER"
http://murzwin.com/base64vlq.html

and experimented with it.  There are many ways to encode 0:

A
B
gA
ggA
gggA

as well as non-zero numbers, e.g. all of these decode to 16:

gB
ghA
ghgA

and all of these to 17:

iB
ihA
ihgA

Also, there exist invalid encodings that a robust parser will need to
reject.  "Invalid VLQ64 encoding, continuation bit set at last character
while decoding" as that JavaScript parser says e.g. for "g" and "gg".

So I am not happy about this specific encoding, although it's something
we could use if we were to use the combination of RFC Base64 charset and
compact encoding for numbers, which as it appears we're not going to.

Alexander