phc-discussions - RE: [PHC] PHC output specifics


Message-ID: <BY2PR03MB55431D3D75850BE960B070FA71C0@BY2PR03MB554.namprd03.prod.outlook.com>
Date: Fri, 6 Mar 2015 22:46:26 +0000
From: Marsh Ray <maray@...rosoft.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: RE: [PHC] PHC output specifics

From: Jeffrey Goldberg [mailto:jeffrey@...dmark.org] 
> And if this works, I think that Thomas is correct. Some normalization will be needed.
...
> I've only just started to look at the normalization schemes, but
> I do think that the PHC is going to have to recommend one.

I agree you both are correct in that without Unicode normalization the encoding is underspecified. But I guarantee you that no developer is going to read the PHC spec and say "Oh, I should go implement Unicode normalization now".

What will happen instead is our encoding recommendation will be ignored altogether. Many developers in US-AU-NZ will say "just use ASCII" without realizing that doesn't actually mean anything in practice. (I used to have on my desk a book 1 inch (2.54 cm) thick with variations on ASCII-based code pages and encoding schemes.) Worst of all, yet another generation of web developers and users around the world will grow up with avoidable limitations on their password character sets because of bad interoperability.

My guess is there are only a few development teams in the world that are invested in Unicode deeply enough that they are willing to put normalization in their product, and those teams probably don't need advice from us on how to do it. Yes, there are open source libraries for this, but this means even more code handling these secrets in memory. I doubt any Unicode libraries implement normalization in a side-channel-resistant manner.

So how about this wording:

	"For best interoperability of credentials, character data
	SHOULD be a UTF-8 encoded sequence of [cite: ISO 10646] characters.
	[cite: Unicode] aware applications that wish to perform normalization
	SHOULD normalize to [normalization form TBD] before UTF-8 encoding."
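
To make the underspecification concrete, here is a minimal Python sketch of what "normalize, then UTF-8 encode" would look like. It is only an illustration: the helper name encode_password is hypothetical, and NFC is an arbitrary stand-in because the wording above leaves the normalization form TBD. It uses the standard library's unicodedata module, which makes no side-channel guarantees.

```python
import unicodedata


def encode_password(password: str, form: str = "NFC") -> bytes:
    """Normalize a password, then UTF-8 encode it (hypothetical helper).

    The default form "NFC" is an assumption for illustration only;
    the proposed wording leaves the normalization form undecided.
    """
    return unicodedata.normalize(form, password).encode("utf-8")


# The same visible string, entered two different ways:
composed = "caf\u00e9"      # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Without normalization the byte strings fed to the hash differ,
# so the two inputs would produce different password hashes.
raw_differs = composed.encode("utf-8") != decomposed.encode("utf-8")

# With normalization before encoding, both inputs yield the same bytes.
normalized_match = encode_password(composed) == encode_password(decomposed)

print(raw_differs, normalized_match)
```

An application that skips normalization still follows the first SHOULD by encoding as UTF-8; it simply accepts that the two inputs above will not interoperate.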

This is just my gut feeling, based on experience, about where PHC should draw the line in good taste.

- Marsh