phc-discussions - Re: [PHC] PHC output specifics

Message-ID: <20150306014015.GA3445@bolet.org>
Date: Fri, 6 Mar 2015 02:40:15 +0100
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] PHC output specifics

On Thu, Mar 05, 2015 at 06:33:22PM +0000, Gregory Maxwell wrote:
> But it should also be specified that UTF-8 is used.

UTF-8 is only part of it. For instance, a glyph like "é" (a common
letter in French) admits two distinct representations in UTF-8:
C3 A9 (the precomposed U+00E9), and 65 CC 81 (U+0065 followed by the
combining acute accent U+0301). Neither is more correct than the
other, but they will yield distinct hash values.
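
To make the point concrete, here is a minimal Python sketch of the two
encodings. SHA-256 is only a stand-in chosen for illustration (it is not
a password hashing function and is not something the message proposes);
the variable names are hypothetical.

```python
import hashlib

# Two ways to write "é" as Unicode code points.
composed = "\u00e9"      # single precomposed code point U+00E9
decomposed = "e\u0301"   # U+0065 followed by combining acute accent U+0301

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes.hex())    # UTF-8 bytes of the precomposed form
print(decomposed_bytes.hex())  # UTF-8 bytes of the decomposed form

# Stand-in hash (assumption): any hash over the raw bytes shows the same effect.
composed_digest = hashlib.sha256(composed_bytes).hexdigest()
decomposed_digest = hashlib.sha256(decomposed_bytes).hexdigest()

print(composed_digest == decomposed_digest)
```

Both strings render as the same glyph, yet the byte sequences differ, so
any hash computed over the bytes differs as well.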

To be unambiguous, one has to also specify the normalization. Unicode
defines NFC, NFD, NFKC and NFKD. In general, NFC is what you get from
existing user interfaces, so standardizing on NFC is probably the best
idea. But that is debatable (and debated).
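
As a rough sketch of what standardizing on NFC could look like in
practice, the Python snippet below normalizes a password before encoding
it to UTF-8. The helper name `encode_password` is hypothetical, and the
choice of NFC simply follows the suggestion above; it is not a settled
rule.

```python
import unicodedata

def encode_password(password: str, form: str = "NFC") -> bytes:
    """Normalize the password to the given Unicode form, then encode as UTF-8.

    Hypothetical helper for illustration; `form` may be any of
    "NFC", "NFD", "NFKC" or "NFKD".
    """
    return unicodedata.normalize(form, password).encode("utf-8")

precomposed = "\u00e9"   # "é" as one code point
combining = "e\u0301"    # "é" as "e" plus combining acute accent

# After NFC normalization both inputs produce the same bytes.
print(encode_password(precomposed) == encode_password(combining))

# The compatibility forms go further: NFKC also folds the ligature
# U+FB01 into the two letters "fi", which NFC leaves untouched.
print(encode_password("\ufb01", "NFC") == encode_password("\ufb01", "NFKC"))
```

Whatever bytes this returns would then be handed to the password hashing
function, so both the registration and the verification paths need to
apply the same normalization form.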

See:
http://unicode.org/reports/tr15/
http://unicode.org/faq/normalization.html
http://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html

	--Thomas Pornin