lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <BY2PR03MB554FA8D99ABAF59026A5EF6A71C0@BY2PR03MB554.namprd03.prod.outlook.com> Date: Fri, 6 Mar 2015 21:26:38 +0000 From: Marsh Ray <maray@...rosoft.com> To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net> Subject: RE: [PHC] PHC output specifics From: Thomas Pornin [mailto:pornin@...et.org] > To be unambiguous, one has to also specify the normalization. Unicode defines NFC, NFD, NFKC > and NFKD. In general, NFC is what you get from existing user interfaces, so standardizing on > NFC is probably the best idea. But that is debatable (and debated). Yeah, Unicode greatly complicates secure string comparisons. I once printed out the algorithm for doing "case insensitive string comparison" correctly. It was literally like 40 pages of rules. Since that experience, I've been a fan of the way RFC 2865 [RADIUS] handles password encoding issues, without getting too bogged down in the details of Unicode. http://tools.ietf.org/html/rfc2865 : Text contains UTF-8 encoded 10646 characters and String contains 8-bit binary data. Servers and servers and clients MUST be able to deal with embedded nulls. ... It is recommended that the message contain UTF-8 encoded 10646 [7] characters. I figure any PHC recommendation would be at https://tools.ietf.org/html/rfc2119 'SHOULD' level. Does anyone object to wording like this? "For best interoperability of credentials, character data SHOULD be a UTF-8 encoded sequence of ISO 10646 characters." - Marsh -----Original Message----- From: Thomas Pornin [mailto:pornin@...et.org] Sent: Thursday, March 5, 2015 5:40 PM To: discussions@...sword-hashing.net Subject: Re: [PHC] PHC output specifics On Thu, Mar 05, 2015 at 06:33:22PM +0000, Gregory Maxwell wrote: > But it should also be specified that UTF-8 is used. UTF-8 is only part of it. For instance, a glyph like "é" (a common letter in French) admits two distinct representations in UTF-8: C3 A9, and 65 CC 81. Neither is more correct than the other; but they will yield distinct hash values. To be unambiguous, one has to also specify the normalization. Unicode defines NFC, NFD, NFKC and NFKD. In general, NFC is what you get from existing user interfaces, so standardizing on NFC is probably the best idea. But that is debatable (and debated). See: http://unicode.org/reports/tr15/ http://unicode.org/faq/normalization.html http://www.win.tue.nl/~aeb/linux/uc/nfc_vs_nfd.html --Thomas Pornin
Powered by blists - more mailing lists