Open Source and information security mailing list archives
 
Date: Wed, 27 Nov 2013 15:41:36 -0500
From: Rich Felker <dalias@...ifal.cx>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] CJK character sets

On Wed, Nov 27, 2013 at 08:17:44PM +0000, Marsh Ray wrote:
> http://en.wikipedia.org/wiki/Hanzi
> 
> Certain Asian "CJK" regions use character sets that are quite large.
> For example, in Japan college-entry level literacy includes over
> 2000 characters. PRC China officially uses a 'simplified' character
> set having 3500 and 7000 'frequently' and 'commonly' used
> characters. My understanding is that most of these characters
> represent semantic units similar to words.
> 
> Having fluency in an alphabet orders of magnitude larger than our
> tiny Western alphabets surely changes the password strength problem.
> I would expect that it would make it easier to create and remember
> strong entropy. A short 8-character password in a Western script
> could perhaps be more like a pass phrase in Chinese-based script.
> 
> Is there any available research on the entropy of passwords in
> typical use in these regions?
> 
> Do these regions present any special considerations (other than
> proper encoding) for the PHC?

I'm not aware of any such research, but anecdotally, my experience is
that a large number of East Asian users use all-digit passwords. I
suspect this is due to the risk of ideographic characters in passwords
not being accepted at all, or being misinterpreted due to encoding
issues (some characters have lookalikes, and sometimes which lookalike
gets used depends on how the local legacy encoding is converted to
Unicode), and perhaps the fact that entering these characters via an
IME exposes them visually during entry.
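
To make the lookalike point concrete, here is a minimal Python sketch
(not taken from any real system). It assumes Python's bundled
"shift_jis" and "cp932" codecs as stand-ins for two different
legacy-to-Unicode converters, and uses SHA-256 over UTF-8 purely as a
placeholder for a real password hash. The same two legacy bytes come
out as different Unicode code points depending on the converter, so
the resulting hashes no longer match:

```python
import hashlib

# The same two bytes from a Shift_JIS-family legacy encoding,
# decoded with two different converters that ship with Python.
legacy_bytes = b"\x81\x60"

via_shift_jis = legacy_bytes.decode("shift_jis")  # JIS-style mapping table
via_cp932 = legacy_bytes.decode("cp932")          # Microsoft-style mapping table


def digest(password: str) -> str:
    """Hex SHA-256 of the UTF-8 bytes; a stand-in for a real password hash."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


for label, text in (("shift_jis", via_shift_jis), ("cp932", via_cp932)):
    code_points = [f"U+{ord(c):04X}" for c in text]
    print(label, code_points, digest(text)[:16])

# The two characters look alike on screen but are not the same code point.
print("same code points:", via_shift_jis == via_cp932)
```

A user who set a password through one conversion path and logs in
through the other would be rejected, even though what they typed
looks identical on screen.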

Anyway, my point is that even if such research exists, it might not be
applicable if a large portion of users are still using all-digit
passwords.
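
As a rough illustration of the gap, the sketch below computes the
upper bound on entropy for a password of a given length, assuming
each character is chosen uniformly at random from the alphabet. That
assumption is mine and is generous to real users. The 2000 and 3500
figures are the character counts from the quoted message; 95 is the
number of printable ASCII characters.

```python
import math


def max_entropy_bits(alphabet_size: int, length: int) -> float:
    """Upper bound on entropy, assuming uniform independent characters."""
    return length * math.log2(alphabet_size)


alphabets = {
    "decimal digits": 10,
    "printable ASCII": 95,
    "2000 characters (quoted Japan figure)": 2000,
    "3500 characters (quoted PRC 'frequently used' figure)": 3500,
}

for name, size in alphabets.items():
    per_char = max_entropy_bits(size, 1)
    total = max_entropy_bits(size, 8)
    print(f"{name}: {per_char:.2f} bits/char, {total:.1f} bits for 8 chars")
```

So the large alphabet only helps if people actually draw on it; an
8-digit password stays at the bottom of this table no matter what
script its owner can read.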

Rich
