linux-kernel - RE: [PATCH 1/4] exfat: Simplify exfat_utf8_d


Message-ID: <TY1PR01MB1578D63C6F303DE805D75DAA90C20@TY1PR01MB1578.jpnprd01.prod.outlook.com>
Date:   Mon, 6 Apr 2020 09:37:38 +0000
From:   "Kohada.Tetsuhiro@...MitsubishiElectric.co.jp" 
        <Kohada.Tetsuhiro@...MitsubishiElectric.co.jp>
To:     'Pali Rohár' <pali@...nel.org>
CC:     "'linux-fsdevel@...r.kernel.org'" <linux-fsdevel@...r.kernel.org>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        "'namjae.jeon@...sung.com'" <namjae.jeon@...sung.com>,
        "'sj1557.seo@...sung.com'" <sj1557.seo@...sung.com>,
        "'viro@...iv.linux.org.uk'" <viro@...iv.linux.org.uk>
Subject: RE: [PATCH 1/4] exfat: Simplify exfat_utf8_d_hash() for code points
 above U+FFFF

> > If you want to get an unbiased hash value by specifying an 8 or 16-bit
> > value,
> 
> Hello! In exfat we have sequence of 21-bit values (not 8, not 16).

hash_32() generates a less-biased hash, even for 21-bit characters.

With partial_name_hash(), the hash of a filename is biased according to the
width of the characters it contains:
 - 21-bit (surrogate pair): the upper 3 bits of the hash tend to be 0.
 - 16-bit (mostly CJKV): the upper 8 bits of the hash tend to be 0.
 - 8-bit (mostly Latin): the upper 16 bits of the hash tend to be 0.
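
To illustrate the bias listed above, here is a rough userspace sketch, not
the kernel code itself. It copies the arithmetic of partial_name_hash() and
hash_32(), but narrows the running hash to 32 bits (the kernel uses
unsigned long) and only looks at one-character names starting from hash 0,
which is the simplest case. The names partial_name_hash32(),
hash_32_sketch(), leading_zeros32() and min_leading_zeros() are hypothetical
helpers made up for this example.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's partial_name_hash(), but on a 32-bit
 * value (assumption: the kernel version operates on unsigned long). */
static uint32_t partial_name_hash32(uint32_t c, uint32_t prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Multiplier used by the generic hash_32() in linux/hash.h. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Userspace copy of the generic hash_32(): multiply, keep the top bits.
 * bits must be in the range 1..32. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Number of leading zero bits in a 32-bit value (32 for zero). */
static unsigned int leading_zeros32(uint32_t v)
{
	unsigned int n = 0;

	if (v == 0)
		return 32;
	while (!(v & 0x80000000u)) {
		v <<= 1;
		n++;
	}
	return n;
}

/* Smallest number of leading zero bits over the hashes of all
 * one-character names with character values 1..max_c. */
static unsigned int min_leading_zeros(uint32_t max_c)
{
	unsigned int min = 32;
	uint32_t c;

	for (c = 1; c <= max_c; c++) {
		unsigned int n = leading_zeros32(partial_name_hash32(c, 0));

		if (n < min)
			min = n;
	}
	return min;
}
```

Under these assumptions, min_leading_zeros(0xFF) gives 16,
min_leading_zeros(0xFFFF) gives 8, and min_leading_zeros(0x1FFFFF) gives 3
(4 if the range stops at U+10FFFF). Longer names mix more bits through the
repeated multiplication by 11, so this only shows the tendency.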

I think the more frequently used Latin/CJKV characters matter more than
surrogate pair characters when considering hash efficiency.

The hash from partial_name_hash() is also biased for 8/16-bit characters.
However, it works well.

Surrogate pair characters are used less frequently, and the hash from
partial_name_hash() is less biased for them than for 8/16-bit characters.

So I think there is no problem with your patch.


> Did you mean hash_32() function from linux/hash.h?

Oops. I forgot '_'.
hash_32() is correct.


---
Kohada Tetsuhiro <Kohada.Tetsuhiro@...MitsubishiElectric.co.jp>