linux-kernel - Re: vfat: Broken case-insensitive support for UTF-8


Message-ID: <20200119230809.GW8904@ZenIV.linux.org.uk>
Date:   Sun, 19 Jan 2020 23:08:09 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Pali Rohár <pali.rohar@...il.com>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        "Theodore Y. Ts'o" <tytso@....edu>,
        OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
        Namjae Jeon <linkinjeon@...il.com>,
        Gabriel Krisman Bertazi <krisman@...labora.com>
Subject: Re: vfat: Broken case-insensitive support for UTF-8

On Sun, Jan 19, 2020 at 11:14:55PM +0100, Pali Rohár wrote:

> So when UTF-8 on VFS for VFAT is enabled, the VFS <--> VFAT conversion
> uses the utf16s_to_utf8s() and utf8s_to_utf16s() functions. But the
> fat_name_match(), vfat_hashi() and vfat_cmpi() functions use the NLS
> table (default iso8859-1) with nls_strnicmp() and nls_tolower().
> 
> Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> broken for vfat in UTF-8 mode.
> 
> I was thinking about how to fix it, and the only possible way is to
> write a uni_tolower() function which takes one Unicode code point and
> returns the lowercase form of that code point. We cannot do any Unicode
> normalization, as the VFAT specification does not say anything about it
> and the MS reference fastfat.sys implementation does not do it either.

Then how can that possibly be broken?  If it matches the native behaviour,
that's it.

> As you can see, lowercase 'd' and uppercase 'D' are the same, but
> lowercase 'č' and uppercase 'Č' are not. This is because 'č' is the
> two-byte sequence 0xc4 0x8d and the comparison is done with the Latin1
> table. 0xc4 is Latin1 'Ä', which is already uppercase. 0x8d is a control
> char, so it is not changed by the tolower/toupper functions.

Again, who the hell cares?  Does the behaviour match how Windows handles
that thing?  "Case" is not something well-defined; the only definition
is "whatever weird crap the native implementation chooses to do".
That's the only reason to support that garbage at all...
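
For readers following the quoted 'č' / 'Č' example, here is a minimal userspace sketch of the effect being described. It is not kernel code: latin1_tolower(), latin1_strnicmp() and toy_uni_tolower() are hypothetical stand-ins, not the kernel's nls_tolower(), nls_strnicmp() or the proposed uni_tolower(). The Latin1 folding rule used here (A-Z and 0xC0-0xDE except 0xD7) is an assumption about how an iso8859-1 table behaves, and the code point fold covers only ASCII and U+0100..U+012F. The sketch says nothing about what Windows actually does with these names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Lowercase a single byte as if it were an ISO 8859-1 character.
 * Assumed rule: ASCII A-Z and 0xC0-0xDE (except 0xD7, the
 * multiplication sign) map to the byte 0x20 above them.
 */
static unsigned char latin1_tolower(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return c + 0x20;
	if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
		return c + 0x20;
	return c;
}

/*
 * Byte-wise case-insensitive compare of two buffers of equal length.
 * Returns 0 when every folded byte matches, 1 otherwise.
 */
static int latin1_strnicmp(const unsigned char *a, const unsigned char *b,
			   size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (latin1_tolower(a[i]) != latin1_tolower(b[i]))
			return 1;
	}
	return 0;
}

/*
 * Toy per-code-point fold (hypothetical): handles ASCII and the
 * U+0100..U+012F block only, where uppercase letters sit on even
 * code points and their lowercase forms on the following odd ones.
 */
static uint32_t toy_uni_tolower(uint32_t cp)
{
	if (cp >= 'A' && cp <= 'Z')
		return cp + 0x20;
	if (cp >= 0x0100 && cp <= 0x012F)
		return cp | 1;
	return cp;
}
```

With these helpers, "d" and "D" compare equal byte-wise. The UTF-8 bytes 0xc4 0x8d ('č') and 0xc4 0x8c ('Č') do not, because the second bytes differ and neither is touched by the Latin1 fold. A fold on decoded code points maps U+010C to U+010D, so the two would match there.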