Message-ID: <db8161512f33468981cbc49e71b7bf05@AcuMS.aculab.com>
Date:   Mon, 20 Jan 2020 16:51:34 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Al Viro' <viro@...iv.linux.org.uk>
CC:     'Pali Rohár' <pali.rohar@...il.com>,
        OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        "Namjae Jeon" <linkinjeon@...il.com>,
        Gabriel Krisman Bertazi <krisman@...labora.com>
Subject: RE: vfat: Broken case-insensitive support for UTF-8

From: Al Viro
> Sent: 20 January 2020 16:12
> > From: Pali Rohár
> > > Sent: 20 January 2020 15:20
> > ...
> > > This is not possible. There is 1:1 mapping between UTF-8 sequence and
> > > Unicode code point. wchar_t in kernel represent either one Unicode code
> > > point (limited up to U+FFFF in NLS framework functions) or 2bytes in
> > > UTF-16 sequence (only in utf8s_to_utf16s() and utf16s_to_utf8s()
> > > functions).
> >
> > Unfortunately there is neither a 1:1 mapping of all possible byte sequences
> > to wchar_t (or unicode code points), nor a 1:1 mapping of all possible
> > wchar_t values to UTF-8.
> > Really both need to be defined - even for otherwise 'invalid' sequences.
> 
> Who.  Cares?
> 
> Filename is a sequence of octets, not codepoints.  Its interpretation is
> entirely up to the userland.

For filesystems, that really ought to be true.
It saves a lot of problems in the kernel.

I guess the FAT driver has to do something to convert the UTF-16 on-disk filenames
to/from a sequence of octets.
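The "not 1:1" point above can be made concrete with a small sketch. The helpers `utf16_to_utf8()` and `utf8_decode_one()` below are hypothetical plain C, not the kernel's `utf16s_to_utf8s()`/`utf8s_to_utf16s()`. They show two cases. The first is an unpaired UTF-16 surrogate, which a 16-bit on-disk name can hold but which has no valid UTF-8 encoding. The second is a byte sequence such as the overlong `0xC0 0xAF`, which decodes to no code point at all. Rejecting both with -1 is an assumption made for illustration. A real driver has to pick some defined behaviour (reject, substitute, or pass through), which is exactly the "both need to be defined" issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the kernel's utf16s_to_utf8s()): convert a
 * UTF-16 name to UTF-8. Returns bytes written, or -1 on an unpaired
 * surrogate or if the output buffer is too small. Rejecting unpaired
 * surrogates is an assumed policy, chosen only for illustration. */
static int utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* lone low surrogate: no valid UTF-8 form */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + len > cap)
            return -1;
        switch (len) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return (int)o;
}

/* Hypothetical helper: strictly decode one UTF-8 sequence from s[0..n).
 * Returns bytes consumed and stores the code point in *cp, or -1 for
 * octets that are not valid UTF-8: stray continuation bytes, 0xF8..0xFF
 * lead bytes, truncation, overlong forms, surrogates, or > U+10FFFF. */
static int utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return -1;
    unsigned char c = s[0];
    size_t len;
    uint32_t v, min;
    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { len = 2; v = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; v = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; v = c & 0x07; min = 0x10000; }
    else return -1;
    if (n < len)
        return -1;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1; /* expected a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return -1;
    *cp = v;
    return (int)len;
}
```

For example, converting the units `{ 0x00E9, 0xD83D, 0xDE00 }` ("é" plus a surrogate pair for U+1F600) yields 6 octets, `C3 A9 F0 9F 98 80`. In contrast, `{ 0x0041, 0xD800 }` returns -1, and `utf8_decode_one()` rejects both `C0 AF` (overlong "/") and `ED A0 80` (an encoded surrogate).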

Even Microsoft has made it much easier to have case-sensitive
NTFS directories in Windows 10.
(Ever watched the list of c:/windows/system32/drivers/*.sys filenames
printed when Windows boots? Their capitalisation is nearly all different!)

