Message-ID: <20200107173842.ciskn4ahuhiklycm@pali>
Date:   Tue, 7 Jan 2020 18:38:42 +0100
From:   Pali Rohár <pali.rohar@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-ntfs-dev@...ts.sourceforge.net, linux-cifs@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Luis de Bethencourt <luisbg@...nel.org>,
        Salah Triki <salah.triki@...il.com>,
        Steve French <sfrench@...ba.org>,
        OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        David Sterba <dsterba@...e.com>,
        Dave Kleikamp <shaggy@...nel.org>,
        Anton Altaparmakov <anton@...era.com>,
        Jan Kara <jack@...e.com>, "Theodore Y. Ts'o" <tytso@....edu>,
        Eric Sandeen <sandeen@...hat.com>,
        Namjae Jeon <linkinjeon@...il.com>,
        Pavel Machek <pavel@....cz>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: Unification of filesystem encoding options

On Tuesday 07 January 2020 14:32:33 Jan Kara wrote:
> On Thu 02-01-20 22:18:55, Pali Rohár wrote:
> > 1) Unify mount options for specifying charset.
> > 
> > Currently all filesystems except msdos and hfsplus have the mount option
> > iocharset=<charset>. hfsplus has nls=<charset>, and msdos does not
> > implement re-encoding support. In addition, vfat, udf and isofs have a
> > broken iocharset=utf8 option (but a working utf8 option), and ntfs has
> > a deprecated iocharset=<charset> option.
> > 
> > I would suggest following changes for unification:
> > 
> > * Add a new alias iocharset= for hfsplus which would do same as nls=
> > * Make iocharset=utf8 option for vfat, udf and isofs to do same as utf8
> > * Un-deprecate iocharset=<charset> option for ntfs
> > 
> > This would cause that all filesystems would have iocharset=<charset>
> > option which would work for any charset, including iocharset=utf8.
> > And it would fix also broken iocharset=utf8 for vfat, udf and isofs.
> 
> Makes sense to me.

Ok!

> > 2) Add support for Unicode code points above U+FFFF for filesystems
> > befs, hfs, hfsplus, jfs and ntfs, so iocharset=utf8 option would work
> > also with filenames in userspace which would be 4 bytes long UTF-8.
> 
> Also looks good but when doing this, I'd suggest we extend NLS to support
> full UTF-8 rather than implementing it by hand like e.g. we did for UDF.

The current kernel NLS API supports upper-case / lower-case conversion
only for single-byte encodings, so there is no case-insensitive support
for the UTF-8 encoding. And for Unicode conversion it supports only
UCS-2, that is code points up to U+FFFF, which in UTF-8 means sequences
of at most 3 bytes.
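
To make the 3-byte limit concrete, here is a small userspace sketch (not
kernel code; the name utf8_encoded_len is made up for this mail). It
only shows where the UCS-2 range ends in terms of UTF-8 sequence length:

```c
#include <stdint.h>

/*
 * Number of bytes UTF-8 needs for one code point, or 0 if the value is
 * not a valid Unicode scalar value. Hypothetical helper, userspace only.
 */
int utf8_encoded_len(uint32_t cp)
{
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not scalar values */
	if (cp <= 0x7F)
		return 1;
	if (cp <= 0x7FF)
		return 2;
	if (cp <= 0xFFFF)
		return 3;	/* last range a single UCS-2 unit can hold */
	if (cp <= 0x10FFFF)
		return 4;	/* outside Plane 0, not representable in UCS-2 */
	return 0;
}
```

So anything that needs a 4-byte sequence is by definition outside of what
a UCS-2 based interface can carry.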

This really cannot be fixed without rewriting the existing filesystems
which use the NLS API.

One hacky option would be to extend the NLS API from UCS-2 to UTF-16 and
fix all users of the NLS API to expect UTF-16 surrogate pairs.
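
For reference, this is roughly what "expecting surrogate pairs" means for
a caller. It is a userspace sketch with made-up names (utf16_encode,
utf16_decode_pair), not a proposal for the actual NLS interface:

```c
#include <stdint.h>

/*
 * Encode one code point as UTF-16. Returns the number of 16-bit units
 * written (1 or 2), or 0 for an invalid code point.
 */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;		/* lone surrogate values are invalid */
	if (cp <= 0xFFFF) {
		out[0] = (uint16_t)cp;	/* same as UCS-2 */
		return 1;
	}
	if (cp > 0x10FFFF)
		return 0;
	cp -= 0x10000;			/* 20 bits remain */
	out[0] = (uint16_t)(0xD800 | (cp >> 10));	/* high surrogate */
	out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));	/* low surrogate */
	return 2;
}

/* Rejoin a valid high/low surrogate pair into one code point. */
uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
	return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
			  (uint32_t)(lo - 0xDC00));
}
```

Code which assumes one 16-bit unit per character would treat the two
halves as two separate (and invalid) characters, which is why every user
would need to be touched.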

But I dislike UTF-16 and would rather use unicode_t (UTF-32), which is
already present in the kernel. But because existing filesystem drivers
pass their UCS-2/UTF-16 buffers from the FS to the NLS API, it is not
easy to change the whole NLS API from UCS-2 to UTF-32.

And this change still does not add support for case-insensitivity, so
it is useless for all MS filesystems (msdos, vfat, ntfs), which are the
majority.

The kernel already provides functions for converting between UTF-8 and
UTF-16, so this seems to be the easiest way to provide full UTF-8
support for filesystems which internally use UTF-16, similar to how it
is implemented in UDF.
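
As far as I remember the kernel helpers are utf8s_to_utf16s() and
utf16s_to_utf8s() in fs/nls/nls_base.c, but please check the tree for
the exact prototypes. The sketch below is a standalone userspace
version of the same idea with made-up names (utf8_decode,
utf8_to_utf16), only to show that 4-byte sequences come out as
surrogate pairs:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence; returns bytes consumed, 0 if malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* smallest code point allowed for each sequence length */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t n, i;
	uint32_t c;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4; c = s[0] & 0x07;
	} else {
		return 0;	/* stray continuation or invalid lead byte */
	}
	if (len < n)
		return 0;	/* truncated sequence */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, out-of-range values and surrogates */
	if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;
	*cp = c;
	return n;
}

/* Returns the number of 16-bit units written, or -1 on error. */
long utf8_to_utf16(const unsigned char *s, size_t len,
		   uint16_t *out, size_t maxout)
{
	size_t o = 0;

	while (len > 0) {
		uint32_t cp;
		size_t n = utf8_decode(s, len, &cp);

		if (n == 0)
			return -1;
		if (cp <= 0xFFFF) {
			if (o + 1 > maxout)
				return -1;
			out[o++] = (uint16_t)cp;
		} else {
			if (o + 2 > maxout)
				return -1;
			cp -= 0x10000;
			out[o++] = (uint16_t)(0xD800 | (cp >> 10));
			out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		}
		s += n;
		len -= n;
	}
	return (long)o;
}
```

A filesystem which stores names as UTF-16 on disk can use this kind of
conversion directly and does not need to go through the per-character
NLS callbacks at all.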

Moreover, all NLS encodings except UTF-8 are single-byte encodings and
map into Plane 0, so they can be represented in the currently used UCS-2
encoding. Therefore conversion to Unicode works correctly for them, and
so do their case-insensitivity functions (or rather tables).
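
To show why the table approach is fine for such encodings, here is a
userspace sketch for ISO-8859-1. It is not the kernel's nls_iso8859-1
code; the function names are made up, and the table only plays the role
which (if I remember correctly) the charset2upper array has in struct
nls_table:

```c
#include <stddef.h>
#include <stdint.h>

/* In ISO-8859-1 the byte value is the code point, always in Plane 0. */
uint16_t latin1_to_ucs2(unsigned char b)
{
	return b;
}

/* Fill a 256-entry upper-case table: one byte in, one byte out. */
void latin1_build_upper(unsigned char table[256])
{
	int i;

	for (i = 0; i < 256; i++) {
		unsigned char b = (unsigned char)i;

		/* a-z, and 0xE0-0xFE except the division sign 0xF7 */
		if ((b >= 'a' && b <= 'z') ||
		    (b >= 0xE0 && b <= 0xFE && b != 0xF7))
			b -= 0x20;
		table[i] = b;
	}
}

/* Case-insensitive comparison of two names of the same length. */
int latin1_names_equal_nocase(const unsigned char *table,
			      const unsigned char *a,
			      const unsigned char *b, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (table[a[i]] != table[b[i]])
			return 0;
	return 1;
}
```

A lookup indexed by one byte cannot describe UTF-8, where one character
is spread over up to 4 bytes, and that is the reason why the UTF-8 NLS
table has no working case conversion.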

Adding support for case-insensitivity to the UTF-8 NLS encoding would
mean creating a completely new kernel NLS API (one which supports
variable-length encodings) and rewriting all NLS filesystems to use this
new API. All existing NLS encodings would also need to be ported to this
new API.

Is this really something which has value? Just because of UTF-8?

To me it looks like the better option would be to remove the UTF-8 NLS
encoding, as it is broken. Some filesystems already do not use the NLS
API for their UTF-8 support (e.g. vfat, udf or the newly prepared exfat),
and others could be modified/extended/fixed in a similar way.

> > 3) Add support for iocharset= and codepage= options for msdos
> > filesystem. It shares a lot of code with the vfat driver.
> 
> I guess this is for msdos filesystem maintainers to decide.

Yes!

-- 
Pali Rohár
pali.rohar@...il.com
