Message-ID: <20210815094224.dswbjywnhvajvzjv@pali>
Date: Sun, 15 Aug 2021 11:42:24 +0200
From: Pali Rohár <pali@...nel.org>
To: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Cc: linux-fsdevel@...r.kernel.org,
linux-ntfs-dev@...ts.sourceforge.net, linux-cifs@...r.kernel.org,
jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Jan Kara <jack@...e.cz>, "Theodore Y . Ts'o" <tytso@....edu>,
Luis de Bethencourt <luisbg@...nel.org>,
Salah Triki <salah.triki@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Kleikamp <shaggy@...nel.org>,
Anton Altaparmakov <anton@...era.com>,
Pavel Machek <pavel@....cz>,
Marek Behún <marek.behun@....cz>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option

On Sunday 15 August 2021 12:42:47 OGAWA Hirofumi wrote:
> Pali Rohár <pali@...nel.org> writes:
>
> > Currently iocharset=utf8 mount option is broken and error is printed to
> > dmesg when it is used. To use UTF-8 as iocharset, it is required to use
> > utf8=1 mount option.
> >
> > Fix iocharset=utf8 mount option to use be equivalent to the utf8=1 mount
> > option and remove printing error from dmesg.
>
> This change is not equivalent to utf8=1. In the case of utf8=1, vfat
> uses iocharset's conversion table and it can handle more than ascii.
>
> So this patch is incompatible changes, and handles less chars than
> utf8=1. So I think this is clean though, but this would be regression
> for user of utf8=1.

I do not think so... But please correct me if I am wrong, as the code
around this is a mess.

Without this change, when utf8=1 is set, the iocharset= encoding is used
for the case-insensitivity implementation (toupper / tolower conversion).
All other parts use the correct utf8* conversion functions.

But if you use the toupper / tolower functions from the iocharset=
encoding on a stream of utf8 bytes, then you get either identity or some
unpredictable garbage in utf8. So when comparing two (different)
non-ASCII filenames via this method, in most cases you get that the
filenames are different, because converting their utf8 bytes via the
toupper / tolower functions from the iocharset= encoding results in two
different byte sequences in most cases. This happens even for two utf8
strings which are the same when compared case-insensitively.
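
To illustrate the point above, here is a minimal userspace sketch. The
table latin1_toupper() is a hypothetical ISO-8859-1-style stand-in, not
the kernel's NLS table, and names_match() is an invented helper; neither
is taken from the fat code. It only shows what happens when an 8-bit
toupper is applied byte by byte to utf8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ISO-8859-1-style toupper (assumption, not the kernel table). */
static unsigned char latin1_toupper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 0x20);
    /* 0xE0..0xFE are lowercase letters in ISO-8859-1, except 0xF7. */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return (unsigned char)(c - 0x20);
    return c;
}

/* Returns 1 if both names are equal after byte-wise upper-casing. */
int names_match(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    if (la != lb)
        return 0;
    for (size_t i = 0; i < la; i++) {
        if (latin1_toupper((unsigned char)a[i]) !=
            latin1_toupper((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

void latin1_demo(void)
{
    /* Plain ASCII names still fold as expected. */
    assert(names_match("abc.txt", "ABC.TXT") == 1);
    /* U+00E9 (C3 A9) vs U+00C9 (C3 89): same letter, different case.
     * Every byte maps to itself here, so the names stay different. */
    assert(names_match("\xC3\xA9", "\xC3\x89") == 0);
    /* A 3-byte sequence such as E3 81 82 would become C3 81 82,
     * which is no longer valid utf8 (the "garbage" case). */
    assert(latin1_toupper(0xE3) == 0xC3);
}
```

Calling latin1_demo() runs the three checks; the second one is the case
where two case-insensitively equal utf8 names are reported as different.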

But you can play with it, and I guess it is possible to find two
different utf8 strings which, after toupper / tolower conversion from
some iocharset= encoding, would lead to the same byte sequence.
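
A possible candidate for such a collision, as a sketch only: in CP437,
byte 0x81 is a lowercase u-umlaut and 0x9A is its uppercase form. The
partial table cp437_toupper() below is a hypothetical stand-in built on
that one fact (it is not copied from the kernel's NLS code), and
folded_equal() is an invented helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, partial CP437-style toupper (assumption): only a few
 * accented letters are mapped; a real table covers all 256 bytes. */
static unsigned char cp437_toupper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 0x20);
    switch (c) {
    case 0x81: return 0x9A; /* u-umlaut -> U-umlaut */
    case 0x84: return 0x8E; /* a-umlaut -> A-umlaut */
    case 0x94: return 0x99; /* o-umlaut -> O-umlaut */
    default:   return c;
    }
}

/* Returns 1 if both strings are equal after byte-wise upper-casing. */
int folded_equal(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    if (la != lb)
        return 0;
    for (size_t i = 0; i < la; i++) {
        if (cp437_toupper((unsigned char)a[i]) !=
            cp437_toupper((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

void collision_demo(void)
{
    /* U+00C1 is C3 81 in utf8, U+00DA is C3 9A: two different,
     * unrelated letters, both valid utf8. */
    assert(strcmp("\xC3\x81", "\xC3\x9A") != 0);
    /* Byte 0x81 is folded to 0x9A, so both become C3 9A. */
    assert(folded_equal("\xC3\x81", "\xC3\x9A") == 1);
}
```

With such a table, the two different names would be treated as the same
file name, which is the kind of false match described above.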

For utf8, this patch uses the simple 7-bit ascii tolower / toupper
functions as the tolower / toupper functions. And so for 7-bit ascii
file names there is no change.
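
As a rough userspace sketch of that behavior (ascii_toupper() and
utf8_names_match() are hypothetical names, not the functions from the
patch): only bytes 'a'..'z' are folded, and every byte >= 0x80, i.e.
every byte of a multi-byte utf8 sequence, passes through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 7-bit ASCII upper-casing; all other bytes are returned unchanged. */
static unsigned char ascii_toupper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 0x20) : c;
}

/* Returns 1 if both names are equal after ASCII-only case folding. */
int utf8_names_match(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    if (la != lb)
        return 0;
    for (size_t i = 0; i < la; i++) {
        if (ascii_toupper((unsigned char)a[i]) !=
            ascii_toupper((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

void ascii_fold_demo(void)
{
    /* 7-bit ASCII names: same result as before. */
    assert(utf8_names_match("readme.txt", "README.TXT") == 1);
    /* Two different non-ASCII letters (C3 81 vs C3 9A) can no longer
     * be folded onto the same byte sequence. */
    assert(utf8_names_match("\xC3\x81", "\xC3\x9A") == 0);
    /* Same letter in different case (C3 A9 vs C3 89) is still
     * reported as different. */
    assert(utf8_names_match("\xC3\xA9", "\xC3\x89") == 0);
}
```

The last check in ascii_fold_demo() shows the limitation that remains:
non-ASCII names differing only in case are not matched.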

So this patch changes behavior when comparing non-7-bit-ascii file
names, but only in cases where two different file names were previously
marked as the same. Now they are correctly marked as different. So this
is a behavior change, but I guess it is a bug fix which is needed.

If you want, I can put this change into a separate patch.

The issue that two file names which are the same case-insensitively are
marked as different is not changed by this patch, and therefore this
issue stays.

> Thanks.
> --
> OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>