[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871r6vt0it.fsf@mail.parknet.co.jp>
Date: Sun, 15 Aug 2021 20:23:06 +0900
From: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
To: Pali Rohár <pali@...nel.org>
Cc: linux-fsdevel@...r.kernel.org,
linux-ntfs-dev@...ts.sourceforge.net, linux-cifs@...r.kernel.org,
jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Jan Kara <jack@...e.cz>, "Theodore Y . Ts'o" <tytso@....edu>,
Luis de Bethencourt <luisbg@...nel.org>,
Salah Triki <salah.triki@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Kleikamp <shaggy@...nel.org>,
Anton Altaparmakov <anton@...era.com>,
Pavel Machek <pavel@....cz>,
Marek Behún <marek.behun@....cz>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option
To: Pali Rohár <pali@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-ntfs-dev@...ts.sourceforge.net, linux-cifs@...r.kernel.org, jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, "Theodore Y . Ts'o" <tytso@....edu>, Luis de Bethencourt <luisbg@...nel.org>, Salah Triki <salah.triki@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, Dave Kleikamp <shaggy@...nel.org>, Anton Altaparmakov <anton@...era.com>, Pavel Machek <pavel@....cz>, Marek Behún <marek.behun@....cz>, Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option
From: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Gcc: nnimap+ibmpc.myhome.or.jp:Sent
--text follows this line--
To: Pali Rohár <pali@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-ntfs-dev@...ts.sourceforge.net, linux-cifs@...r.kernel.org, jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, "Theodore Y . Ts'o" <tytso@....edu>, Luis de Bethencourt <luisbg@...nel.org>, Salah Triki <salah.triki@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, Dave Kleikamp <shaggy@...nel.org>, Anton Altaparmakov <anton@...era.com>, Pavel Machek <pavel@....cz>, Marek Behún <marek.behun@....cz>, Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option
From: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Gcc: nnimap+ibmpc.myhome.or.jp:Sent
--text follows this line--
Pali Rohár <pali@...nel.org> writes:
>> This change is not equivalent to utf8=1. In the case of utf8=1, vfat
>> uses iocharset's conversion table and it can handle more than ascii.
>>
>> So this patch is incompatible changes, and handles less chars than
>> utf8=1. So I think this is clean though, but this would be regression
>> for user of utf8=1.
>
> I do not think so... But please correct me, as this code around is mess.
>
> Without this change when utf8=1 is set then iocharset= encoding is used
> for case-insensitivity implementation (toupper / tolower conversion).
> For all other parts are use correct utf8* conversion functions.
>
> But you use touppper / tolower functions from iocharset= encoding on
> stream of utf8 bytes then you either get identity or some unpredictable
> garbage in utf8. So when comparing two (different) non-ASCII filenames
> via this method you in most cases get that filenames are different.
> Because converting their utf8 bytes via toupper / tolower functions from
> iocharset= encoding results in two different byte sequences in most
> cases. Even for two utf8 case-insensitive same strings.
>
> But you can play with it and I guess it is possible to find two
> different utf8 strings which after toupper / tolower conversion from
> some iocharset= encoding would lead to same byte sequence.
>
> This patch uses for utf8 tolower / touppser function simple 7-bit
> tolower / toupper ascii function. And so for 7-bit ascii file names
> there is no change.
>
> So this patch changes behavior when comparing non 7-bit ascii file
> names, but only in cases when previously two different file names were
> marked as same. As now they are marked correctly as different. So this
> is changed behavior, but I guess it is bug fix which is needed.
> If you want I can put this change into separate patch.
>
> Issue that two case-insensitive same files are marked as different is
> not changed by this patch and therefore this issue stay here.
OK, sure. utf8 looks like broken than I was thinking (although user can
use iocharset=ascii and utf8=1 for this). The code might be better to
clean up a bit more though, looks like good basically.
One thing, please update FAT_DEFAULT_IOCHARSET help in Kconfig and
Documentation/filesystems/vfat.rst (with new warning about iocharset=utf8).
Thanks.
--
OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Powered by blists - more mailing lists