[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87seiycz0w.fsf@mailhost.krisman.be>
Date: Mon, 14 Jul 2025 16:12:31 -0400
From: Gabriel Krisman Bertazi <gabriel@...sman.be>
To: Amir Goldstein <amir73il@...il.com>
Cc: André Almeida <andrealmeid@...lia.com>, Miklos Szeredi
<miklos@...redi.hu>, Theodore Tso <tytso@....edu>,
linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
kernel-dev@...lia.com
Subject: Re: [PATCH 1/3] ovl: Make ovl_cache_entry_find support casefold
Amir Goldstein <amir73il@...il.com> writes:
> On Wed, Apr 9, 2025 at 5:01 PM André Almeida <andrealmeid@...lia.com> wrote:
>>
>> To add overlayfs support casefold filesystems, make
>> ovl_cache_entry_find() support casefold dentries.
>>
>> For the casefold support, just comparing the strings does not work
>> because we need the dentry enconding, so make this function find the
>> equivalent dentry for a giving directory, if any.
>>
>> Also, if two strings are not equal, strncmp() return value sign can be
>> either positive or negative and this information can be used to optimize
>> the walk in the rb tree. utf8_strncmp(), in the other hand, just return
>> true or false, so replace the rb walk with a normal rb_next() function.
>
> You cannot just replace a more performance implementation with a
> less performant one for everyone else just for your niche use case.
> Also it is the wrong approach.
>
> This code needs to use utf8_normalize() to store the normalized
> name in the rbtree instead of doing lookup and d_same_name().
> and you need to do ovl_cache_entry_add_rb() with the normalized
> name anotherwise you break the logic of ovl_dir_read_merged().
>
> Gabriel,
>
> Do you think it makes sense to use utf8_normalize() from this code
> directly to generate a key for "is this name found in another layer"
> search tree?
utf8_normalize is on its way out of the kernel and I don't think it
would help here, since it doesn't handle case-insensitive equivalent
names either, bug is just as expensive.
utf8_casefold might do what you want, but it is expensive as well. With
it, you can store the folded version and be sure it is a byte-per-byte
match.
Alternatively, you can keep the existing name and open code something
similar to what generic_ci_match does: check with strncmp first and only
if the mountpoint must consider case-insensitive, do a
utf8_strncasecmp_folded if the first check wasn't a match.
--
Gabriel Krisman Bertazi
Powered by blists - more mailing lists