[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87plci3lxw.fsf@mailhost.krisman.be>
Date: Mon, 25 Aug 2025 21:34:35 -0400
From: Gabriel Krisman Bertazi <krisman@...e.de>
To: Amir Goldstein <amir73il@...il.com>
Cc: André Almeida <andrealmeid@...lia.com>, Miklos Szeredi
<miklos@...redi.hu>, Theodore Tso <tytso@....edu>,
linux-unionfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
kernel-dev@...lia.com
Subject: Re: [PATCH v6 4/9] ovl: Create ovl_casefold() to support casefolded
strncmp()
Gabriel Krisman Bertazi <gabriel@...sman.be> writes:
> Amir Goldstein <amir73il@...il.com> writes:
>
>> On Mon, Aug 25, 2025 at 5:27 PM Amir Goldstein <amir73il@...il.com> wrote:
>>>
>>> On Mon, Aug 25, 2025 at 1:09 PM Gabriel Krisman Bertazi
>>> <gabriel@...sman.be> wrote:
>>> >
>>> > André Almeida <andrealmeid@...lia.com> writes:
>>> >
>>> > > To add overlayfs support casefold layers, create a new function
>>> > > ovl_casefold(), to be able to do case-insensitive strncmp().
>>> > >
>>> > > ovl_casefold() allocates a new buffer and stores the casefolded version
>>> > > of the string on it. If the allocation or the casefold operation fails,
>>> > > fallback to use the original string.
>>> > >
>>> > > The case-insentive name is then used in the rb-tree search/insertion
>>> > > operation. If the name is found in the rb-tree, the name can be
>>> > > discarded and the buffer is freed. If the name isn't found, it's then
>>> > > stored at struct ovl_cache_entry to be used later.
>>> > >
>>> > > Reviewed-by: Amir Goldstein <amir73il@...il.com>
>>> > > Signed-off-by: André Almeida <andrealmeid@...lia.com>
>>> > > ---
>>> > > Changes from v6:
>>> > > - Last version was using `strncmp(... tmp->len)` which was causing
>>> > > regressions. It should be `strncmp(... len)`.
>>> > > - Rename cf_len to c_len
>>> > > - Use c_len for tree operation: (cmp < 0 || len < tmp->c_len)
>>> > > - Remove needless kfree(cf_name)
>>> > > ---
>>> > > fs/overlayfs/readdir.c | 113 ++++++++++++++++++++++++++++++++++++++++---------
>>> > > 1 file changed, 94 insertions(+), 19 deletions(-)
>>> > >
>>> > > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
>>> > > index b65cdfce31ce27172d28d879559f1008b9c87320..dfc661b7bc3f87efbf14991e97cee169400d823b 100644
>>> > > --- a/fs/overlayfs/readdir.c
>>> > > +++ b/fs/overlayfs/readdir.c
>>> > > @@ -27,6 +27,8 @@ struct ovl_cache_entry {
>>> > > bool is_upper;
>>> > > bool is_whiteout;
>>> > > bool check_xwhiteout;
>>> > > + const char *c_name;
>>> > > + int c_len;
>>> > > char name[];
>>> > > };
>>> > >
>>> > > @@ -45,6 +47,7 @@ struct ovl_readdir_data {
>>> > > struct list_head *list;
>>> > > struct list_head middle;
>>> > > struct ovl_cache_entry *first_maybe_whiteout;
>>> > > + struct unicode_map *map;
>>> > > int count;
>>> > > int err;
>>> > > bool is_upper;
>>> > > @@ -66,6 +69,27 @@ static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
>>> > > return rb_entry(n, struct ovl_cache_entry, node);
>>> > > }
>>> > >
>>> > > +static int ovl_casefold(struct unicode_map *map, const char *str, int len, char **dst)
>>> > > +{
>>> > > + const struct qstr qstr = { .name = str, .len = len };
>>> > > + int cf_len;
>>> > > +
>>> > > + if (!IS_ENABLED(CONFIG_UNICODE) || !map || is_dot_dotdot(str, len))
>>> > > + return 0;
>>> > > +
>>> > > + *dst = kmalloc(NAME_MAX, GFP_KERNEL);
>>> > > +
>>> > > + if (dst) {
>>> > > + cf_len = utf8_casefold(map, &qstr, *dst, NAME_MAX);
>>> > > +
>>> > > + if (cf_len > 0)
>>> > > + return cf_len;
>>> > > + }
>>> > > +
>>> > > + kfree(*dst);
>>> > > + return 0;
>>> > > +}
>>> >
>>> > Hi,
>>> >
>>> > I should just note this does not differentiates allocation errors from
>>> > casefolding errors (invalid encoding). It might be just a theoretical
>>> > error because GFP_KERNEL shouldn't fail (wink, wink) and the rest of the
>>> > operation is likely to fail too, but if you have an allocation failure, you
>>> > can end up with an inconsistent cache, because a file is added under the
>>> > !casefolded name and a later successful lookup will look for the
>>> > casefolded version.
>>>
>>> Good point.
>>> I will fix this in my tree.
>>
>> wait why should we not fail to fill the cache for both allocation
>> and encoding errors?
>>
>
> We shouldn't fail the cache for encoding errors, just for allocation errors.
>
> Perhaps I am misreading the code, so please correct me if I'm wrong. if
> ovl_casefold fails, the non-casefolded name is used in the cache. That
> makes sense if the reason utf8_casefold failed is because the string
> cannot be casefolded (i.e. an invalid utf-8 string). For those strings,
> everything is fine. But on an allocation failure, the string might have
> a real casefolded version. If we fallback to the original string as the
> key, a cache lookup won't find the entry, since we compare with memcmp.
I was thinking again about this and I suspect I misunderstood your
question. let me try to answer it again:
Ext4, f2fs and tmpfs all allow invalid utf8-encoded strings in a
casefolded directory when running on non-strict-mode. They are treated
as non-encoded byte-sequences, as if they were seen on a case-Sensitive
directory. They can't collide with other filenames because they
basically "fold" to themselves.
Now I suspect there is another problem with this series: I don't see how
it implements the semantics of strict mode. What happens if upper and
lower are in strict mode (which is valid, same encoding_flags) but there
is an invalid name in the lower? overlayfs should reject the dentry,
because any attempt to create it to the upper will fail.
André, did you consider this scenario? You can test by creating a file
with an invalid-encoded name in a casefolded directory of a
non-strict-mode filesystem and then flip the strict-mode flag in the
superblock. I can give it a try tomorrow too.
Thanks,
--
Gabriel Krisman Bertazi
Powered by blists - more mailing lists