[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhw26Tf6LMP1fkH=bTD_LXEkUJ1soWwW+BrgoePsuzVww@mail.gmail.com>
Date: Tue, 26 Aug 2025 09:19:32 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Gabriel Krisman Bertazi <krisman@...e.de>
Cc: André Almeida <andrealmeid@...lia.com>,
Miklos Szeredi <miklos@...redi.hu>, Theodore Tso <tytso@....edu>, linux-unionfs@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
kernel-dev@...lia.com
Subject: Re: [PATCH v6 4/9] ovl: Create ovl_casefold() to support casefolded strncmp()
On Tue, Aug 26, 2025 at 3:34 AM Gabriel Krisman Bertazi <krisman@...e.de> wrote:
>
> Gabriel Krisman Bertazi <gabriel@...sman.be> writes:
>
> > Amir Goldstein <amir73il@...il.com> writes:
> >
> >> On Mon, Aug 25, 2025 at 5:27 PM Amir Goldstein <amir73il@...il.com> wrote:
> >>>
> >>> On Mon, Aug 25, 2025 at 1:09 PM Gabriel Krisman Bertazi
> >>> <gabriel@...sman.be> wrote:
> >>> >
> >>> > André Almeida <andrealmeid@...lia.com> writes:
> >>> >
> >>> > > To add overlayfs support casefold layers, create a new function
> >>> > > ovl_casefold(), to be able to do case-insensitive strncmp().
> >>> > >
> >>> > > ovl_casefold() allocates a new buffer and stores the casefolded version
> >>> > > of the string on it. If the allocation or the casefold operation fails,
> >>> > > fallback to use the original string.
> >>> > >
> >>> > > The case-insentive name is then used in the rb-tree search/insertion
> >>> > > operation. If the name is found in the rb-tree, the name can be
> >>> > > discarded and the buffer is freed. If the name isn't found, it's then
> >>> > > stored at struct ovl_cache_entry to be used later.
> >>> > >
> >>> > > Reviewed-by: Amir Goldstein <amir73il@...il.com>
> >>> > > Signed-off-by: André Almeida <andrealmeid@...lia.com>
> >>> > > ---
> >>> > > Changes from v6:
> >>> > > - Last version was using `strncmp(... tmp->len)` which was causing
> >>> > > regressions. It should be `strncmp(... len)`.
> >>> > > - Rename cf_len to c_len
> >>> > > - Use c_len for tree operation: (cmp < 0 || len < tmp->c_len)
> >>> > > - Remove needless kfree(cf_name)
> >>> > > ---
> >>> > > fs/overlayfs/readdir.c | 113 ++++++++++++++++++++++++++++++++++++++++---------
> >>> > > 1 file changed, 94 insertions(+), 19 deletions(-)
> >>> > >
> >>> > > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> >>> > > index b65cdfce31ce27172d28d879559f1008b9c87320..dfc661b7bc3f87efbf14991e97cee169400d823b 100644
> >>> > > --- a/fs/overlayfs/readdir.c
> >>> > > +++ b/fs/overlayfs/readdir.c
> >>> > > @@ -27,6 +27,8 @@ struct ovl_cache_entry {
> >>> > > bool is_upper;
> >>> > > bool is_whiteout;
> >>> > > bool check_xwhiteout;
> >>> > > + const char *c_name;
> >>> > > + int c_len;
> >>> > > char name[];
> >>> > > };
> >>> > >
> >>> > > @@ -45,6 +47,7 @@ struct ovl_readdir_data {
> >>> > > struct list_head *list;
> >>> > > struct list_head middle;
> >>> > > struct ovl_cache_entry *first_maybe_whiteout;
> >>> > > + struct unicode_map *map;
> >>> > > int count;
> >>> > > int err;
> >>> > > bool is_upper;
> >>> > > @@ -66,6 +69,27 @@ static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
> >>> > > return rb_entry(n, struct ovl_cache_entry, node);
> >>> > > }
> >>> > >
> >>> > > +static int ovl_casefold(struct unicode_map *map, const char *str, int len, char **dst)
> >>> > > +{
> >>> > > + const struct qstr qstr = { .name = str, .len = len };
> >>> > > + int cf_len;
> >>> > > +
> >>> > > + if (!IS_ENABLED(CONFIG_UNICODE) || !map || is_dot_dotdot(str, len))
> >>> > > + return 0;
> >>> > > +
> >>> > > + *dst = kmalloc(NAME_MAX, GFP_KERNEL);
> >>> > > +
> >>> > > + if (dst) {
Andre,
Just noticed this is a bug, should have been if (*dst), but anyway following
Gabriel's comments I have made this change in my tree (pending more
strict related changes):
static int ovl_casefold(struct ovl_readdir_data *rdd, const char *str, int len,
char **dst)
{
const struct qstr qstr = { .name = str, .len = len };
char *cf_name;
int cf_len;
if (!IS_ENABLED(CONFIG_UNICODE) || !rdd->map || is_dot_dotdot(str, len))
return 0;
cf_name = kmalloc(NAME_MAX, GFP_KERNEL);
if (!cf_name) {
rdd->err = -ENOMEM;
return -ENOMEM;
}
cf_len = utf8_casefold(rdd->map, &qstr, *dst, NAME_MAX);
if (cf_len > 0)
*dst = cf_name;
else
kfree(cf_name);
return cf_len;
}
> >>> > > + cf_len = utf8_casefold(map, &qstr, *dst, NAME_MAX);
> >>> > > +
> >>> > > + if (cf_len > 0)
> >>> > > + return cf_len;
> >>> > > + }
> >>> > > +
> >>> > > + kfree(*dst);
> >>> > > + return 0;
> >>> > > +}
> >>> >
> >>> > Hi,
> >>> >
> >>> > I should just note this does not differentiates allocation errors from
> >>> > casefolding errors (invalid encoding). It might be just a theoretical
> >>> > error because GFP_KERNEL shouldn't fail (wink, wink) and the rest of the
> >>> > operation is likely to fail too, but if you have an allocation failure, you
> >>> > can end up with an inconsistent cache, because a file is added under the
> >>> > !casefolded name and a later successful lookup will look for the
> >>> > casefolded version.
> >>>
> >>> Good point.
> >>> I will fix this in my tree.
> >>
> >> wait why should we not fail to fill the cache for both allocation
> >> and encoding errors?
> >>
> >
> > We shouldn't fail the cache for encoding errors, just for allocation errors.
> >
> > Perhaps I am misreading the code, so please correct me if I'm wrong. if
> > ovl_casefold fails, the non-casefolded name is used in the cache. That
> > makes sense if the reason utf8_casefold failed is because the string
> > cannot be casefolded (i.e. an invalid utf-8 string). For those strings,
> > everything is fine. But on an allocation failure, the string might have
> > a real casefolded version. If we fallback to the original string as the
> > key, a cache lookup won't find the entry, since we compare with memcmp.
Just to make it clear in case the name "cache lookup" confuses anyone
on this thread - we are talking about ovl readdir cache, not about the vfs
lookup cache, the the purpose of ovl readdir cache is twofold:
1. plain in-memory readdir cache
2. (more important to this discussion) implementation of "merged dir" content
So I agree with you that with non-strict mode, invalid encoded names
should be added to readdir cache as is and not in the case of allocation
failure.
>
> I was thinking again about this and I suspect I misunderstood your
> question. let me try to answer it again:
>
> Ext4, f2fs and tmpfs all allow invalid utf8-encoded strings in a
> casefolded directory when running on non-strict-mode. They are treated
> as non-encoded byte-sequences, as if they were seen on a case-Sensitive
> directory. They can't collide with other filenames because they
> basically "fold" to themselves.
>
> Now I suspect there is another problem with this series: I don't see how
> it implements the semantics of strict mode. What happens if upper and
> lower are in strict mode (which is valid, same encoding_flags) but there
> is an invalid name in the lower? overlayfs should reject the dentry,
> because any attempt to create it to the upper will fail.
Ok, so IIUC, one issue is that return value from ovl_casefold() should be
conditional to the sb encoding_flags, which was inherited from the layers.
Again, *IF* I understand correctly, then strict mode ext4 will not allow
creating an invalid-encoded name, but will strict mode ext4 allow
it as a valid lookup result?
>
> André, did you consider this scenario?
In general, as I have told Andre from v1, please stick to the most common
configs that people actually need.
We do NOT need to support every possible combination of layers configurations.
This is why we went with supporting all-or-nothing configs for casefolder dirs.
Because it is simpler for overlayfs semantics and good enough for what
users need.
So my question is to you both: do users actually use strict mode for
wine and such?
Because if they don't I would rather support the default mode only
(enforced on mount)
and add support for strict mode later per actual users demand.
> You can test by creating a file
> with an invalid-encoded name in a casefolded directory of a
> non-strict-mode filesystem and then flip the strict-mode flag in the
> superblock. I can give it a try tomorrow too.
Can the sb flags be flipped in runtime? while mounted?
I suppose you are talking about an offline change that requires
re-mount of overlayfs and re-validate the same encoding flags on all layers?
Andre,
Please also add these and other casefold functional tests to fstest to
validate correctness of the merge dir implementation with different
casefold variants in different layers.
Thanks,
Amir.
Powered by blists - more mailing lists