[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1oAiAAKzAmV5M2h@google.com>
Date: Wed, 11 Dec 2024 21:13:44 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Gabriel Krisman Bertazi <krisman@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"hanqi@...o.com" <hanqi@...o.com>, Theodore Ts'o <tytso@....edu>
Subject: Re: Unicode conversion issue
On 12/11, Linus Torvalds wrote:
> On Wed, 11 Dec 2024 at 11:58, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > The problem is that all the filesystems basically do some variation of
> >
> > if (IS_CASEFOLDED(dir) ..) {
> >
> > len = utf8_casefold(sb->s_encoding, orig_name,
> > new_name, MAXLEN);
> >
> > and then they use that "new_name" for both hashing and for comparisons.
>
> Oh, actually, f2fs does pass in the original name to
> generic_ci_match(), so I think this is solvable.
>
> The solution involves just telling f2fs to ignore the hash if it has
> seen odd characters.
But, the hash is not just used when matching the dentry, but gives a block
location withiin multi-level hash tables for faster lookup as well. If the
filename length is also changed by the unicode patch, utf8_strncasecmp_folded()
will also give an error?
>
> So I think f2fs could actually do something like this:
>
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -67,6 +67,7 @@ int f2fs_init_casefolded_name(const struct inode *dir,
> /* fall back to treating name as opaque byte sequence */
> return 0;
> }
> + fname->ignore_hash = utf8_oddname(fname->usr_fname);
> fname->cf_name.name = buf;
> fname->cf_name.len = len;
> }
> @@ -231,7 +232,7 @@ struct f2fs_dir_entry
> *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
> continue;
> }
>
> - if (de->hash_code == fname->hash) {
> + if (fname->ignore_hash || de->hash_code == fname->hash) {
> res = f2fs_match_name(d->inode, fname,
> d->filename[bit_pos],
> le16_to_cpu(de->name_len));
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -521,6 +521,7 @@ struct f2fs_filename {
>
> /* The dirhash of this filename */
> f2fs_hash_t hash;
> + bool ignore_hash;
>
> #ifdef CONFIG_FS_ENCRYPTION
> /*
>
> where that "utf8_oddname()" is the one that goes "this filename
> contains unhashable characters".
>
> I didn't look very closely at what ext4 does, but it seems to already
> have a pattern for "don't even look at the hash because it's not
> reliable", so I think ext4 can do something similar.
>
> So then all you actually need is that utf8_oddname() that recognizes
> those ignored code-points.
>
> So I take it all back: option (1) actually doesn't look that bad, and
> would make reverting commit 5c26d2f1d3f5 ("unicode: Don't special case
> ignorable code points") unnecessary.
>
> Linus
Powered by blists - more mailing lists