linux-kernel - Re: Unicode conversion issue


Message-ID: <Z1oAiAAKzAmV5M2h@google.com>
Date: Wed, 11 Dec 2024 21:13:44 +0000
From: Jaegeuk Kim <jaegeuk@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Gabriel Krisman Bertazi <krisman@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"hanqi@...o.com" <hanqi@...o.com>, Theodore Ts'o <tytso@....edu>
Subject: Re: Unicode conversion issue

On 12/11, Linus Torvalds wrote:
> On Wed, 11 Dec 2024 at 11:58, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > The problem is that all the filesystems basically do some variation of
> >
> >         if (IS_CASEFOLDED(dir) ..) {
> >
> >                 len = utf8_casefold(sb->s_encoding, orig_name,
> >                         new_name, MAXLEN);
> >
> > and then they use that "new_name" for both hashing and for comparisons.
> 
> Oh, actually, f2fs does pass in the original name to
> generic_ci_match(), so I think this is solvable.
> 
> The solution involves just telling f2fs to ignore the hash if it has
> seen odd characters.

But the hash is not only used when matching the dentry; it also gives the block
location within the multi-level hash tables for faster lookup. And if the
filename length is also changed by the unicode patch, won't
utf8_strncasecmp_folded() give an error as well?
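
To make the first concern concrete, here is a toy userspace model, not f2fs
code. Every name in it (toy_dir, toy_add, toy_lookup, toy_lookup_all) is made
up for illustration, a single flat table stands in for the multi-level hash
tables, and strcmp() stands in for the case-insensitive name match. It only
shows that skipping the per-entry hash compare does not help when the hash
has already picked the wrong bucket:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 4
#define SLOTS    4

/* Hypothetical stand-ins for a directory entry and a hashed directory. */
struct toy_dentry {
	unsigned int hash;
	const char *name;
};

struct toy_dir {
	struct toy_dentry slot[NBUCKETS][SLOTS];
	size_t used[NBUCKETS];
};

/* Store an entry in the bucket selected by its hash. */
static bool toy_add(struct toy_dir *dir, unsigned int hash, const char *name)
{
	size_t b = hash % NBUCKETS;

	if (dir->used[b] == SLOTS)
		return false;
	dir->slot[b][dir->used[b]].hash = hash;
	dir->slot[b][dir->used[b]].name = name;
	dir->used[b]++;
	return true;
}

/* Scan one bucket; ignore_hash skips only the per-entry hash compare. */
static const struct toy_dentry *
toy_scan_bucket(const struct toy_dir *dir, size_t b, unsigned int hash,
		const char *name, bool ignore_hash)
{
	for (size_t i = 0; i < dir->used[b]; i++) {
		const struct toy_dentry *de = &dir->slot[b][i];

		if ((ignore_hash || de->hash == hash) &&
		    strcmp(de->name, name) == 0)
			return de;
	}
	return NULL;
}

/* Normal lookup: the hash still decides which bucket gets scanned. */
static const struct toy_dentry *
toy_lookup(const struct toy_dir *dir, unsigned int hash, const char *name,
	   bool ignore_hash)
{
	return toy_scan_bucket(dir, hash % NBUCKETS, hash, name, ignore_hash);
}

/* Fallback: scan every bucket, not using the hash at all. */
static const struct toy_dentry *
toy_lookup_all(const struct toy_dir *dir, const char *name)
{
	for (size_t b = 0; b < NBUCKETS; b++) {
		const struct toy_dentry *de =
			toy_scan_bucket(dir, b, 0, name, true);

		if (de)
			return de;
	}
	return NULL;
}
```

In this model, an entry stored with hash 5 lands in bucket 1. A lookup that
computes hash 6 for the same name scans bucket 2 and misses even with
ignore_hash set; only something like toy_lookup_all() finds it, at the cost
of a linear scan.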

> 
> So I think f2fs could actually do something like this:
> 
>   --- a/fs/f2fs/dir.c
>   +++ b/fs/f2fs/dir.c
>   @@ -67,6 +67,7 @@ int f2fs_init_casefolded_name(const struct inode *dir,
>                         /* fall back to treating name as opaque byte sequence */
>                         return 0;
>                 }
>   +             fname->ignore_hash = utf8_oddname(fname->usr_fname);
>                 fname->cf_name.name = buf;
>                 fname->cf_name.len = len;
>         }
>   @@ -231,7 +232,7 @@ struct f2fs_dir_entry *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
>                         continue;
>                 }
> 
>   -             if (de->hash_code == fname->hash) {
>   +             if (fname->ignore_hash || de->hash_code == fname->hash) {
>                         res = f2fs_match_name(d->inode, fname,
>                                               d->filename[bit_pos],
>                                               le16_to_cpu(de->name_len));
>   --- a/fs/f2fs/f2fs.h
>   +++ b/fs/f2fs/f2fs.h
>   @@ -521,6 +521,7 @@ struct f2fs_filename {
> 
>         /* The dirhash of this filename */
>         f2fs_hash_t hash;
>   +     bool ignore_hash;
> 
>    #ifdef CONFIG_FS_ENCRYPTION
>         /*
> 
> where that "utf8_oddname()" is the one that goes "this filename
> contains unhashable characters".
> 
> I didn't look very closely at what ext4 does, but it seems to already
> have a pattern for "don't even look at the hash because it's not
> reliable", so I think ext4 can do something similar.
> 
> So then all you actually need is that utf8_oddname() that recognizes
> those ignored code-points.
> 
> So I take it all back: option (1) actually doesn't look that bad, and
> would make reverting commit 5c26d2f1d3f5 ("unicode: Don't special case
> ignorable code points") unnecessary.
> 
>                 Linus
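
For reference, a rough userspace sketch of what a utf8_oddname()-style check
might look like. This is an assumption, not an existing kernel helper: the
name toy_oddname(), its (bytes, length) signature, and the short range table
are all hypothetical. The table is only an illustrative subset of Unicode's
Default_Ignorable_Code_Point set, and the decoder is simplified (it does not
reject overlong encodings):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of default-ignorable code points, NOT complete. */
static bool toy_is_ignorable(uint32_t cp)
{
	return cp == 0x00AD ||                   /* soft hyphen */
	       (cp >= 0x200B && cp <= 0x200F) || /* zero-width chars, marks */
	       (cp >= 0x2060 && cp <= 0x2064) || /* word joiner, invisibles */
	       (cp >= 0xFE00 && cp <= 0xFE0F) || /* variation selectors */
	       cp == 0xFEFF;                     /* zero-width no-break space */
}

/*
 * Decode one UTF-8 sequence from s[0..len). Returns the number of bytes
 * consumed, or 0 if the sequence is truncated or malformed.
 */
static size_t toy_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	size_t need;
	uint32_t v;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 1;
		v = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 2;
		v = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 3;
		v = s[0] & 0x07;
	} else {
		return 0;
	}
	if (len < need + 1)
		return 0;
	for (size_t i = 1; i <= need; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		v = (v << 6) | (s[i] & 0x3F);
	}
	*cp = v;
	return need + 1;
}

/* True if the name contains any code point from the table above. */
static bool toy_oddname(const unsigned char *name, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint32_t cp;
		size_t n = toy_decode(name + i, len - i, &cp);

		if (n == 0) {
			i++; /* skip a malformed byte, treat it as ordinary */
			continue;
		}
		if (toy_is_ignorable(cp))
			return true;
		i += n;
	}
	return false;
}
```

With this table, a name containing U+FE0F (bytes EF B8 8F) is flagged, while
plain ASCII and a bare U+2764 are not.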