linux-kernel - Re: Unicode conversion issue


Message-ID: <875xnqudr1.fsf@mailhost.krisman.be>
Date: Wed, 11 Dec 2024 16:10:58 -0500
From: Gabriel Krisman Bertazi <krisman@...e.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Jaegeuk Kim <jaegeuk@...nel.org>,  Linux Kernel Mailing List
 <linux-kernel@...r.kernel.org>,  "hanqi@...o.com" <hanqi@...o.com>,
  "Theodore Ts'o" <tytso@....edu>
Subject: Re: Unicode conversion issue

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Wed, 11 Dec 2024 at 11:58, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> The problem is that all the filesystems basically do some variation of
>>
>>         if (IS_CASEFOLDED(dir) ..) {
>>
>>                 len = utf8_casefold(sb->s_encoding, orig_name,
>>                         new_name, MAXLEN);
>>
>> and then they use that "new_name" for both hashing and for comparisons.
>
> Oh, actually, f2fs does pass in the original name to
> generic_ci_match(), so I think this is solvable.
>
> The solution involves just telling f2fs to ignore the hash if it has
> seen odd characters.
>
> So I think f2fs could actually do something like this:
>
>   --- a/fs/f2fs/dir.c
>   +++ b/fs/f2fs/dir.c
>   @@ -67,6 +67,7 @@ int f2fs_init_casefolded_name(const struct inode *dir,
>                         /* fall back to treating name as opaque byte sequence */
>                         return 0;
>                 }
>   +             fname->ignore_hash = utf8_oddname(fname->usr_fname);
>                 fname->cf_name.name = buf;
>                 fname->cf_name.len = len;
>         }
>   @@ -231,7 +232,7 @@ struct f2fs_dir_entry *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
>                         continue;
>                 }
>
>   -             if (de->hash_code == fname->hash) {
>   +             if (fname->ignore_hash || de->hash_code == fname->hash) {
>                         res = f2fs_match_name(d->inode, fname,
>                                               d->filename[bit_pos],
>                                               le16_to_cpu(de->name_len));

This solves it for directories with inlined dirents
(FI_INLINE_DENTRY), but for large directories we use fname->hash to
find the right block to start the search.  So, with the hash ignored,
we'd need to walk through the entire case-insensitive directory.  In
ext4, the issue only exists on large directories, because we don't care
about the hash on small directories.
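
To make the cost concrete, here is a rough userspace sketch.  It is a
toy model, not the f2fs or ext4 code: the toy_* names, the block layout
and the use of strcmp() in place of the real casefolded match are all
assumptions made for illustration.  It only shows that, once the hash
cannot be trusted, the lookup loses its starting block and has to scan
every block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NBLOCKS   4	/* hypothetical number of dentry blocks */
#define TOY_PER_BLOCK 4	/* hypothetical slots per block */

struct toy_dirent {
	unsigned int hash;	/* hash stored when the entry was created */
	const char *name;	/* NULL marks an empty slot */
};

struct toy_dir {
	struct toy_dirent slots[TOY_NBLOCKS][TOY_PER_BLOCK];
};

/*
 * Hypothetical lookup.  With a trusted hash only one block is scanned;
 * with ignore_hash set, every block has to be visited.
 * *blocks_scanned reports how many blocks were touched.
 */
static const struct toy_dirent *
toy_lookup(const struct toy_dir *dir, const char *name, unsigned int hash,
	   bool ignore_hash, size_t *blocks_scanned)
{
	size_t first, last, b, i;

	assert(dir && name && blocks_scanned);

	first = ignore_hash ? 0 : hash % TOY_NBLOCKS;
	last = ignore_hash ? TOY_NBLOCKS - 1 : first;
	*blocks_scanned = 0;

	for (b = first; b <= last; b++) {
		(*blocks_scanned)++;
		for (i = 0; i < TOY_PER_BLOCK; i++) {
			const struct toy_dirent *de = &dir->slots[b][i];

			if (!de->name)
				continue;
			if (!ignore_hash && de->hash != hash)
				continue;
			/* strcmp() stands in for the real casefolded match */
			if (strcmp(de->name, name) == 0)
				return de;
		}
	}
	return NULL;
}
```

In this model a lookup with a mismatching hash fails after one block,
while the same lookup with ignore_hash set finds the entry but only
after scanning every block up to the one that holds it.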


>   --- a/fs/f2fs/f2fs.h
>   +++ b/fs/f2fs/f2fs.h
>   @@ -521,6 +521,7 @@ struct f2fs_filename {
>
>         /* The dirhash of this filename */
>         f2fs_hash_t hash;
>   +     bool ignore_hash;
>
>    #ifdef CONFIG_FS_ENCRYPTION
>         /*
>
> where that "utf8_oddname()" is the one that goes "this filename
> contains unhashable characters".
>
> I didn't look very closely at what ext4 does, but it seems to already
> have a pattern for "don't even look at the hash because it's not
> reliable", so I think ext4 can do something similar.

> So then all you actually need is that utf8_oddname() that recognizes
> those ignored code-points.
>
> So I take it all back: option (1) actually doesn't look that bad, and
> would make reverting commit 5c26d2f1d3f5 ("unicode: Don't special case
> ignorable code points") unnecessary.

I think we really need to revert it.  The simplest way to implement
utf8_oddname() is to have the full database, including the Ignorable
code points, available.  We can then add a flag in the same data
structure indicating that this is an Ignorable code point that should be
dismissed by utf8_strncasecmp() when doing the casefold, while still
using the full string for the hash.
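
As a very rough illustration of the flag idea, here is a userspace
sketch.  Everything in it is hypothetical: the toy_* names, the
three-entry table (the real database is a generated trie, not an array),
the ASCII-only folding, and the FNV-1a style hash.  It works on already
decoded code points to stay short.  U+00AD, U+200D and U+FE0F are used
as examples of Default_Ignorable code points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CP_IGNORABLE 0x1	/* hypothetical per-code-point flag */

struct toy_cp_entry {
	uint32_t cp;
	unsigned int flags;
};

/* Stand-in for the full database: ignorables are kept, but flagged. */
static const struct toy_cp_entry toy_table[] = {
	{ 0x00AD, TOY_CP_IGNORABLE },	/* SOFT HYPHEN */
	{ 0x200D, TOY_CP_IGNORABLE },	/* ZERO WIDTH JOINER */
	{ 0xFE0F, TOY_CP_IGNORABLE },	/* VARIATION SELECTOR-16 */
};

static bool toy_is_ignorable(uint32_t cp)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
		if (toy_table[i].cp == cp)
			return toy_table[i].flags & TOY_CP_IGNORABLE;
	return false;
}

/* "This name contains code points the comparison will dismiss." */
static bool toy_oddname(const uint32_t *s, size_t n)
{
	size_t i;

	assert(s || n == 0);
	for (i = 0; i < n; i++)
		if (toy_is_ignorable(s[i]))
			return true;
	return false;
}

/* ASCII-only folding, purely as a placeholder for real casefolding. */
static uint32_t toy_fold(uint32_t cp)
{
	return (cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp;
}

/* Comparison dismisses flagged code points; returns 0 on a match. */
static int toy_casecmp(const uint32_t *a, size_t na,
		       const uint32_t *b, size_t nb)
{
	size_t i = 0, j = 0;

	for (;;) {
		while (i < na && toy_is_ignorable(a[i]))
			i++;
		while (j < nb && toy_is_ignorable(b[j]))
			j++;
		if (i == na || j == nb)
			return (i == na && j == nb) ? 0 : 1;
		if (toy_fold(a[i++]) != toy_fold(b[j++]))
			return 1;
	}
}

/* The hash, by contrast, is computed over the full, unfiltered string. */
static uint32_t toy_hash(const uint32_t *s, size_t n)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n; i++)
		h = (h ^ s[i]) * 16777619u;
	return h;
}
```

With this, { 'a', U+FE0F, 'b' } and { 'A', 'B' } compare as equal, while
toy_oddname() reports the first one as odd, which is the signal a
filesystem would need to stop trusting the hash for that name.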

-- 
Gabriel Krisman Bertazi