linux-kernel - Re: Unicode conversion issue

Message-ID: <CAHk-=wiC3evUXq8QTcOBFTMu1wsUR_dYiS8eGxy0Hh7VbL55yA@mail.gmail.com>
Date: Wed, 11 Dec 2024 12:18:25 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Gabriel Krisman Bertazi <krisman@...e.de>
Cc: Jaegeuk Kim <jaegeuk@...nel.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, "hanqi@...o.com" <hanqi@...o.com>, 
	"Theodore Ts'o" <tytso@....edu>
Subject: Re: Unicode conversion issue

On Wed, 11 Dec 2024 at 11:58, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> The problem is that all the filesystems basically do some variation of
>
>         if (IS_CASEFOLDED(dir) ..) {
>
>                 len = utf8_casefold(sb->s_encoding, orig_name,
>                         new_name, MAXLEN);
>
> and then they use that "new_name" for both hashing and for comparisons.

Oh, actually, f2fs does pass in the original name to
generic_ci_match(), so I think this is solvable.

The solution is just to tell f2fs to ignore the hash when the name
contains odd characters, and let the full name comparison decide.
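A minimal user-space model of that lookup rule, assuming a flat array of entries. `struct toy_dentry`, `toy_find()` and the hash values are hypothetical and are not f2fs code. It shows the trade-off: with `ignore_hash` set, a stored hash that does not match the lookup hash (e.g. one computed when the name was casefolded differently) can no longer hide the entry, at the cost of a name comparison for every entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry (not the f2fs on-disk layout). */
struct toy_dentry {
	uint32_t hash_code;	/* hash stored when the entry was created */
	const char *name;	/* already-casefolded name */
};

/*
 * Return the index of the entry named `name`, or -1 if none matches.
 * Normally the stored hash acts as a cheap filter before the name
 * comparison; with ignore_hash set, every entry is compared by name.
 */
static int toy_find(const struct toy_dentry *d, size_t n,
		    const char *name, uint32_t hash, bool ignore_hash)
{
	for (size_t i = 0; i < n; i++) {
		if (ignore_hash || d[i].hash_code == hash) {
			if (strcmp(d[i].name, name) == 0)
				return (int)i;
		}
	}
	return -1;
}
```

Skipping the hash filter does not affect correctness, since the name comparison still decides the match; it only costs speed, and only for names flagged as odd.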

So I think f2fs could actually do something like this:

  --- a/fs/f2fs/dir.c
  +++ b/fs/f2fs/dir.c
  @@ -67,6 +67,7 @@ int f2fs_init_casefolded_name(const struct inode *dir,
                        /* fall back to treating name as opaque byte sequence */
                        return 0;
                }
  +             fname->ignore_hash = utf8_oddname(fname->usr_fname);
                fname->cf_name.name = buf;
                fname->cf_name.len = len;
        }
  @@ -231,7 +232,7 @@ struct f2fs_dir_entry *f2fs_find_target_dentry(const struct f2fs_dentry_ptr *d,
                        continue;
                }

  -             if (de->hash_code == fname->hash) {
  +             if (fname->ignore_hash || de->hash_code == fname->hash) {
                        res = f2fs_match_name(d->inode, fname,
                                              d->filename[bit_pos],
                                              le16_to_cpu(de->name_len));
  --- a/fs/f2fs/f2fs.h
  +++ b/fs/f2fs/f2fs.h
  @@ -521,6 +521,7 @@ struct f2fs_filename {

        /* The dirhash of this filename */
        f2fs_hash_t hash;
  +     bool ignore_hash;

   #ifdef CONFIG_FS_ENCRYPTION
        /*

where "utf8_oddname()" is a new helper that answers "does this
filename contain unhashable characters?".

I didn't look very closely at what ext4 does, but it seems to already
have a pattern for "don't even look at the hash because it's not
reliable", so I think ext4 can do something similar.

So then all you actually need is a utf8_oddname() that recognizes
those ignored code points.
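A sketch of what such a helper might look like, under explicit assumptions: only the name `utf8_oddname()` comes from the mail. The body, the byte-pointer signature (the diff passes a `struct qstr *`; a kernel version would read its `name` and `len` fields), and the code-point table are illustrative. The table is a small subset of Unicode's Default_Ignorable_Code_Point property; a real helper would have to match exactly the set the old casefolding code ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: illustrative subset of Default_Ignorable_Code_Point,
 * not the kernel's actual table.
 */
static const struct { uint32_t lo, hi; } odd_ranges[] = {
	{ 0x00AD, 0x00AD },	/* SOFT HYPHEN */
	{ 0x200B, 0x200F },	/* zero-width space/joiners, LRM, RLM */
	{ 0x202A, 0x202E },	/* bidi embedding/override controls */
	{ 0x2060, 0x206F },	/* word joiner, invisible operators, ... */
	{ 0xFE00, 0xFE0F },	/* variation selectors */
	{ 0xFEFF, 0xFEFF },	/* ZERO WIDTH NO-BREAK SPACE */
};

static bool is_odd_codepoint(uint32_t c)
{
	for (size_t i = 0; i < sizeof(odd_ranges) / sizeof(odd_ranges[0]); i++)
		if (c >= odd_ranges[i].lo && c <= odd_ranges[i].hi)
			return true;
	return false;
}

/*
 * Hypothetical utf8_oddname(): scan s[0..len) left to right and return
 * true at the first ignorable code point. Returns false if none is found
 * or if a malformed sequence is hit first (this lenient decoder does not
 * reject overlong forms). In the diff, the helper is only reached after
 * utf8_casefold() has succeeded on the name.
 */
static bool utf8_oddname(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint32_t c = s[i];
		size_t n;

		if (c < 0x80) {
			n = 1;
		} else if ((c & 0xE0) == 0xC0) {
			n = 2;
			c &= 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			n = 3;
			c &= 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			n = 4;
			c &= 0x07;
		} else {
			return false;	/* invalid lead byte */
		}

		if (n > len - i)
			return false;	/* truncated sequence */
		for (size_t k = 1; k < n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;	/* bad continuation byte */
			c = (c << 6) | (s[i + k] & 0x3F);
		}
		if (is_odd_codepoint(c))
			return true;
		i += n;
	}
	return false;
}
```

The scan is linear in the name length, and in the diff it runs once, when the casefolded name is prepared, rather than per directory entry.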

So I take it all back: option (1) actually doesn't look that bad, and
would make reverting commit 5c26d2f1d3f5 ("unicode: Don't special case
ignorable code points") unnecessary.

                Linus