Message-Id: <9ED1B796-23FE-422A-B6C9-5BEAE4FAA912@dilger.ca>
Date: Wed, 19 Feb 2025 13:30:05 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: Theodore Ts'o <tytso@....edu>
Cc: Ext4 Developers List <linux-ext4@...r.kernel.org>,
krisman@...e.de,
drosen@...gle.com
Subject: Re: [PATCH -v2] ext4: introduce linear search for dentries
On Feb 13, 2025, at 1:10 PM, Theodore Ts'o <tytso@....edu> wrote:
>
> This patch addresses an issue where some files in case-insensitive
> directories become inaccessible due to changes in how the kernel
> function, utf8_casefold(), generates case-folded strings from the
> commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code
> points").
>
> There are good reasons why this change should be made; it's actually
> quite stupid that Unicode seems to think that the characters ❤ and ❤️
> should be casefolded. Unfortunately, because of the backwards
> compatibility issue, this commit was reverted in 231825b2e1ff.
>
> This problem is addressed by instituting a brute-force linear fallback
> if a lookup fails on a case-folded directory, which does result in a
> performance hit when looking up files affected by the change in how
> the kernel treats ignorable Unicode characters, or when attempting to
> look up non-existent file names. So this fallback can be disabled by
> setting an encoding flag if, in the future, the system administrator or
> the manufacturer of a mobile handset or tablet can be sure that there
> was no opportunity for a kernel to insert file names with incompatible
> encodings.
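A toy model of the fallback described above may help make the cost concrete. It is a sketch under stated assumptions, not ext4 code: toy_dir, toy_create(), toy_lookup(), fold_old() and fold_new() are hypothetical names, '~' stands in for an ignorable code point such as U+FE0F, and the index key is modelled as the folded name itself rather than a hash. An entry created under one folding rule is missed by the indexed pass under the other rule and is only found by the linear pass, while a name that does not exist at all always pays for the full scan.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy model only, not ext4 code. '~' stands in for an "ignorable" code
 * point such as U+FE0F: fold_old() drops it, fold_new() keeps it. */
#define MAXNAME 64
#define MAXENT  16

struct toy_dirent {
	char name[MAXNAME];	/* name as stored in the directory */
	char key[MAXNAME];	/* index key computed when the entry was created */
};

struct toy_dir {
	struct toy_dirent ent[MAXENT];
	int n;
};

typedef void (*fold_fn)(const char *in, char *out);

static void fold_old(const char *in, char *out)
{
	for (; *in; in++)
		if (*in != '~')
			*out++ = (char)tolower((unsigned char)*in);
	*out = '\0';
}

static void fold_new(const char *in, char *out)
{
	for (; *in; in++)
		*out++ = (char)tolower((unsigned char)*in);
	*out = '\0';
}

static void toy_create(struct toy_dir *d, const char *name, fold_fn fold)
{
	assert(d->n < MAXENT && strlen(name) < MAXNAME);
	strcpy(d->ent[d->n].name, name);
	fold(name, d->ent[d->n].key);	/* key depends on the rule in force now */
	d->n++;
}

/* Compare two names under the folding rule in force at lookup time. */
static int names_match(const char *a, const char *b, fold_fn fold)
{
	char fa[MAXNAME], fb[MAXNAME];

	fold(a, fa);
	fold(b, fb);
	return strcmp(fa, fb) == 0;
}

static struct toy_dirent *toy_lookup(struct toy_dir *d, const char *name,
				     fold_fn fold, int allow_fallback,
				     int *used_fallback)
{
	char key[MAXNAME];
	int i;

	assert(strlen(name) < MAXNAME);
	fold(name, key);
	*used_fallback = 0;

	/* Indexed pass: trust the key recorded at creation time. A real
	 * index would jump straight to one leaf block; looping over all
	 * entries here just keeps the toy short. */
	for (i = 0; i < d->n; i++)
		if (strcmp(d->ent[i].key, key) == 0)
			return &d->ent[i];

	if (!allow_fallback)
		return NULL;

	/* Linear fallback: ignore stored keys and compare every name. */
	*used_fallback = 1;
	for (i = 0; i < d->n; i++)
		if (names_match(d->ent[i].name, name, fold))
			return &d->ent[i];
	return NULL;
}
```

For example, creating "Heart~" with fold_new() and then looking up "HEART~" with fold_old() returns NULL when allow_fallback is 0, and finds the entry (with *used_fallback set) when it is 1.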
I don't have the full context here, but falling back to a full directory
scan for every failed lookup in a casefolded directory would be *very*
expensive for a large directory.
This could be made conditional upon a much narrower set of conditions:
- if the filename has non-ASCII characters (already uncommon)
- if the filename contains characters that may be case folded (normalized?)
This avoids a huge performance hit for every name lookup in very common
workloads that do not need it (most computer-generated filenames still
use only ASCII characters).
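The first condition above is cheap to test. A minimal sketch, assuming (as suggested above) that the folding change only affects non-ASCII code points, so a pure-ASCII name can skip the fallback; name_has_non_ascii() and want_casefold_fallback() are hypothetical helpers, not existing ext4 functions, and the second condition (characters that may be case folded) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: in UTF-8 every byte of a non-ASCII code point has
 * the high bit set, so one set high bit means the name is not pure ASCII. */
static bool name_has_non_ascii(const unsigned char *name, size_t len)
{
	size_t i;

	assert(name != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (name[i] & 0x80)
			return true;
	return false;
}

/* Hypothetical policy check: only pay for the linear scan after a failed
 * lookup when the name could be affected by a change in Unicode folding
 * (assumption: pure-ASCII names fold the same way under either rule). */
static bool want_casefold_fallback(const unsigned char *name, size_t len,
				   bool dir_casefolded, bool fallback_disabled)
{
	return dir_casefolded && !fallback_disabled &&
	       name_has_non_ascii(name, len);
}
```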
Also, depending on the size of the directory vs. the number of case-folded
(normalized?) characters in the filename, it might be faster to do
2^(ambiguous_chars) htree lookups instead of a linear scan of the whole dir.
It is easy to check whether 2^(ambiguous_chars) < dir blocks, since each
htree leaf block is always fully scanned once found. That could be a big
win if there are only a few remapped characters in a filename.
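The 2^(ambiguous_chars) < dir blocks test, and the enumeration of candidate spellings it implies, could look roughly like this. It is a hypothetical sketch: prefer_htree_permutations() and build_variant() are illustrative names, each ambiguous position is assumed to have exactly two candidate forms (here, a character that is either kept or dropped), and the positions must be given in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if trying all 2^k spellings via htree lookups
 * (roughly one leaf block each) should read fewer blocks than a linear
 * scan of all dir_blocks directory blocks. */
static bool prefer_htree_permutations(unsigned int ambiguous_chars,
				      uint64_t dir_blocks)
{
	/* 2^64 or more can never be below a uint64_t block count, and
	 * shifting a 64-bit value by >= 64 is undefined behaviour. */
	if (ambiguous_chars >= 64)
		return false;
	return (UINT64_C(1) << ambiguous_chars) < dir_blocks;
}

/* Hypothetical helper: build the candidate selected by 'mask'. pos[] holds
 * the k ambiguous byte offsets in ascending order; bit j of mask set means
 * keep the character at pos[j], clear means drop it. */
static void build_variant(const char *name, const size_t *pos, unsigned int k,
			  uint32_t mask, char *out)
{
	unsigned int j = 0;
	size_t i;

	assert(k <= 32);
	for (i = 0; name[i] != '\0'; i++) {
		if (j < k && pos[j] == i) {
			if (mask & (UINT32_C(1) << j))
				*out++ = name[i];
			j++;
		} else {
			*out++ = name[i];
		}
	}
	*out = '\0';
}
```

A caller would first consult prefer_htree_permutations(); if it returns true, it would build, hash and look up the variant for each of the 2^k masks, and otherwise go straight to the linear scan.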
Cheers, Andreas
>
> Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points")
> Signed-off-by: Theodore Ts'o <tytso@....edu>
> Reviewed-by: Gabriel Krisman Bertazi <krisman@...e.de>
> ---
> v2:
> * Fix compile failure when CONFIG_UNICODE is not enabled
> * Added reviewed-by from Gabriel Krisman
>
> fs/ext4/namei.c | 14 ++++++++++----
> include/linux/fs.h | 10 +++++++++-
> 2 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 536d56d15072..820e7ab7f3a3 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent,
> * sure cf_name was properly initialized before
> * considering the calculated hash.
> */
> - if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
> + if (sb_no_casefold_compat_fallback(parent->i_sb) &&
> + IS_ENCRYPTED(parent) && fname->cf_name.name &&
> (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
> return false;
> @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir,
> * return. Otherwise, fall back to doing a search the
> * old fashioned way.
> */
> - if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR)
> + if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR)
> + dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> + "falling back\n"));
> + else if (!sb_no_casefold_compat_fallback(dir->i_sb) &&
> + *res_dir == NULL && IS_CASEFOLDED(dir))
> + dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold "
> + "failed, falling back\n"));
> + else
> goto cleanup_and_exit;
> - dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> - "falling back\n"));
> ret = NULL;
> }
> nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2c3b2f8a621f..aa4ec39202c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1258,11 +1258,19 @@ extern int send_sigurg(struct file *file);
> #define SB_NOUSER BIT(31)
>
> /* These flags relate to encoding and casefolding */
> -#define SB_ENC_STRICT_MODE_FL (1 << 0)
> +#define SB_ENC_STRICT_MODE_FL (1 << 0)
> +#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)
>
> #define sb_has_strict_encoding(sb) \
> (sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
>
> +#if IS_ENABLED(CONFIG_UNICODE)
> +#define sb_no_casefold_compat_fallback(sb) \
> + (sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL)
> +#else
> +#define sb_no_casefold_compat_fallback(sb) (1)
> +#endif
> +
> /*
> * Umount options
> */
> --
> 2.45.2
>
>
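The decision the patch adds to __ext4_find_entry() can be exercised in user space with a stand-in superblock. A sketch under assumptions: struct toy_sb and after_dx_lookup() are hypothetical, only the s_encoding_flags field is modelled (its type here is an assumption), and the three boolean inputs stand for the conditions named in the comment. It covers the CONFIG_UNICODE=y definition of sb_no_casefold_compat_fallback(); with CONFIG_UNICODE disabled the patch defines the macro as (1), so the casefold branch is never taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch. */
#define SB_ENC_STRICT_MODE_FL        (1 << 0)
#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)

/* Stand-in for struct super_block: only the field the macro reads. */
struct toy_sb {
	unsigned int s_encoding_flags;
};

/* CONFIG_UNICODE=y variant of the macro (argument parenthesised here). */
#define sb_no_casefold_compat_fallback(sb) \
	((sb)->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL)

enum find_action {
	RETURN_RESULT,		/* goto cleanup_and_exit */
	LINEAR_SCAN_BAD_DX,	/* "dx failed, falling back" */
	LINEAR_SCAN_CASEFOLD,	/* "casefold failed, falling back" */
};

/* Mirrors the if / else if / else chain after the htree lookup:
 *   dx_failed  ~ IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR
 *   found      ~ *res_dir != NULL
 *   casefolded ~ IS_CASEFOLDED(dir)
 * Other error returns are not modelled. */
static enum find_action after_dx_lookup(const struct toy_sb *sb,
					bool dx_failed, bool found,
					bool casefolded)
{
	if (dx_failed)
		return LINEAR_SCAN_BAD_DX;
	if (!sb_no_casefold_compat_fallback(sb) && !found && casefolded)
		return LINEAR_SCAN_CASEFOLD;
	return RETURN_RESULT;
}
```

The same macro gates the early hash-mismatch return in ext4_match() for encrypted directories: while the fallback is enabled that shortcut is skipped, presumably so the linear scan can still match names whose stored hash was computed under a different folding.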