linux-ext4 - Re: [PATCH 1/1] ext4: fallback to complex scan if aligned scan doesn't work

Message-ID: <20240104152717.rj7mmmij77q3mbiu@quack3>
Date: Thu, 4 Jan 2024 16:27:17 +0100
From: Jan Kara <jack@...e.cz>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
	Ritesh Harjani <ritesh.list@...il.com>,
	linux-kernel@...r.kernel.org, Jan Kara <jack@...e.cz>,
	glandvador@...oo.com, bugzilla@...l.emu.id.au
Subject: Re: [PATCH 1/1] ext4: fallback to complex scan if aligned scan
 doesn't work

On Fri 15-12-23 16:49:50, Ojaswin Mujoo wrote:
> Currently in case the goal length is a multiple of stripe size we use
> ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
> In case we are not able to find any, we again go back to calling
> ext4_mb_choose_next_group() to search for a different suitable block
> group. However, since the linear search always begins from the start,
> most of the time we end up with the same BG and the cycle continues.
> 
> With large filesystems, the CPU can be stuck in this loop for hours
> which can slow down the whole system. Hence, until we figure out a
> better way to continue the search (rather than starting from beginning)
> in ext4_mb_choose_next_group(), let's just fall back to
> ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
> more likely to find the needed blocks.
> 
> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>

If I understand the difference right, the problem is that while
ext4_mb_choose_next_group() guarantees large enough free space extent for
the CR_GOAL_LEN_FAST or CR_BEST_AVAIL_LEN passes, it does not guarantee
large enough *aligned* free space extent. Thus for non-aligned allocations
we can fail only due to a race with another allocating process but with
aligned allocations we can consistently fail in ext4_mb_scan_aligned() and
thus livelock in the allocation loop.
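To make the aligned vs. non-aligned distinction concrete, here is a simplified C model (not kernel code) of one block group as a free-block bitmap. has_free_extent() stands in for the guarantee group selection gives for the CR_GOAL_LEN_FAST / CR_BEST_AVAIL_LEN passes, and has_aligned_extent() roughly approximates ext4_mb_scan_aligned(), which walks stripe-aligned offsets looking for a full stripe of free space at one of them. All names are hypothetical, offsets are group-relative, and the group is assumed to start on a stripe boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model, NOT kernel code: used[i] is true when block i of a
 * block group is allocated. Offsets are group-relative and the group is
 * assumed to start on a stripe boundary.
 */

/* Length of the free run starting at 'start', capped at 'max'. */
static size_t free_run(const bool *used, size_t nblocks, size_t start,
		       size_t max)
{
	size_t len = 0;

	while (start + len < nblocks && !used[start + len] && len < max)
		len++;
	return len;
}

/* Models the guarantee from group selection: some free extent of at
 * least 'len' blocks exists, wherever it starts. */
static bool has_free_extent(const bool *used, size_t nblocks, size_t len)
{
	for (size_t i = 0; i < nblocks; i++)
		if (free_run(used, nblocks, i, len) >= len)
			return true;
	return false;
}

/* Rough model of the aligned scan: only try offsets that are multiples
 * of 'stripe' and require a whole stripe of free blocks there. */
static bool has_aligned_extent(const bool *used, size_t nblocks,
			       size_t stripe)
{
	assert(stripe > 0);	/* the kernel BUG()s on a zero stripe */
	for (size_t i = 0; i < nblocks; i += stripe)
		if (free_run(used, nblocks, i, stripe) >= stripe)
			return true;
	return false;
}
```

With stripe = 4 and only blocks 1..6 of a 16-block group free, has_free_extent(used, 16, 4) is true while has_aligned_extent(used, 16, 4) is false: the group passes selection yet fails the aligned scan every time it is picked, which is the repeated failure described above.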

If my understanding is correct, feel free to add:

Reviewed-by: Jan Kara <jack@...e.cz>

								Honza



> ---
>  fs/ext4/mballoc.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d72b5e3c92ec..63f12ec02485 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2895,14 +2895,19 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  			ac->ac_groups_scanned++;
>  			if (cr == CR_POWER2_ALIGNED)
>  				ext4_mb_simple_scan_group(ac, &e4b);
> -			else if ((cr == CR_GOAL_LEN_FAST ||
> -				 cr == CR_BEST_AVAIL_LEN) &&
> -				 sbi->s_stripe &&
> -				 !(ac->ac_g_ex.fe_len %
> -				 EXT4_B2C(sbi, sbi->s_stripe)))
> -				ext4_mb_scan_aligned(ac, &e4b);
> -			else
> -				ext4_mb_complex_scan_group(ac, &e4b);
> +			else {
> +				bool is_stripe_aligned = sbi->s_stripe &&
> +					!(ac->ac_g_ex.fe_len %
> +					  EXT4_B2C(sbi, sbi->s_stripe));
> +
> +				if ((cr == CR_GOAL_LEN_FAST ||
> +				     cr == CR_BEST_AVAIL_LEN) &&
> +				    is_stripe_aligned)
> +					ext4_mb_scan_aligned(ac, &e4b);
> +
> +				if (ac->ac_status == AC_STATUS_CONTINUE)
> +					ext4_mb_complex_scan_group(ac, &e4b);
> +			}
>  
>  			ext4_unlock_group(sb, group);
>  			ext4_mb_unload_buddy(&e4b);
> -- 
> 2.39.3
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
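The livelock and the fix can be illustrated with a toy model of the retry loop. Everything here is hypothetical: toy_group only records whether an aligned scan or a complex scan would succeed in a group, group selection is modelled as a linear search from group 0 for the first group with a large enough free extent (as the commit message describes), and max_tries stands in for the CPU spinning for hours. The fallback flag switches between the pre-patch behaviour and the patched one, where a CONTINUE status left by the aligned scan falls through to the complex scan on the same group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for AC_STATUS_CONTINUE / AC_STATUS_FOUND. */
enum toy_status { TOY_CONTINUE, TOY_FOUND };

struct toy_group {
	bool has_aligned;	/* an aligned scan would succeed here */
	bool has_extent;	/* a complex scan would succeed here */
};

/* One scan of one group, mirroring the shape of the patched code. */
static enum toy_status scan_group(const struct toy_group *g,
				  bool aligned_goal, bool fallback)
{
	enum toy_status status = TOY_CONTINUE;

	if (aligned_goal && g->has_aligned)
		status = TOY_FOUND;	/* ext4_mb_scan_aligned() succeeded */
	else if (aligned_goal && !fallback)
		return status;		/* pre-patch: no complex scan here */

	if (status == TOY_CONTINUE && g->has_extent)
		status = TOY_FOUND;	/* ext4_mb_complex_scan_group() */
	return status;
}

/*
 * Toy retry loop: selection restarts from group 0 every time, so a group
 * that keeps failing is picked again. Returns the attempt that succeeded,
 * or -1 if no group qualifies or 'max_tries' runs out (the "livelock").
 */
static int allocate(const struct toy_group *groups, int ngroups,
		    bool aligned_goal, bool fallback, int max_tries)
{
	for (int tries = 1; tries <= max_tries; tries++) {
		int g;

		for (g = 0; g < ngroups; g++)	/* linear search from start */
			if (groups[g].has_extent)
				break;
		if (g == ngroups)
			return -1;
		if (scan_group(&groups[g], aligned_goal, fallback) == TOY_FOUND)
			return tries;
	}
	return -1;
}
```

With group 0 offering a large free extent but no aligned one, allocate(gs, 2, true, false, 1000) exhausts its budget and returns -1 without ever reaching group 1, while allocate(gs, 2, true, true, 1000) succeeds on the first attempt.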