Message-ID: <20240104152717.rj7mmmij77q3mbiu@quack3>
Date: Thu, 4 Jan 2024 16:27:17 +0100
From: Jan Kara <jack@...e.cz>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Ritesh Harjani <ritesh.list@...il.com>,
linux-kernel@...r.kernel.org, Jan Kara <jack@...e.cz>,
glandvador@...oo.com, bugzilla@...l.emu.id.au
Subject: Re: [PATCH 1/1] ext4: fallback to complex scan if aligned scan
doesn't work
On Fri 15-12-23 16:49:50, Ojaswin Mujoo wrote:
> Currently in case the goal length is a multiple of stripe size we use
> ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
> In case we are not able to find any, we again go back to calling
> ext4_mb_choose_next_group() to search for a different suitable block
> group. However, since the linear search always begins from the start,
> most of the time we end up with the same BG and the cycle continues.
>
> With large filesystems, the CPU can be stuck in this loop for hours
> which can slow down the whole system. Hence, until we figure out a
> better way to continue the search (rather than starting from beginning)
> in ext4_mb_choose_next_group(), let's just fall back to
> ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
> more likely to find the needed blocks.
>
> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
If I understand the difference right, the problem is that while
ext4_mb_choose_next_group() guarantees a large enough free space extent for
the CR_GOAL_LEN_FAST or CR_BEST_AVAIL_LEN passes, it does not guarantee a
large enough *aligned* free space extent. Thus for non-aligned allocations
we can fail only due to a race with another allocating process but with
aligned allocations we can consistently fail in ext4_mb_scan_aligned() and
thus livelock in the allocation loop.
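To make the distinction concrete, here is a toy userspace model of it. This
is only a sketch and not kernel code: the toy_* functions and the bool-array
"bitmap" are made up for illustration, and the real ext4_mb_scan_aligned()
and ext4_mb_complex_scan_group() work on buddy data and do considerably more.
It shows a group that holds a long enough free run which is not
stripe-aligned, so an aligned-only scan fails every time while a fallback to
an unaligned scan succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an aligned scan: only accepts a run of `len`
 * free clusters that starts at a multiple of `stripe`.
 * Returns the start index, or -1 if no such run exists.
 */
static int toy_scan_aligned(const bool *free_map, size_t n,
			    size_t stripe, size_t len)
{
	assert(stripe > 0);
	for (size_t start = 0; start + len <= n; start += stripe) {
		size_t i = 0;

		while (i < len && free_map[start + i])
			i++;
		if (i == len)
			return (int)start;
	}
	return -1;
}

/*
 * Hypothetical stand-in for an unaligned scan: accepts the first run of
 * `len` free clusters at any offset. Returns the start index or -1.
 */
static int toy_scan_any(const bool *free_map, size_t n, size_t len)
{
	size_t run = 0;

	for (size_t i = 0; i < n; i++) {
		run = free_map[i] ? run + 1 : 0;
		if (run == len)
			return (int)(i + 1 - len);
	}
	return -1;
}

/*
 * Shape of the patched flow: try the aligned scan when the length is a
 * multiple of the stripe, then fall back to the unaligned scan in the
 * same group instead of going back to pick a group again.
 */
static int toy_allocate(const bool *free_map, size_t n,
			size_t stripe, size_t len)
{
	int start = -1;

	if (stripe && len % stripe == 0)
		start = toy_scan_aligned(free_map, n, stripe, len);
	if (start < 0)
		start = toy_scan_any(free_map, n, len);
	return start;
}
```

With 16 clusters, stripe 4, and only clusters 2..5 free, the group passes a
"has a free extent of length 4" check, yet toy_scan_aligned() can never
succeed there. Retrying it on the same group makes no progress, whereas
toy_allocate() finds the run at offset 2.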
If my understanding is correct, feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
> ---
> fs/ext4/mballoc.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d72b5e3c92ec..63f12ec02485 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2895,14 +2895,19 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> ac->ac_groups_scanned++;
> if (cr == CR_POWER2_ALIGNED)
> ext4_mb_simple_scan_group(ac, &e4b);
> - else if ((cr == CR_GOAL_LEN_FAST ||
> - cr == CR_BEST_AVAIL_LEN) &&
> - sbi->s_stripe &&
> - !(ac->ac_g_ex.fe_len %
> - EXT4_B2C(sbi, sbi->s_stripe)))
> - ext4_mb_scan_aligned(ac, &e4b);
> - else
> - ext4_mb_complex_scan_group(ac, &e4b);
> + else {
> + bool is_stripe_aligned = sbi->s_stripe &&
> + !(ac->ac_g_ex.fe_len %
> + EXT4_B2C(sbi, sbi->s_stripe));
> +
> + if ((cr == CR_GOAL_LEN_FAST ||
> + cr == CR_BEST_AVAIL_LEN) &&
> + is_stripe_aligned)
> + ext4_mb_scan_aligned(ac, &e4b);
> +
> + if (ac->ac_status == AC_STATUS_CONTINUE)
> + ext4_mb_complex_scan_group(ac, &e4b);
> + }
>
> ext4_unlock_group(sb, group);
> ext4_mb_unload_buddy(&e4b);
> --
> 2.39.3
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR