Message-Id: <20200521070432.AE06852051@d06av21.portsmouth.uk.ibm.com>
Date: Thu, 21 May 2020 12:34:29 +0530
From: Ritesh Harjani <riteshh@...ux.ibm.com>
To: Alex Zhuravlev <azhuravlev@...mcloud.com>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
On 5/20/20 3:15 PM, Alex Zhuravlev wrote:
> cr=0 is supposed to be an optimization to save CPU cycles, but if
> buddy data (in memory) is not initialized then all this makes no
> sense as we have to do sync IO taking a lot of cycles.
> also, at cr=0 mballoc doesn't store any available chunk. cr=1 also
> skips groups using heuristic based on avg. fragment size. it's more
> useful to skip such groups and switch to cr=2 where groups will be
> scanned for available chunks.
>
> The goal group is not skipped to prevent allocations in foreign groups,
> which can happen after mount while buddy data is still being populated.
>
> using sparse image and dm-slow virtual device of 120TB was simulated.
> then the image was formatted and filled using debugfs to mark ~85% of
> available space as busy. the very first allocation w/o the patch could
> not complete in half an hour (according to vmstat it would take ~10-1
> hours). with the patch applied the allocation took ~20 seconds.
>
> Signed-off-by: Alex Zhuravlev <bzzz@...mcloud.com>
> Reviewed-by: Andreas Dilger <adilger@...mcloud.com>
This looks even better to me. Feel free to add:
Reviewed-by: Ritesh Harjani <riteshh@...ux.ibm.com>
>
> fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++++-
> 1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 30d5d97548c4..f719714862b5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
> return 0;
> }
>
> +static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
> +					ext4_group_t group)
> +{
> +	struct ext4_group_desc *desc;
> +
> +	if (!ext4_has_group_desc_csum(sb))
> +		return 0;
> +
> +	desc = ext4_get_group_desc(sb, group, NULL);
> +	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
> +		return 1;
> +
> +	return 0;
> +}
> +
> +
> /*
> * The routine scans buddy structures (not bitmap!) from given order
> * to max order and tries to find big enough chunk to satisfy the req
> @@ -2060,7 +2075,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>
> 	/* We only do this if the grp has never been initialized */
> 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> -		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> +		int ret;
> +
> +		/* cr=0/1 is a very optimistic search to find large
> +		 * good chunks almost for free. if buddy data is
> +		 * not ready, then this optimization makes no sense.
> +		 * instead it leads to loading (synchronously) lots
> +		 * of groups and very slow allocations.
> +		 * but don't skip the goal group to keep blocks in
> +		 * the inode's group. */
> +
> +		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group) &&
> +		    ac->ac_g_ex.fe_group != group)
> +			return 0;
> +		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> 		if (ret)
> 			return ret;
> 	}
>
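
The skip decision in the second hunk can be restated outside the kernel as a
small userspace model. This is only a sketch for reasoning about the
condition, not kernel code: struct toy_group, enum toy_action,
toy_scan_action() and toy_count_inits() are hypothetical names, and the two
booleans stand in for EXT4_MB_GRP_NEED_INIT(grp) and
ext4_mb_uninit_on_disk() respectively. The model assumes that a filesystem
without group descriptor checksums simply reports uninit_on_disk as false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-group state the hunk consults. */
struct toy_group {
	bool need_init;      /* models EXT4_MB_GRP_NEED_INIT(grp) */
	bool uninit_on_disk; /* models ext4_mb_uninit_on_disk(sb, group) */
};

enum toy_action {
	TOY_SKIP,  /* models "return 0": group is not considered at this cr */
	TOY_INIT,  /* models calling ext4_mb_init_group() */
	TOY_READY, /* buddy data already in memory, normal checks apply */
};

/*
 * Mirror of the patched condition: at cr < 2, a group whose buddy data is
 * not in memory is skipped, unless it is marked uninitialized on disk or
 * it is the goal group.
 */
static enum toy_action toy_scan_action(const struct toy_group *grp,
				       uint32_t group, uint32_t goal, int cr)
{
	if (!grp->need_init)
		return TOY_READY;
	if (cr < 2 && !grp->uninit_on_disk && group != goal)
		return TOY_SKIP;
	return TOY_INIT;
}

/* Count how many groups one full pass at criterion cr would initialize. */
static unsigned int toy_count_inits(const struct toy_group *grps,
				    uint32_t ngroups, uint32_t goal, int cr)
{
	unsigned int inits = 0;
	uint32_t g;

	for (g = 0; g < ngroups; g++)
		if (toy_scan_action(&grps[g], g, goal, cr) == TOY_INIT)
			inits++;
	return inits;
}
```

In this model a pass at cr=0 or cr=1 initializes only the goal group and
groups flagged uninitialized on disk, while a pass at cr=2 initializes every
group that still needs it, which matches the intent stated in the commit
message.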