Date: Wed, 16 Aug 2023 03:47:40 +0000
From: Dongyang Li <dli@....com>
To: "adilger@...ger.ca" <adilger@...ger.ca>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Shuichi Ihara <sihara@....com>,
"wangshilong1991@...il.com" <wangshilong1991@...il.com>
Subject: Re: [PATCH 1/2] ext4: introduce EXT4_BG_TRIMMED to optimize fstrim
On Mon, 2023-08-14 at 23:04 -0600, Andreas Dilger wrote:
> On Aug 11, 2023, at 12:19 AM, Li Dongyang <dongyangli@....com> wrote:
> >
> > Currently the flag indicating block group has done fstrim is not
> > persistent, and trim status will be lost after remount, as
> > a result fstrim can not skip the already trimmed groups, which
> > could be slow on very large devices.
> >
> > This patch introduces a new block group flag EXT4_BG_TRIMMED,
> > we need 1 extra block group descriptor write after trimming each
> > block group.
> > When clearing the flag, the block group descriptor is journalled
> > already so no extra overhead.
> >
> > Add a new super block flag EXT2_FLAGS_TRACK_TRIM, to indicate if
> > we should honour EXT4_BG_TRIMMED when doing fstrim.
> > The new super block flag can be turned on/off via tune2fs.
>
> Dongyang,
> I think this is not *quite* correct in the case where the TRACK_TRIM
> flag
> is not set. I agree we want the BG_TRIMMED flag to always be cleared
> in
> that case when blocks are freed in a group (this has no added cost,
> and
> will maintain correctness even if the feature is disabled).
>
> However, it doesn't look like the patch will skip *writing* the flag
> if
> the TRACK_TRIM flag is unset, which would also add needless overhead
> in
> that case. I think it is OK to set the flag in memory to maintain
> the
> same behavior as today, and writing it to disk is fine (it will be
> ignored
> anyway), but it shouldn't trigger an extra transaction.
I agree that we should skip writing the flag when TRACK_TRIM is not set.
IMHO we should not keep essentially the same flag in memory as well
if we are making the BG_TRIMMED flag persistent.
>
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 21b903fe546e..80283be01363 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -6995,10 +6993,19 @@ ext4_trim_all_free(struct super_block *sb,
> > ext4_group_t group,
> > ext4_grpblk_t minblocks, bool set_trimmed)
> > {
> > struct ext4_buddy e4b;
> > + struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > + struct ext4_group_desc *gdp;
> > + struct buffer_head *gd_bh;
> > int ret;
> >
> > trace_ext4_trim_all_free(sb, group, start, max);
> >
> > + gdp = ext4_get_group_desc(sb, group, &gd_bh);
> > + if (!gdp) {
> > + ret = -EIO;
> > + return ret;
> > + }
> > +
> > ret = ext4_mb_load_buddy(sb, group, &e4b);
> > if (ret) {
> > ext4_warning(sb, "Error %d loading buddy
> > information for %u",
> > @@ -7008,11 +7015,10 @@ ext4_trim_all_free(struct super_block *sb,
> > ext4_group_t group,
> >
> > ext4_lock_group(sb, group);
> >
> > - if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> > + if (!(es->s_flags & cpu_to_le16(EXT2_FLAGS_TRACK_TRIM) &&
> > + gdp->bg_flags & cpu_to_le16(EXT4_BG_TRIMMED)) ||
> > minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
>
> I think this should still *send* the TRIM request if BG_TRIMMED is
> not
> set, regardless of whether TRACK_TRIM is set or not, it should just
> not save the flag to disk below.
If BG_TRIMMED is not set, the TRIM request is already sent regardless
of TRACK_TRIM.
Checking TRACK_TRIM here also lets us use it as a switch: with the
flag turned off, fstrim trims every group whether or not the group has
BG_TRIMMED set.
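To make the truth table explicit, here is a minimal userspace sketch of
the condition in the hunk above. It is not the kernel code: the flag
values and the helper name should_send_trim() are made up for
illustration, and the flags are plain host-endian integers rather than
the __le16 fields the patch tests with cpu_to_le16().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SB_TRACK_TRIM 0x0001 /* stands in for EXT2_FLAGS_TRACK_TRIM */
#define BG_TRIMMED    0x0002 /* stands in for EXT4_BG_TRIMMED */

/*
 * Mirrors the patched condition in ext4_trim_all_free():
 * a group may be skipped only when tracking is enabled AND the group
 * is already marked trimmed AND the requested minblocks is not smaller
 * than the one used by the last trim.
 */
static bool should_send_trim(uint16_t sb_flags, uint16_t bg_flags,
                             unsigned int minblocks,
                             unsigned int last_trim_minblks)
{
    bool already_trimmed = (sb_flags & SB_TRACK_TRIM) &&
                           (bg_flags & BG_TRIMMED);

    return !already_trimmed || minblocks < last_trim_minblks;
}
```

With TRACK_TRIM off, already_trimmed is always false, so every group is
trimmed no matter what BG_TRIMMED says.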
>
> > ret = ext4_try_to_trim_range(sb, &e4b, start, max,
> > minblocks);
> > - if (ret >= 0 && set_trimmed)
> > - EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
>
> This should clear the "set_trimmed" flag if there was an error, so
> the
> flag is not set below.
We check ret > 0 below before setting the flag, so this should be fine.
>
> > } else {
> > ret = 0;
> > }
> > @@ -7020,6 +7026,34 @@ ext4_trim_all_free(struct super_block *sb,
> > ext4_group_t group,
> > ext4_unlock_group(sb, group);
> > ext4_mb_unload_buddy(&e4b);
> >
> > + if (ret > 0 && set_trimmed) {
>
> Here, this should check the TRACK_TRIM flag and not force the GDT
> write
> if the feature is disabled. *Not* writing the flag to disk is fine,
> at
> worst it means that another TRIM would be sent in case of a crash,
> which
> is what happened before this patch. Only the BG_TRIMMED flag should
> be
> set in the group descriptor in that case, based on the flag saved
> above.
Got it, will update the patch.
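As a rough sketch of one reading of the review comment above (not the
updated patch itself), the decision after trimming could look like the
following. The enum, the helper name trim_update_for() and the flag
value are hypothetical; the real code would start a journal handle and
dirty the group descriptor buffer only in the last case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value standing in for EXT2_FLAGS_TRACK_TRIM. */
#define SB_TRACK_TRIM 0x0001

enum trim_update {
    TRIM_NO_UPDATE,        /* error, nothing trimmed, or !set_trimmed */
    TRIM_FLAG_ONLY,        /* set BG_TRIMMED in the descriptor, no GDT write */
    TRIM_FLAG_AND_JOURNAL, /* set BG_TRIMMED and journal the descriptor */
};

/* ret is the return value of the trim: blocks trimmed, or a negative errno. */
static enum trim_update trim_update_for(int ret, bool set_trimmed,
                                        uint16_t sb_flags)
{
    if (ret <= 0 || !set_trimmed)
        return TRIM_NO_UPDATE;
    if (!(sb_flags & SB_TRACK_TRIM))
        return TRIM_FLAG_ONLY; /* feature disabled: avoid the extra transaction */
    return TRIM_FLAG_AND_JOURNAL;
}
```

The ret > 0 test is the same one the posted patch uses before setting
the flag.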
Thanks
Dongyang
>
> Cheers, Andreas
>
> > + int err;
> > + handle_t *handle;
> > +
> > + handle = ext4_journal_start_sb(sb, EXT4_HT_FS_TRIM,
> > 1);
> > + if (IS_ERR(handle)) {
> > + ret = PTR_ERR(handle);
> > + goto out_return;
> > + }
> > + err = ext4_journal_get_write_access(handle, sb,
> > gd_bh,
> > + EXT4_JTR_NONE);
> > + if (err) {
> > + ret = err;
> > + goto out_journal;
> > + }
> > + ext4_lock_group(sb, group);
> > + gdp->bg_flags |= cpu_to_le16(EXT4_BG_TRIMMED);
> > + ext4_group_desc_csum_set(sb, group, gdp);
> > + ext4_unlock_group(sb, group);
> > + err = ext4_handle_dirty_metadata(handle, NULL,
> > gd_bh);
> > + if (err)
> > + ret = err;
> > +out_journal:
> > + err = ext4_journal_stop(handle);
> > + if (err)
> > + ret = err;
> > + }
> > +out_return:
> > ext4_debug("trimmed %d blocks in the group %d\n",
> > ret, group);
> >
> > --
> > 2.41.0
> >
>
>
> Cheers, Andreas
>
>
>
>
>