lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <EA9B393E-D7BF-4D47-96EA-661EB96E9F14@dilger.ca>
Date:   Mon, 14 Aug 2023 23:04:14 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Li Dongyang <dongyangli@....com>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Shuichi Ihara <sihara@....com>,
        Wang Shilong <wangshilong1991@...il.com>
Subject: Re: [PATCH 1/2] ext4: introduce EXT4_BG_TRIMMED to optimize fstrim

On Aug 11, 2023, at 12:19 AM, Li Dongyang <dongyangli@....com> wrote:
> 
> Currently the flag indicating block group has done fstrim is not
> persistent, and trim status will be lost after remount, as
> a result fstrim can not skip the already trimmed groups, which
> could be slow on very large devices.
> 
> This patch introduces a new block group flag EXT4_BG_TRIMMED,
> we need 1 extra block group descriptor write after trimming each
> block group.
> When clearing the flag, the block group descriptor is journalled
> already so no extra overhead.
> 
> Add a new super block flag EXT2_FLAGS_TRACK_TRIM, to indicate if
> we should honour EXT4_BG_TRIMMED when doing fstrim.
> The new super block flag can be turned on/off via tune2fs.

Dongyang,
I think this is not *quite* correct in the case where the TRACK_TRIM flag
is not set.  I agree we want the BG_TRIMMED flag to always be cleared in
that case when blocks are freed in a group (this has no added cost, and
will maintain correctness even if the feature is disabled).

However, it doesn't look like the patch will skip *writing* the flag if
the TRACK_TRIM flag is unset, which would also add needless overhead in
that case.  I think it is OK to set the flag in memory to maintain the
same behavior as today, and writing it to disk is fine (it will be ignored
anyway), but it shouldn't trigger an extra transaction.

> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 21b903fe546e..80283be01363 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6995,10 +6993,19 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 		   ext4_grpblk_t minblocks, bool set_trimmed)
> {
> 	struct ext4_buddy e4b;
> +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> +	struct ext4_group_desc *gdp;
> +	struct buffer_head *gd_bh;
> 	int ret;
> 
> 	trace_ext4_trim_all_free(sb, group, start, max);
> 
> +	gdp = ext4_get_group_desc(sb, group, &gd_bh);
> +	if (!gdp) {
> +		ret = -EIO;
> +		return ret;
> +	}
> +
> 	ret = ext4_mb_load_buddy(sb, group, &e4b);
> 	if (ret) {
> 		ext4_warning(sb, "Error %d loading buddy information for %u",
> @@ -7008,11 +7015,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 
> 	ext4_lock_group(sb, group);
> 
> -	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> +	if (!(es->s_flags & cpu_to_le16(EXT2_FLAGS_TRACK_TRIM) &&
> +	      gdp->bg_flags & cpu_to_le16(EXT4_BG_TRIMMED)) ||
> 	    minblocks < EXT4_SB(sb)->s_last_trim_minblks) {

I think this should still *send* the TRIM request if BG_TRIMMED is not
set, regardless of whether TRACK_TRIM is set or not, it should just
not save the flag to disk below.

> 		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
> -		if (ret >= 0 && set_trimmed)
> -			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);

This should clear the "set_trimmed" flag if there was an error, so the
flag is not set below.

> 	} else {
> 		ret = 0;
> 	}
> @@ -7020,6 +7026,34 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> 	ext4_unlock_group(sb, group);
> 	ext4_mb_unload_buddy(&e4b);
> 
> +	if (ret > 0 && set_trimmed) {

Here, this should check the TRACK_TRIM flag and not force the GDT write
if the feature is disabled.  *Not* writing the flag to disk is fine, at
worst it means that another TRIM would be sent in case of a crash, which
is what happened before this patch.  Only the BG_TRIMMED flag should be
set in the group descriptor in that case, based on the flag saved above.

Cheers, Andreas

> +		int err;
> +		handle_t *handle;
> +
> +		handle = ext4_journal_start_sb(sb, EXT4_HT_FS_TRIM, 1);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
> +			goto out_return;
> +		}
> +		err = ext4_journal_get_write_access(handle, sb, gd_bh,
> +						    EXT4_JTR_NONE);
> +		if (err) {
> +			ret = err;
> +			goto out_journal;
> +		}
> +		ext4_lock_group(sb, group);
> +		gdp->bg_flags |= cpu_to_le16(EXT4_BG_TRIMMED);
> +		ext4_group_desc_csum_set(sb, group, gdp);
> +		ext4_unlock_group(sb, group);
> +		err = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> +		if (err)
> +			ret = err;
> +out_journal:
> +		err = ext4_journal_stop(handle);
> +		if (err)
> +			ret = err;
> +	}
> +out_return:
> 	ext4_debug("trimmed %d blocks in the group %d\n",
> 		ret, group);
> 
> --
> 2.41.0
> 


Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ