Message-ID: <656f0cf0-1092-84eb-e934-534102c2fa54@huaweicloud.com>
Date: Mon, 4 Sep 2023 11:00:21 +0800
From: Kemeng Shi <shikemeng@...weicloud.com>
To: Ritesh Harjani <ritesh.list@...il.com>, tytso@....edu,
adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap
freeing in ext4_mb_clear_bb()
On 9/1/2023 5:34 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@...weicloud.com> writes:
>
>> This patch separates block bitmap and buddy bitmap freeing in order to
>> udpate block bitmap with ext4_mb_mark_context in following patch.
> ^^^ update.
>
>>
>> Separated freeing is safe with concurrent allocation as long as:
>> 1. Allocation first marks the block in the buddy bitmap, then in the
>> block bitmap.
>> 2. Freeing first clears the block in the block bitmap, then in the
>> buddy bitmap.
>> Then a freed block only becomes available to allocation once both the
>> buddy bitmap and the block bitmap have been updated by the freeing.
>> Allocation obeys rule 1 already; just do separated freeing with rule 2.
>
> So we also don't need ext4_mb_load_buddy_gfp() before freeing the
> on-disk bitmap, right? Continue below...
>
>>
>> Separated freeing has no race with generate_buddy as:
>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>> initialization of the buddy page will be executed concurrently until
>> the buddy page is unloaded. As we always free in a "load buddy, free,
>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@...weicloud.com>
>> ---
>> fs/ext4/mballoc.c | 18 ++++++++----------
>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e650eac22237..01ad36a1cc96 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> if (err)
>> goto error_return;
>
>
> Let me add a piece of code before the added changes and continue...
>
> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> GFP_NOFS|__GFP_NOFAIL);
> if (err)
> goto error_return;
>>
>> + ext4_lock_group(sb, block_group);
>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> + ext4_free_group_clusters_set(sb, gdp, ret);
>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>> + ext4_unlock_group(sb, block_group);
>> +
>
> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
> freeing on-disk bitmap blocks? Can you explain why please?
> At least it is not very clear to me why we need
> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
> needed then I think we should separate out bitmap freeing logic and
> buddy freeing logic even further.
Yes, ext4_mb_load_buddy_gfp() is not needed for clearing the bitmap. I
put it before the bit clearing so that any error is caught and the mark
is aborted early, which also keeps the functional change small.
>
> Thoughts?
>
>> /*
>> * We need to make sure we don't reuse the freed block until after the
>> * transaction is committed. We make an exception if the inode is to be
>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>> } else {
>> - /* need to update group_info->bb_free and bitmap
>> - * with group lock held. generate_buddy look at
>> - * them with group lock_held
>> - */
>> if (test_opt(sb, DISCARD)) {
>> err = ext4_issue_discard(sb, block_group, bit,
>> count_clusters, NULL);
>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>
>> ext4_lock_group(sb, block_group);
>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>> }
>>
>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> - ext4_free_group_clusters_set(sb, gdp, ret);
>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>> ext4_unlock_group(sb, block_group);
>>
>> if (sbi->s_log_groups_per_flex) {
>
>
> Adding a piece of code here...
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> atomic64_add(count_clusters,
> &sbi_array_rcu_deref(sbi, s_flex_groups,
> flex_group)->free_clusters);
> }
>
> <...>
>
> /* We dirtied the bitmap block */
> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> /* And the group descriptor block */
> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> if (!err)
> err = ret;
>
> I was thinking even this can go around bitmap freeing logic above. So
> the next patch becomes very clear that all of the bitmap freeing logic
> is just simply moved into ext4_mb_mark_context() function.
>
> -ritesh
>