Message-ID: <wdluk2p7bmgkh3n3xzep3tf3qb7mv3x2o6ltemjcahgorgmhwb@hfu7t7ar2vol>
Date: Wed, 8 Oct 2025 14:49:08 +0200
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 11/13] ext4: switch to using the new extent movement
method
On Thu 25-09-25 17:26:07, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> Now that we have mext_move_extent(), we can switch to this new interface
> and deprecate move_extent_per_page(). First, after acquiring the
> i_rwsem, we can directly use ext4_map_blocks() to obtain a contiguous
> extent from the original inode as the extent to be moved. It is safe to
> get the mapping information from the extent status tree without
> accessing the on-disk extent tree, because ext4_move_extent() will
> check the sequence cookie under the folio lock. Then, after populating
> the mext_data structure, we call ext4_move_extent() to move the extent.
> Finally, the length of the extent is adjusted in mext.orig_map.m_len
> and the actual length moved is returned through m_len.
>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
Two small comments below:
> +int ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
> + __u64 donor_blk, __u64 len, __u64 *moved_len)
> {
> struct inode *orig_inode = file_inode(o_filp);
> struct inode *donor_inode = file_inode(d_filp);
> - struct ext4_ext_path *path = NULL;
> - int blocks_per_page = PAGE_SIZE >> orig_inode->i_blkbits;
> - ext4_lblk_t o_end, o_start = orig_blk;
> - ext4_lblk_t d_start = donor_blk;
> + struct mext_data mext;
> + struct super_block *sb = orig_inode->i_sb;
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + int retries = 0;
> + u64 m_len;
> int ret;
>
> + *moved_len = 0;
> +
> /* Protect orig and donor inodes against a truncate */
> lock_two_nondirectories(orig_inode, donor_inode);
>
> ret = mext_check_validity(orig_inode, donor_inode);
> if (ret)
> - goto unlock;
> + goto out;
>
> /* Wait for all existing dio workers */
> inode_dio_wait(orig_inode);
> inode_dio_wait(donor_inode);
>
> - /* Protect extent tree against block allocations via delalloc */
> - ext4_double_down_write_data_sem(orig_inode, donor_inode);
> /* Check and adjust the specified move_extent range. */
> ret = mext_check_adjust_range(orig_inode, donor_inode, orig_blk,
> donor_blk, &len);
> if (ret)
> goto out;
> - o_end = o_start + len;
>
> - *moved_len = 0;
> - while (o_start < o_end) {
> - struct ext4_extent *ex;
> - ext4_lblk_t cur_blk, next_blk;
> - pgoff_t orig_page_index, donor_page_index;
> - int offset_in_page;
> - int unwritten, cur_len;
> -
> - path = get_ext_path(orig_inode, o_start, path);
> - if (IS_ERR(path)) {
> - ret = PTR_ERR(path);
> + mext.orig_inode = orig_inode;
> + mext.donor_inode = donor_inode;
> + while (len) {
> + mext.orig_map.m_lblk = orig_blk;
> + mext.orig_map.m_len = len;
> + mext.orig_map.m_flags = 0;
> + mext.donor_lblk = donor_blk;
> +
> + ret = ext4_map_blocks(NULL, orig_inode, &mext.orig_map, 0);
> + if (ret < 0)
> goto out;
> - }
> - ex = path[path->p_depth].p_ext;
> - cur_blk = le32_to_cpu(ex->ee_block);
> - cur_len = ext4_ext_get_actual_len(ex);
> - /* Check hole before the start pos */
> - if (cur_blk + cur_len - 1 < o_start) {
> - next_blk = ext4_ext_next_allocated_block(path);
> - if (next_blk == EXT_MAX_BLOCKS) {
> - ret = -ENODATA;
> - goto out;
> - }
> - d_start += next_blk - o_start;
> - o_start = next_blk;
> - continue;
> - /* Check hole after the start pos */
> - } else if (cur_blk > o_start) {
> - /* Skip hole */
> - d_start += cur_blk - o_start;
> - o_start = cur_blk;
> - /* Extent inside requested range ?*/
> - if (cur_blk >= o_end)
> +
> + /* Skip moving if it is a hole or a delalloc extent. */
> + if (mext.orig_map.m_flags &
> + (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN)) {
> + ret = mext_move_extent(&mext, &m_len);
> + if (ret == -ESTALE)
> + continue;
> + if (ret == -ENOSPC &&
> + ext4_should_retry_alloc(sb, &retries))
> + continue;
ENOSPC here could come only from extent tree manipulations, right? I was
wondering for a while why we check it here :).
> + if (ret == -EBUSY &&
> + sbi->s_journal && retries++ < 4 &&
> + jbd2_journal_force_commit_nested(sbi->s_journal))
> + continue;
> + if (ret)
> goto out;
> - } else { /* in_range(o_start, o_blk, o_len) */
> - cur_len += cur_blk - o_start;
> +
> + *moved_len += m_len;
> + retries = 0;
> }
> - unwritten = ext4_ext_is_unwritten(ex);
> - if (o_end - o_start < cur_len)
> - cur_len = o_end - o_start;
> -
> - orig_page_index = o_start >> (PAGE_SHIFT -
> - orig_inode->i_blkbits);
> - donor_page_index = d_start >> (PAGE_SHIFT -
> - donor_inode->i_blkbits);
> - offset_in_page = o_start % blocks_per_page;
> - if (cur_len > blocks_per_page - offset_in_page)
> - cur_len = blocks_per_page - offset_in_page;
> - /*
> - * Up semaphore to avoid following problems:
> - * a. transaction deadlock among ext4_journal_start,
> - * ->write_begin via pagefault, and jbd2_journal_commit
> - * b. racing with ->read_folio, ->write_begin, and
> - * ext4_get_block in move_extent_per_page
> - */
> - ext4_double_up_write_data_sem(orig_inode, donor_inode);
> - /* Swap original branches with new branches */
> - *moved_len += move_extent_per_page(o_filp, donor_inode,
> - orig_page_index, donor_page_index,
> - offset_in_page, cur_len,
> - unwritten, &ret);
> - ext4_double_down_write_data_sem(orig_inode, donor_inode);
> - if (ret < 0)
> - break;
> - o_start += cur_len;
> - d_start += cur_len;
> + orig_blk += mext.orig_map.m_len;
> + donor_blk += mext.orig_map.m_len;
> + len -= mext.orig_map.m_len;
In case we've called mext_move_extent(), we should update everything only
by m_len, shouldn't we? Although I have a somewhat hard time coming up
with a realistic scenario where m_len != mext.orig_map.m_len for the
parameters we call ext4_swap_extents() with... So maybe I'm missing
something.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR