linux-kernel - Re: [PATCH v2 11/13] ext4: switch to using the new extent movement method

Message-ID: <5g66nxbf3ay2bryv4legk46pudqonsbrdkxr5ljegbxaydkctk@2dyyoxguxyxu>
Date: Thu, 9 Oct 2025 11:14:51 +0200
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu, 
	adilger.kernel@...ger.ca, yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com, 
	yangerkun@...wei.com
Subject: Re: [PATCH v2 11/13] ext4: switch to using the new extent movement
 method

On Thu 09-10-25 15:20:59, Zhang Yi wrote:
> On 10/8/2025 8:49 PM, Jan Kara wrote:
> > On Thu 25-09-25 17:26:07, Zhang Yi wrote:
> >> +			if (ret == -EBUSY &&
> >> +			    sbi->s_journal && retries++ < 4 &&
> >> +			    jbd2_journal_force_commit_nested(sbi->s_journal))
> >> +				continue;
> >> +			if (ret)
> >>  				goto out;
> >> -		} else { /* in_range(o_start, o_blk, o_len) */
> >> -			cur_len += cur_blk - o_start;
> >> +
> >> +			*moved_len += m_len;
> >> +			retries = 0;
> >>  		}
> >> -		unwritten = ext4_ext_is_unwritten(ex);
> >> -		if (o_end - o_start < cur_len)
> >> -			cur_len = o_end - o_start;
> >> -
> >> -		orig_page_index = o_start >> (PAGE_SHIFT -
> >> -					       orig_inode->i_blkbits);
> >> -		donor_page_index = d_start >> (PAGE_SHIFT -
> >> -					       donor_inode->i_blkbits);
> >> -		offset_in_page = o_start % blocks_per_page;
> >> -		if (cur_len > blocks_per_page - offset_in_page)
> >> -			cur_len = blocks_per_page - offset_in_page;
> >> -		/*
> >> -		 * Up semaphore to avoid following problems:
> >> -		 * a. transaction deadlock among ext4_journal_start,
> >> -		 *    ->write_begin via pagefault, and jbd2_journal_commit
> >> -		 * b. racing with ->read_folio, ->write_begin, and
> >> -		 *    ext4_get_block in move_extent_per_page
> >> -		 */
> >> -		ext4_double_up_write_data_sem(orig_inode, donor_inode);
> >> -		/* Swap original branches with new branches */
> >> -		*moved_len += move_extent_per_page(o_filp, donor_inode,
> >> -				     orig_page_index, donor_page_index,
> >> -				     offset_in_page, cur_len,
> >> -				     unwritten, &ret);
> >> -		ext4_double_down_write_data_sem(orig_inode, donor_inode);
> >> -		if (ret < 0)
> >> -			break;
> >> -		o_start += cur_len;
> >> -		d_start += cur_len;
> >> +		orig_blk += mext.orig_map.m_len;
> >> +		donor_blk += mext.orig_map.m_len;
> >> +		len -= mext.orig_map.m_len;
> > 
> > In case we've called mext_move_extent() we should update everything only by
> > m_len, shouldn't we? Although I have a somewhat hard time coming up with a
> > realistic scenario where m_len != mext.orig_map.m_len for the parameters we
> > call ext4_swap_extents() with... So maybe I'm missing something.
> 
> In the case of MEXT_SKIP_EXTENT, the target move range of the donor file
> is a hole. In this case, m_len is returned as zero after calling
> mext_move_extent(), which is not equal to mext.orig_map.m_len, and we need
> to move forward and skip this range in the next iteration in
> ext4_move_extents(). Otherwise, it will lead to an infinite loop.

Right, that would be a problem. I thought this shouldn't happen because we
call mext_move_extent() only if we have a mapped or unwritten extent, but if
the donor inode has a hole in the same place, MEXT_SKIP_EXTENT can still
happen.
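
To make the hole case concrete, here is a toy user-space model of the
loop, not kernel code. Everything in it (struct toy_chunk, toy_move(),
toy_walk()) is a hypothetical stand-in invented for illustration; it only
assumes that a skipped chunk reports zero moved blocks, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapped chunk of the original file. */
struct toy_chunk {
	unsigned int map_len;	/* plays the role of mext.orig_map.m_len */
	int donor_is_hole;	/* nonzero: donor has a hole here, chunk is skipped */
};

/* Toy mext_move_extent(): reports 0 moved blocks when the donor is a hole. */
static unsigned int toy_move(const struct toy_chunk *c)
{
	return c->donor_is_hole ? 0 : c->map_len;
}

/*
 * Walk all chunks. With advance_by_moved set, the position advances by the
 * moved length, which stalls on a hole; otherwise it advances by the mapped
 * length. Returns 0 when the walk completes, -1 when no progress is possible
 * (the real loop would spin forever instead of returning).
 */
static int toy_walk(const struct toy_chunk *chunks, size_t n,
		    int advance_by_moved, unsigned int *moved_len)
{
	size_t i = 0;

	*moved_len = 0;
	while (i < n) {
		unsigned int m_len, step;

		assert(chunks[i].map_len > 0);
		m_len = toy_move(&chunks[i]);
		step = advance_by_moved ? m_len : chunks[i].map_len;
		*moved_len += m_len;
		if (step == 0)
			return -1;	/* stuck on the same chunk */
		i++;
	}
	return 0;
}
```

With chunks of 4, 8 (donor hole) and 2 blocks, advancing by the mapped
length finishes with 6 blocks moved, while advancing by the moved length
gets stuck on the second chunk.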

> In the other two cases, MEXT_MOVE_EXTENT and MEXT_COPY_DATA, m_len should
> be equal to mext.orig_map.m_len after calling mext_move_extent().

So this is the bit which isn't 100% clear to me. What looks fishy to me is
that ext4_swap_extents() can fail after swapping only part of the passed
range (e.g. due to an extent split failure). In that case we'll return a
number smaller than mext.orig_map.m_len. Now that I'm looking again, we'll
set *erp in all those cases. (There are cases where ext4_swap_extents()
returns a smaller number even without setting *erp, but I don't think those
can happen given the locks we hold and what we've already verified. Still,
it would be good to add an assert for this in mext_move_extent().) So the
problem would rather be that we don't advance by m_len in case of an error
returned from mext_move_extent()?
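
A toy user-space sketch of that question, not kernel code and not a
proposed patch: toy_swap(), struct toy_pos and toy_move_range() are
hypothetical names, and the "budget" failure model is an assumption made
only so that a swap can fail partway through. It shows one possible
reading: assert that a short swap always comes with an error, and account
for the partially swapped blocks before bailing out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for ext4_swap_extents(): swaps up to `count` blocks,
 * but only `*budget` blocks can be swapped in total before it fails. On a
 * short swap it sets *erp, mirroring the "partial swap plus error" case.
 */
static unsigned int toy_swap(unsigned int count, unsigned int *budget, int *erp)
{
	unsigned int done = count;

	*erp = 0;
	if (*budget < count) {
		done = *budget;
		*erp = -ENOMEM;
	}
	*budget -= done;
	return done;
}

/* Toy loop state: block positions, remaining length, total moved so far. */
struct toy_pos {
	unsigned int orig_blk, donor_blk, len, moved_len;
};

/* Move p->len blocks in pieces of at most `chunk` blocks. */
static int toy_move_range(struct toy_pos *p, unsigned int chunk,
			  unsigned int budget)
{
	int err = 0;

	while (p->len > 0) {
		unsigned int want = p->len < chunk ? p->len : chunk;
		unsigned int m_len = toy_swap(want, &budget, &err);

		/* A short swap without an error would be a bug. */
		assert(m_len == want || err != 0);

		/* Account for partial progress even when an error follows. */
		p->orig_blk += m_len;
		p->donor_blk += m_len;
		p->len -= m_len;
		p->moved_len += m_len;
		if (err)
			return err;
	}
	return 0;
}
```

Moving 10 blocks in pieces of 4 with a budget of 6 swaps 4 blocks, then 2,
and returns -ENOMEM with moved_len at 6 and both positions advanced by 6.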

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR