[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fcf30c3c-25c3-4b1a-8b34-a5dcd98b7ebd@huaweicloud.com>
Date: Thu, 9 Oct 2025 15:20:59 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com,
yangerkun@...wei.com
Subject: Re: [PATCH v2 11/13] ext4: switch to using the new extent movement
method
On 10/8/2025 8:49 PM, Jan Kara wrote:
> On Thu 25-09-25 17:26:07, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Now that we have mext_move_extent(), we can switch to this new interface
>> and deprecate move_extent_per_page(). First, after acquiring the
>> i_rwsem, we can directly use ext4_map_blocks() to obtain a contiguous
>> extent from the original inode as the extent to be moved. It can and
>> it's safe to get mapping information from the extent status tree without
>> needing to access the ondisk extent tree, because ext4_move_extent()
>> will check the sequence cookie under the folio lock. Then, after
>> populating the mext_data structure, we call ext4_move_extent() to move
>> the extent. Finally, the length of the extent will be adjusted in
>> mext.orig_map.m_len and the actual length moved is returned through
>> m_len.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>
> Two small comments below:
>
>> +int ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
>> + __u64 donor_blk, __u64 len, __u64 *moved_len)
>> {
>> struct inode *orig_inode = file_inode(o_filp);
>> struct inode *donor_inode = file_inode(d_filp);
>> - struct ext4_ext_path *path = NULL;
>> - int blocks_per_page = PAGE_SIZE >> orig_inode->i_blkbits;
>> - ext4_lblk_t o_end, o_start = orig_blk;
>> - ext4_lblk_t d_start = donor_blk;
>> + struct mext_data mext;
>> + struct super_block *sb = orig_inode->i_sb;
>> + struct ext4_sb_info *sbi = EXT4_SB(sb);
>> + int retries = 0;
>> + u64 m_len;
>> int ret;
>>
>> + *moved_len = 0;
>> +
>> /* Protect orig and donor inodes against a truncate */
>> lock_two_nondirectories(orig_inode, donor_inode);
>>
>> ret = mext_check_validity(orig_inode, donor_inode);
>> if (ret)
>> - goto unlock;
>> + goto out;
>>
>> /* Wait for all existing dio workers */
>> inode_dio_wait(orig_inode);
>> inode_dio_wait(donor_inode);
>>
>> - /* Protect extent tree against block allocations via delalloc */
>> - ext4_double_down_write_data_sem(orig_inode, donor_inode);
>> /* Check and adjust the specified move_extent range. */
>> ret = mext_check_adjust_range(orig_inode, donor_inode, orig_blk,
>> donor_blk, &len);
>> if (ret)
>> goto out;
>> - o_end = o_start + len;
>>
>> - *moved_len = 0;
>> - while (o_start < o_end) {
>> - struct ext4_extent *ex;
>> - ext4_lblk_t cur_blk, next_blk;
>> - pgoff_t orig_page_index, donor_page_index;
>> - int offset_in_page;
>> - int unwritten, cur_len;
>> -
>> - path = get_ext_path(orig_inode, o_start, path);
>> - if (IS_ERR(path)) {
>> - ret = PTR_ERR(path);
>> + mext.orig_inode = orig_inode;
>> + mext.donor_inode = donor_inode;
>> + while (len) {
>> + mext.orig_map.m_lblk = orig_blk;
>> + mext.orig_map.m_len = len;
>> + mext.orig_map.m_flags = 0;
>> + mext.donor_lblk = donor_blk;
>> +
>> + ret = ext4_map_blocks(NULL, orig_inode, &mext.orig_map, 0);
>> + if (ret < 0)
>> goto out;
>> - }
>> - ex = path[path->p_depth].p_ext;
>> - cur_blk = le32_to_cpu(ex->ee_block);
>> - cur_len = ext4_ext_get_actual_len(ex);
>> - /* Check hole before the start pos */
>> - if (cur_blk + cur_len - 1 < o_start) {
>> - next_blk = ext4_ext_next_allocated_block(path);
>> - if (next_blk == EXT_MAX_BLOCKS) {
>> - ret = -ENODATA;
>> - goto out;
>> - }
>> - d_start += next_blk - o_start;
>> - o_start = next_blk;
>> - continue;
>> - /* Check hole after the start pos */
>> - } else if (cur_blk > o_start) {
>> - /* Skip hole */
>> - d_start += cur_blk - o_start;
>> - o_start = cur_blk;
>> - /* Extent inside requested range ?*/
>> - if (cur_blk >= o_end)
>> +
>> + /* Skip moving if it is a hole or a delalloc extent. */
>> + if (mext.orig_map.m_flags &
>> + (EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN)) {
>> + ret = mext_move_extent(&mext, &m_len);
>> + if (ret == -ESTALE)
>> + continue;
>> + if (ret == -ENOSPC &&
>> + ext4_should_retry_alloc(sb, &retries))
>> + continue;
>
> ENOSPC here could come only from extent tree manipulations right? I was
> wondering for a while why do we check it here :).
Yes, I think so as well.
>
>> + if (ret == -EBUSY &&
>> + sbi->s_journal && retries++ < 4 &&
>> + jbd2_journal_force_commit_nested(sbi->s_journal))
>> + continue;
>> + if (ret)
>> goto out;
>> - } else { /* in_range(o_start, o_blk, o_len) */
>> - cur_len += cur_blk - o_start;
>> +
>> + *moved_len += m_len;
>> + retries = 0;
>> }
>> - unwritten = ext4_ext_is_unwritten(ex);
>> - if (o_end - o_start < cur_len)
>> - cur_len = o_end - o_start;
>> -
>> - orig_page_index = o_start >> (PAGE_SHIFT -
>> - orig_inode->i_blkbits);
>> - donor_page_index = d_start >> (PAGE_SHIFT -
>> - donor_inode->i_blkbits);
>> - offset_in_page = o_start % blocks_per_page;
>> - if (cur_len > blocks_per_page - offset_in_page)
>> - cur_len = blocks_per_page - offset_in_page;
>> - /*
>> - * Up semaphore to avoid following problems:
>> - * a. transaction deadlock among ext4_journal_start,
>> - * ->write_begin via pagefault, and jbd2_journal_commit
>> - * b. racing with ->read_folio, ->write_begin, and
>> - * ext4_get_block in move_extent_per_page
>> - */
>> - ext4_double_up_write_data_sem(orig_inode, donor_inode);
>> - /* Swap original branches with new branches */
>> - *moved_len += move_extent_per_page(o_filp, donor_inode,
>> - orig_page_index, donor_page_index,
>> - offset_in_page, cur_len,
>> - unwritten, &ret);
>> - ext4_double_down_write_data_sem(orig_inode, donor_inode);
>> - if (ret < 0)
>> - break;
>> - o_start += cur_len;
>> - d_start += cur_len;
>> + orig_blk += mext.orig_map.m_len;
>> + donor_blk += mext.orig_map.m_len;
>> + len -= mext.orig_map.m_len;
>
> In case we've called mext_move_extent() we should update everything only by
> m_len, shouldn't we? Although I have somewhat hard time coming up with a
> realistic scenario where m_len != mext.orig_map.m_len for the parameters we
> call ext4_swap_extents() with... So maybe I'm missing something.
>
In the case of MEXT_SKIP_EXTENT, the target move range of the donor file
is a hole. In this case, the m_len is return zero after calling
mext_move_extent(), not equal to mext.orig_map.m_len, and we need to move
forward and skip this range in the next iteration in ext4_move_extents().
Otherwise, it will lead to an infinite loop.
In the other two cases, MEXT_MOVE_EXTENT and MEXT_COPY_DATA, m_len should
be equal to mext.orig_map.m_len after calling mext_move_extent().
Thanks,
Yi.
Powered by blists - more mailing lists