[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <021b2564-4f1d-4dd7-b98c-569668c8359a@huaweicloud.com>
Date: Sat, 11 Oct 2025 09:20:33 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com,
yangerkun@...wei.com
Subject: Re: [PATCH v3 09/12] ext4: introduce mext_move_extent()
On 10/10/2025 9:38 PM, Jan Kara wrote:
> On Fri 10-10-25 18:33:23, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> When moving extents, the current move_extent_per_page() process can only
>> move extents of length PAGE_SIZE at a time, which is highly inefficient,
>> especially when the fragmentation of the file is not particularly
>> severe, this will result in a large number of unnecessary extent split
>> and merge operations. Moreover, since the ext4 file system now supports
>> large folios, using PAGE_SIZE as the processing unit is no longer
>> practical.
>>
>> Therefore, introduce a new move extents method, mext_move_extent(). It
>> moves one extent of the origin inode at a time, but not exceeding the
>> size of a folio. The parameters for the move are passed through the new
>> mext_data data structure, which includes the origin inode, donor inode,
>> the mapping extent of the origin inode to be moved, and the starting
>> offset of the donor inode.
>>
>> The move process is similar to move_extent_per_page() and can be
>> categorized into three types: MEXT_SKIP_EXTENT, MEXT_MOVE_EXTENT, and
>> MEXT_COPY_DATA. MEXT_SKIP_EXTENT indicates that the corresponding area
>> of the donor file is a hole, meaning no actual space is allocated, so
>> the move is skipped. MEXT_MOVE_EXTENT indicates that the corresponding
>> areas of both the origin and donor files are unwritten, so no data needs
>> to be copied; only the extents are swapped. MEXT_COPY_DATA indicates
>> that the corresponding areas of both the origin and donor files contain
>> data, so data must be copied. The data copying is performed in three
>> steps: first, the data from the original location is read into the page
>> cache; then, the extents are swapped, and the page cache is rebuilt to
>> reflect the index of the physical blocks; finally, the dirty page cache
>> is marked and written back to ensure that the data is written to disk
>> before the metadata is persisted.
>>
>> One important point to note is that the folio lock and i_data_sem are
>> held only during the moving process. Therefore, before moving an extent,
>> it is necessary to check whether the sequence cookie of the area to be
>> moved has changed while holding the folio lock. If a change is detected,
>> it indicates that concurrent write-back operations may have occurred
>> during this period, and the type of the extent to be moved can no longer
>> be considered reliable. For example, it may have changed from unwritten
>> to written. In such cases, return -ESTALE, and the calling function
>> should reacquire the move extent of the original file and retry the
>> movement.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>
> ...
>
>> +static __used int mext_move_extent(struct mext_data *mext, u64 *m_len)
>> +{
>> + struct inode *orig_inode = mext->orig_inode;
>> + struct inode *donor_inode = mext->donor_inode;
>> + struct ext4_map_blocks *orig_map = &mext->orig_map;
>> + unsigned int blkbits = orig_inode->i_blkbits;
>> + struct folio *folio[2] = {NULL, NULL};
>> + loff_t from, length;
>> + enum mext_move_type move_type = 0;
>> + handle_t *handle;
>> + u64 r_len = 0;
>> + unsigned int credits;
>> + int ret, ret2;
>> +
>> + *m_len = 0;
>> + credits = ext4_chunk_trans_extent(orig_inode, 0) * 2;
>> + handle = ext4_journal_start(orig_inode, EXT4_HT_MOVE_EXTENTS, credits);
>> + if (IS_ERR(handle))
>> + return PTR_ERR(handle);
>> +
>> + ret = mext_move_begin(mext, folio, &move_type);
>> + if (ret)
>> + goto stop_handle;
>> +
>> + if (move_type == MEXT_SKIP_EXTENT)
>> + goto unlock;
>> +
>> + /*
>> + * Copy the data. First, read the original inode data into the page
>> + * cache. Then, release the existing mapping relationships and swap
>> + * the extent. Finally, re-establish the new mapping relationships
>> + * and dirty the page cache.
>> + */
>> + if (move_type == MEXT_COPY_DATA) {
>> + from = offset_in_folio(folio[0],
>> + ((loff_t)orig_map->m_lblk) << blkbits);
>> + length = ((loff_t)orig_map->m_len) << blkbits;
>> +
>> + ret = mext_folio_mkuptodate(folio[0], from, from + length);
>> + if (ret)
>> + goto unlock;
>> + }
>> +
>> + if (!filemap_release_folio(folio[0], 0) ||
>> + !filemap_release_folio(folio[1], 0)) {
>> + ret = -EBUSY;
>> + goto unlock;
>> + }
>> +
>> + /* Move extent */
>> + ext4_double_down_write_data_sem(orig_inode, donor_inode);
>> + *m_len = ext4_swap_extents(handle, orig_inode, donor_inode,
>> + orig_map->m_lblk, mext->donor_lblk,
>> + orig_map->m_len, 1, &ret);
>> + ext4_double_up_write_data_sem(orig_inode, donor_inode);
>> +
>> + /* A short-length swap cannot occur after a successful swap extent. */
>> + if (WARN_ON_ONCE(!ret && (*m_len != orig_map->m_len)))
>> + ret = -EIO;
>> +
>> + if (!(*m_len) || (move_type == MEXT_MOVE_EXTENT))
>> + goto unlock;
>> +
>> + /* Copy data */
>> + length = (*m_len) << blkbits;
>> + ret = mext_folio_mkwrite(orig_inode, folio[0], from, from + length);
>> + if (ret)
>> + goto repair_branches;
>
> I think you need to be careful here and below to not overwrite 'ret' if it
> is != 0. So something like:
>
> ret2 = mext_folio_mkwrite(..)
> if (ret2) {
> if (!ret)
> ret = ret2;
> goto repair_branches;
> }
>
> and something similar below. Otherwise the patch looks good to me.
>
> Honza
OK, although overwrite 'ret' seems fine, it's better to keep it.
Thanks,
Yi.
Powered by blists - more mailing lists