[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2fxg5kszehzzaw5zbj6ptkxujzslxmudk3izentavxlkarm5mw@3yissfw5dru7>
Date: Wed, 8 Oct 2025 14:16:04 +0200
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
yi.zhang@...wei.com, libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 10/13] ext4: introduce mext_move_extent()
On Thu 25-09-25 17:26:06, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> When moving extents, the current move_extent_per_page() process can only
> move extents of length PAGE_SIZE at a time, which is highly inefficient,
> especially when the fragmentation of the file is not particularly
> severe, this will result in a large number of unnecessary extent split
> and merge operations. Moreover, since the ext4 file system now supports
> large folios, using PAGE_SIZE as the processing unit is no longer
> practical.
>
> Therefore, introduce a new move extents method, mext_move_extent(). It
> moves one extent of the origin inode at a time, but not exceeding the
> size of a folio. The parameters for the move are passed through the new
> mext_data data structure, which includes the origin inode, donor inode,
> the mapping extent of the origin inode to be moved, and the starting
> offset of the donor inode.
>
> The move process is similar to move_extent_per_page() and can be
> categorized into three types: MEXT_SKIP_EXTENT, MEXT_MOVE_EXTENT, and
> MEXT_COPY_DATA. MEXT_SKIP_EXTENT indicates that the corresponding area
> of the donor file is a hole, meaning no actual space is allocated, so
> the move is skipped. MEXT_MOVE_EXTENT indicates that the corresponding
> areas of both the origin and donor files are unwritten, so no data needs
> to be copied; only the extents are swapped. MEXT_COPY_DATA indicates
> that the corresponding areas of both the origin and donor files contain
> data, so data must be copied. The data copying is performed in three
> steps: first, the data from the original location is read into the page
> cache; then, the extents are swapped, and the page cache is rebuilt to
> reflect the index of the physical blocks; finally, the dirty page cache
> is marked and written back to ensure that the data is written to disk
> before the metadata is persisted.
>
> One important point to note is that the folio lock and i_data_sem are
> held only during the moving process. Therefore, before moving an extent,
> it is necessary to check whether the sequence cookie of the area to be
> moved has changed while holding the folio lock. If a change is detected,
> it indicates that concurrent write-back operations may have occurred
> during this period, and the type of the extent to be moved can no longer
> be considered reliable. For example, it may have changed from unwritten
> to written. In such cases, return -ESTALE, and the calling function
> should reacquire the move extent of the original file and retry the
> movement.
>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
Nice, just one nit below. Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
> +static int mext_move_begin(struct mext_data *mext, struct folio *folio[2],
> + enum mext_move_type *move_type)
> +{
> + struct inode *orig_inode = mext->orig_inode;
> + struct inode *donor_inode = mext->donor_inode;
> + unsigned int blkbits = orig_inode->i_blkbits;
> + struct ext4_map_blocks donor_map = {0};
> + loff_t orig_pos, donor_pos;
> + size_t move_len;
> + int ret;
> +
> + orig_pos = ((loff_t)mext->orig_map.m_lblk) << blkbits;
> + donor_pos = ((loff_t)mext->donor_lblk) << blkbits;
> + ret = mext_folio_double_lock(orig_inode, donor_inode,
> + orig_pos >> PAGE_SHIFT, donor_pos >> PAGE_SHIFT, folio);
> + if (ret)
> + return ret;
> +
> + /*
> + * Check the origin inode's mapping information again under the
> + * folio lock, as we do not hold the i_data_sem at all times, and
> + * it may change during the concurrent write-back operation.
> + */
> + if (mext->orig_map.m_seq != READ_ONCE(EXT4_I(orig_inode)->i_es_seq)) {
> + ret = -ESTALE;
> + goto error;
> + }
> +
> + /* Adjust the moving length according to the minor folios length. */
^^^ ... the length of shorter folio
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists