Message-ID: <7vuttijv2pqx2lgan5rkcw6ofi4uhrsfbmksg4doyq34rjidte@mnfd6cbehncq>
Date: Fri, 19 Dec 2025 16:17:59 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
ojaswin@...ux.ibm.com, ritesh.list@...il.com, yi.zhang@...wei.com, yizhang089@...il.com,
libaokun1@...wei.com, yangerkun@...wei.com, yukuai@...as.com
Subject: Re: [PATCH -next 2/7] ext4: don't split extent before submitting I/O
On Sat 13-12-25 10:20:03, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> Currently, when writing back dirty pages to the filesystem with the
> dioread_nolock feature enabled and when doing DIO, if the area to be
> written back is part of an unwritten extent, the
> EXT4_GET_BLOCKS_IO_CREATE_EXT flag is set during block allocation before
> submitting I/O. The function ext4_split_convert_extents() then attempts
> to split this extent in advance. This approach is designed to prevent
> extent splitting and conversion to the written type from failing due to
> insufficient disk space at the time of I/O completion, which could
> otherwise result in data loss.
>
> However, we already have two mechanisms to ensure successful extent
> conversion. The first is the EXT4_GET_BLOCKS_METADATA_NOFAIL flag, which
> is a best-effort measure: it permits the use of 2% of the reserved space or
> 4,096 blocks in the file system when splitting extents. This flag covers
> most scenarios where extent splitting might fail. The second is the
> EXT4_EXT_MAY_ZEROOUT flag, which is also set during extent splitting. If
> the reserved space is insufficient and splitting fails, it does not
> retry the allocation. Instead, it directly zeros out the extra part of
> the extent, thereby avoiding splitting and directly converting the
> entire extent to the written type.
>
> These two mechanisms also exist when I/Os are completed because there is
> a concurrency window between write-back and fallocate, which may still
> require us to split extents upon I/O completion. This is not much
> different from splitting extents before submitting I/O. Therefore,
> it seems possible to defer the splitting until I/O completion without
> increasing the risk of I/O failure and data loss. On the contrary, if some
> I/Os can be merged at I/O completion, deferring can also reduce unnecessary
> splitting operations, thereby alleviating the pressure on reserved
> space.
>
> In addition, deferring extent splitting until I/O completion can
> also simplify the I/O submission process and avoid initiating unnecessary
> journal handles when writing unwritten extents.
>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
> ---
> fs/ext4/extents.c | 13 +------------
> fs/ext4/inode.c | 4 ++--
> 2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index e53959120b04..c98f7c5482b4 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3787,21 +3787,10 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
> ext_debug(inode, "logical block %llu, max_blocks %u\n",
> (unsigned long long)ee_block, ee_len);
>
> - /* If extent is larger than requested it is a clear sign that we still
> - * have some extent state machine issues left. So extent_split is still
> - * required.
> - * TODO: Once all related issues will be fixed this situation should be
> - * illegal.
> - */
> if (ee_block != map->m_lblk || ee_len > map->m_len) {
> int flags = EXT4_GET_BLOCKS_CONVERT |
> EXT4_GET_BLOCKS_METADATA_NOFAIL;
> -#ifdef CONFIG_EXT4_DEBUG
> - ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
> - " len %u; IO logical block %llu, len %u",
> - inode->i_ino, (unsigned long long)ee_block, ee_len,
> - (unsigned long long)map->m_lblk, map->m_len);
> -#endif
> +
> path = ext4_split_convert_extents(handle, inode, map, path,
> flags, NULL);
> if (IS_ERR(path))
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index bb8165582840..ffde24ff7347 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2376,7 +2376,7 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
>
> dioread_nolock = ext4_should_dioread_nolock(inode);
> if (dioread_nolock)
> - get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
> + get_blocks_flags |= EXT4_GET_BLOCKS_UNWRIT_EXT;
>
> err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
> if (err < 0)
> @@ -3744,7 +3744,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
> else if (EXT4_LBLK_TO_B(inode, map->m_lblk) >= i_size_read(inode))
> m_flags = EXT4_GET_BLOCKS_CREATE;
> else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> - m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
> + m_flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
>
> if (flags & IOMAP_ATOMIC)
> ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
> --
> 2.46.1
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR