Message-ID: <20251213022008.1766912-3-yi.zhang@huaweicloud.com>
Date: Sat, 13 Dec 2025 10:20:03 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: linux-ext4@...r.kernel.org
Cc: linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	tytso@....edu,
	adilger.kernel@...ger.ca,
	jack@...e.cz,
	ojaswin@...ux.ibm.com,
	ritesh.list@...il.com,
	yi.zhang@...wei.com,
	yi.zhang@...weicloud.com,
	yizhang089@...il.com,
	libaokun1@...wei.com,
	yangerkun@...wei.com,
	yukuai@...as.com
Subject: [PATCH -next 2/7] ext4: don't split extent before submitting I/O

From: Zhang Yi <yi.zhang@...wei.com>

Currently, when writing back dirty pages to the filesystem with the
dioread_nolock feature enabled and when doing DIO, if the area to be
written back is part of an unwritten extent, the
EXT4_GET_BLOCKS_IO_CREATE_EXT flag is set during block allocation before
submitting I/O. The function ext4_split_convert_extents() then attempts
to split this extent in advance. This approach is designed to prevent
extent splitting and conversion to the written type from failing due to
insufficient disk space at the time of I/O completion, which could
otherwise result in data loss.

However, we already have two mechanisms to ensure successful extent
conversion. The first is the EXT4_GET_BLOCKS_METADATA_NOFAIL flag, which
is a best-effort measure: it permits the use of 2% of the reserved space or
4,096 blocks in the file system when splitting extents. This flag covers
most scenarios where extent splitting might fail. The second is the
EXT4_EXT_MAY_ZEROOUT flag, which is also set during extent splitting. If
the reserved space is insufficient and splitting fails, it does not
retry the allocation. Instead, it directly zeros out the extra part of
the extent, thereby avoiding splitting and directly converting the
entire extent to the written type.
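
The fallback order described above can be sketched as a small userspace
toy model. This is not kernel code: struct toy_fs, toy_alloc_metadata()
and toy_convert() are hypothetical names, and the single "need" count is
an assumed stand-in for the metadata blocks a split may require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not an ext4 API. */
enum convert_result {
	CONVERTED_BY_SPLIT,	/* split succeeded, only the I/O range is converted */
	CONVERTED_BY_ZEROOUT,	/* split failed, whole extent zeroed and converted */
};

struct toy_fs {
	unsigned int free_blocks;	/* ordinary free space */
	unsigned int reserved_blocks;	/* pool a NOFAIL-style request may use */
};

/* Take 'need' metadata blocks; dip into the reserve only if 'nofail' is set. */
static bool toy_alloc_metadata(struct toy_fs *fs, unsigned int need, bool nofail)
{
	if (fs->free_blocks >= need) {
		fs->free_blocks -= need;
		return true;
	}
	if (nofail && fs->reserved_blocks >= need) {
		fs->reserved_blocks -= need;
		return true;
	}
	return false;
}

/* First try to split (may use the reserve); otherwise fall back to zeroout. */
static enum convert_result toy_convert(struct toy_fs *fs, unsigned int need)
{
	assert(fs != NULL);

	if (toy_alloc_metadata(fs, need, true))
		return CONVERTED_BY_SPLIT;

	/* No allocation retry: the extra part is zeroed instead of split off. */
	return CONVERTED_BY_ZEROOUT;
}
```

In this model the conversion never reports failure: it either splits
using free or reserved blocks, or converts the whole extent after
zeroing the part that was not written.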

These two mechanisms also exist when I/Os are completed because there is
a concurrency window between write-back and fallocate, which may still
require us to split extents upon I/O completion. This is not much
different from splitting extents before submitting I/O. Therefore,
it seems possible to defer the splitting until I/O completion; this won't
increase the risk of I/O failure and data loss. On the contrary, if some
I/Os can be merged at I/O completion, it can also reduce unnecessary
splitting operations, thereby alleviating the pressure on reserved
space.
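
To illustrate why merged completions can need fewer splits, here is a
toy count of the cut points required to isolate an I/O range inside an
unwritten extent. It is a sketch under simplifying assumptions, not
kernel code: struct toy_range and toy_splits_needed() are hypothetical,
and the boundary test only loosely mirrors the
"ee_block != map->m_lblk || ee_len > map->m_len" check in
ext4_convert_unwritten_extents_endio().

```c
#include <assert.h>

/* Toy model only; names and types are hypothetical, not ext4 APIs. */
struct toy_range {
	unsigned int start;	/* first logical block */
	unsigned int len;	/* number of blocks */
};

/* Count the cuts needed to isolate 'io' inside the unwritten extent 'ext'. */
static unsigned int toy_splits_needed(struct toy_range ext, struct toy_range io)
{
	unsigned int splits = 0;

	/* The I/O range must lie entirely inside the extent. */
	assert(io.start >= ext.start);
	assert(io.start + io.len <= ext.start + ext.len);

	if (io.start != ext.start)
		splits++;	/* cut at the head of the I/O range */
	if (io.start + io.len != ext.start + ext.len)
		splits++;	/* cut at the tail of the I/O range */

	return splits;
}
```

For example, with an 8-block unwritten extent and two adjacent 4-block
writes, handling the writes one at a time requires one cut in this
model, while a single merged range covering all 8 blocks requires none.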

In addition, deferring extent splitting until I/O completion can
also simplify the I/O submission process and avoid initiating unnecessary
journal handles when writing unwritten extents.

Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
---
 fs/ext4/extents.c | 13 +------------
 fs/ext4/inode.c   |  4 ++--
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e53959120b04..c98f7c5482b4 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3787,21 +3787,10 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode,
 	ext_debug(inode, "logical block %llu, max_blocks %u\n",
 		  (unsigned long long)ee_block, ee_len);
 
-	/* If extent is larger than requested it is a clear sign that we still
-	 * have some extent state machine issues left. So extent_split is still
-	 * required.
-	 * TODO: Once all related issues will be fixed this situation should be
-	 * illegal.
-	 */
 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
 		int flags = EXT4_GET_BLOCKS_CONVERT |
 			    EXT4_GET_BLOCKS_METADATA_NOFAIL;
-#ifdef CONFIG_EXT4_DEBUG
-		ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu,"
-			     " len %u; IO logical block %llu, len %u",
-			     inode->i_ino, (unsigned long long)ee_block, ee_len,
-			     (unsigned long long)map->m_lblk, map->m_len);
-#endif
+
 		path = ext4_split_convert_extents(handle, inode, map, path,
 						  flags, NULL);
 		if (IS_ERR(path))
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bb8165582840..ffde24ff7347 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2376,7 +2376,7 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 
 	dioread_nolock = ext4_should_dioread_nolock(inode);
 	if (dioread_nolock)
-		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
+		get_blocks_flags |= EXT4_GET_BLOCKS_UNWRIT_EXT;
 
 	err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
 	if (err < 0)
@@ -3744,7 +3744,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
 	else if (EXT4_LBLK_TO_B(inode, map->m_lblk) >= i_size_read(inode))
 		m_flags = EXT4_GET_BLOCKS_CREATE;
 	else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-		m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
+		m_flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
 
 	if (flags & IOMAP_ATOMIC)
 		ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
-- 
2.46.1

