lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240529012003.4006535-8-harshadshirwadkar@gmail.com>
Date: Wed, 29 May 2024 01:20:00 +0000
From: Harshad Shirwadkar <harshadshirwadkar@...il.com>
To: linux-ext4@...r.kernel.org
Cc: tytso@....edu,
	saukad@...gle.com,
	harshads@...gle.com,
	Harshad Shirwadkar <harshadshirwadkar@...il.com>
Subject: [PATCH v6 07/10] ext4: add nolock mode to ext4_map_blocks()

Add nolock flag to ext4_map_blocks() which skips grabbing
i_data_sem in ext4_map_blocks. In FC commit path, we first
mark the inode as committing and thereby prevent any mutations
on it. Thus, it should be safe to call ext4_map_blocks()
without i_data_sem in this case. This is a workaround to
the problem mentioned in RFC V4 version cover letter[1] of this
patch series which pointed out that there is in incosistency between
ext4_map_blocks() behavior when EXT4_GET_BLOCKS_CACHED_NOWAIT is
passed. This patch gets rid of the need to call ext4_map_blocks()
with EXT4_GET_BLOCKS_CACHED_NOWAIT and instead call it with
EXT4_GET_BLOCKS_NOLOCK. I verified that generic/311 which failed
in cached_nowait mode passes with nolock mode.

[1] https://lwn.net/Articles/902022/

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
---
 fs/ext4/ext4.h        |  1 +
 fs/ext4/fast_commit.c | 16 ++++++++--------
 fs/ext4/inode.c       | 14 ++++++++++++--
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index d802040e94df..196c513f82dd 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -720,6 +720,7 @@ enum {
 #define EXT4_GET_BLOCKS_IO_SUBMIT		0x0400
 	/* Caller is in the atomic contex, find extent if it has been cached */
 #define EXT4_GET_BLOCKS_CACHED_NOWAIT		0x0800
+#define EXT4_GET_BLOCKS_NOLOCK			0x1000
 
 /*
  * The bit position of these flags must not overlap with any of the
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index b81b0292aa59..0b7064f8dfa5 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -559,13 +559,6 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 		!list_empty(&ei->i_fc_list))
 		return;
 
-	/*
-	 * If we come here, we may sleep while waiting for the inode to
-	 * commit. We shouldn't be holding i_data_sem in write mode when we go
-	 * to sleep since the commit path needs to grab the lock while
-	 * committing the inode.
-	 */
-	WARN_ON(lockdep_is_held_type(&ei->i_data_sem, 1));
 
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
 #if (BITS_PER_LONG < 64)
@@ -898,7 +891,14 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 	while (cur_lblk_off <= new_blk_size) {
 		map.m_lblk = cur_lblk_off;
 		map.m_len = new_blk_size - cur_lblk_off + 1;
-		ret = ext4_map_blocks(NULL, inode, &map, 0);
+		/*
+		 * Given that this inode is being committed,
+		 * EXT4_STATE_FC_COMMITTING is already set on this inode.
+		 * Which means all the mutations on the inode are paused
+		 * until the commit operation is complete. Thus it is safe
+		 * call ext4_map_blocks() in no lock mode.
+		 */
+		ret = ext4_map_blocks(NULL, inode, &map, EXT4_GET_BLOCKS_NOLOCK);
 		if (ret < 0)
 			return -ECANCELED;
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 61ffbdc2fb16..f00408017c7a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -546,7 +546,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * Try to see if we can get the block without requesting a new
 	 * file system block.
 	 */
-	down_read(&EXT4_I(inode)->i_data_sem);
+	if (!(flags & EXT4_GET_BLOCKS_NOLOCK))
+		down_read(&EXT4_I(inode)->i_data_sem);
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		retval = ext4_ext_map_blocks(handle, inode, map, 0);
 	} else {
@@ -573,7 +574,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
 				      map->m_pblk, status);
 	}
-	up_read((&EXT4_I(inode)->i_data_sem));
+	/*
+	 * We should never call ext4_map_blocks() in nolock mode outside
+	 * of fast commit path.
+	 */
+	WARN_ON((flags & EXT4_GET_BLOCKS_NOLOCK) &&
+		!ext4_test_inode_state(inode,
+				       EXT4_STATE_FC_COMMITTING));
+	if (!(flags & EXT4_GET_BLOCKS_NOLOCK))
+		up_read((&EXT4_I(inode)->i_data_sem));
 
 found:
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
@@ -614,6 +623,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * the write lock of i_data_sem, and call get_block()
 	 * with create == 1 flag.
 	 */
+	WARN_ON((flags & EXT4_GET_BLOCKS_NOLOCK) != 0);
 	down_write(&EXT4_I(inode)->i_data_sem);
 
 	/*
-- 
2.45.1.288.g0e0cd299f1-goog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ