Date: Thu, 23 May 2024 12:16:18 +0100
From: "Luis Henriques (SUSE)" <luis.henriques@...ux.dev>
To: Theodore Ts'o <tytso@....edu>,
	Andreas Dilger <adilger@...ger.ca>,
	Jan Kara <jack@...e.cz>,
	Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc: linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"Luis Henriques (SUSE)" <luis.henriques@...ux.dev>
Subject: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit

When a full journal commit is ongoing, any fast commit has to be enqueued
into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
is done only once, i.e. if an inode is already queued in a previous fast
commit entry it won't be enqueued again.  However, if a full commit starts
_after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
be done into FC_Q_STAGING, and ext4_fc_track_template() is not doing that.

This patch fixes the issue by flagging an inode that is already enqueued in
either queue.  Later, during the fast commit clean-up callback, if the
inode has a tid that is bigger than the one being handled, that inode is
re-enqueued into STAGING and then spliced back into MAIN.

This bug was found using fstest generic/047.  This test creates several
32k-byte files, sync'ing each of them after its creation, and then shuts
down the filesystem.  Some data may be lost in this operation; for example,
a file may have its size truncated to zero.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@...ux.dev>
---
Hi!

(Now Cc'ing Harshad, as I should have done in the initial RFC.)

This v2 is a completely different solution, hinted by Jan Kara.  I hope my
understanding of his suggestion is correct.  Also, I've dropped the second
patch as it didn't make sense, as Jan also pointed out.

Finally, I haven't yet done a review of Harshad's patchset [1] (hope to
get to it soon), but a quick test shows the issue is still present there.
The good news is that this patch can be trivially applied on top of it.

[1] https://lore.kernel.org/all/20240520055153.136091-1-harshadshirwadkar@gmail.com
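
For anyone reviewing, the flag-and-re-enqueue behaviour described in the
commit message can be modelled in userspace roughly as below.  This is only
a sketch under assumptions: model_inode, model_track() and model_cleanup()
are hypothetical names, the two queues are reduced to a per-inode enum
instead of list_heads, locking is omitted, and tid wraparound is ignored.
It is not the kernel code, just the intended state transitions.

```c
#include <assert.h>

/* Hypothetical userspace model; not kernel code. */
enum fc_queue { Q_NONE, Q_MAIN, Q_STAGING };

struct model_inode {
	enum fc_queue queue;	/* queue the inode currently sits on */
	unsigned int fc_next;	/* models i_fc_next; 0 means "not flagged" */
};

/* Models the enqueue step at the end of ext4_fc_track_template(). */
static void model_track(struct model_inode *ei, unsigned int tid,
			int commit_ongoing)
{
	assert(tid != 0);	/* 0 is reserved for "not flagged" in this model */

	if (ei->queue == Q_NONE)
		ei->queue = commit_ongoing ? Q_STAGING : Q_MAIN;
	else
		ei->fc_next = tid;	/* already queued: remember the newer tid */
}

/* Models the FC_Q_MAIN walk in ext4_fc_cleanup(), then the STAGING splice. */
static void model_cleanup(struct model_inode *inodes, int n, unsigned int tid)
{
	int i;

	for (i = 0; i < n; i++) {
		struct model_inode *ei = &inodes[i];

		if (ei->queue != Q_MAIN)
			continue;
		ei->queue = Q_NONE;		/* list_del_init() */
		if (ei->fc_next == tid)
			ei->fc_next = 0;
		else if (ei->fc_next > tid)
			ei->queue = Q_STAGING;	/* re-enqueue for the next commit */
	}

	/* Splice whatever is on STAGING back into MAIN. */
	for (i = 0; i < n; i++)
		if (inodes[i].queue == Q_STAGING)
			inodes[i].queue = Q_MAIN;
}
```

In this model, an inode tracked under tid 10 with no commit running, then
tracked again under tid 11 while a commit is ongoing, is still on MAIN
after model_cleanup() runs for tid 10, so the following fast commit picks
it up instead of dropping it.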

Cheers,
--
Luis

 fs/ext4/ext4.h        | 11 ++++++++++-
 fs/ext4/fast_commit.c | 11 +++++++++++
 fs/ext4/super.c       |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 983dad8c07ec..4c308c18c3da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1062,9 +1062,18 @@ struct ext4_inode_info {
 	/* Fast commit wait queue for this inode */
 	wait_queue_head_t i_fc_wait;
 
-	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
+	/*
+	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len,
+	 * i_fc_next
+	 */
 	struct mutex i_fc_lock;
 
+	/*
+	 * Used to flag an inode as part of the next fast commit; will be
+	 * reset during fast commit clean-up
+	 */
+	tid_t i_fc_next;
+
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
 	 * in memory.  During truncate, i_size is set to the new size by
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 87c009e0c59a..bfdf249f0783 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -402,6 +402,8 @@ static int ext4_fc_track_template(
 				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
 				&sbi->s_fc_q[FC_Q_STAGING] :
 				&sbi->s_fc_q[FC_Q_MAIN]);
+	else
+		ei->i_fc_next = tid;
 	spin_unlock(&sbi->s_fc_lock);
 
 	return ret;
@@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
 				 i_fc_list) {
 		list_del_init(&iter->i_fc_list);
+		if (iter->i_fc_next == tid)
+			iter->i_fc_next = 0;
+		else if (iter->i_fc_next > tid)
+			/*
+			 * re-enqueue inode into STAGING, which will later be
+			 * spliced back into MAIN
+			 */
+			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
+				      &sbi->s_fc_q[FC_Q_STAGING]);
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_COMMITTING);
 		if (iter->i_sync_tid <= tid)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 893ab80dafba..56f416656d96 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1437,6 +1437,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
 	ext4_fc_init_inode(&ei->vfs_inode);
 	mutex_init(&ei->i_fc_lock);
+	ei->i_fc_next = 0;
 	return &ei->vfs_inode;
 }
 
