[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ikdnwnux5ks25eqn5nbya6dpb2wrtsb22m57757ewdzd4qo6rx@e46o3ugl5wfa>
Date: Wed, 14 May 2025 12:07:35 +0200
From: Jan Kara <jack@...e.cz>
To: Jeongjun Park <aha310510@...il.com>
Cc: tytso@....edu, jack@...e.cz, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org, syzbot+de24c3fe3c4091051710@...kaller.appspotmail.com
Subject: Re: [PATCH] jbd2: fix data-race and null-ptr-deref in
jbd2_journal_dirty_metadata()
On Wed 14-05-25 02:04:41, Jeongjun Park wrote:
> Since handle->h_transaction may be a NULL pointer, so we should change it
> to call is_handle_aborted(handle) first before dereferencing it.
>
> And the following data-race was reported in my fuzzer:
>
> ==================================================================
> BUG: KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata
>
> write to 0xffff888011024104 of 4 bytes by task 10881 on cpu 1:
> jbd2_journal_dirty_metadata+0x2a5/0x770 fs/jbd2/transaction.c:1556
> __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358
> ext4_do_update_inode fs/ext4/inode.c:5220 [inline]
> ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869
> __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074
> ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103
> ....
>
> read to 0xffff888011024104 of 4 bytes by task 10880 on cpu 0:
> jbd2_journal_dirty_metadata+0xf2/0x770 fs/jbd2/transaction.c:1512
> __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358
> ext4_do_update_inode fs/ext4/inode.c:5220 [inline]
> ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869
> __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074
> ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103
> ....
>
> value changed: 0x00000000 -> 0x00000001
> ==================================================================
>
> According to this crash report, there is a read/write data-race in
> jh->b_modified.
>
> This is because the b_state_lock is locked too late.
>
> For some reason, jbd2_journal_dirty_metadata() has been written in a way
> that it does not lock b_state_lock before checking jh->b_transaction.
Yes and that is deliberate because for some buffers like bitmaps the
contention (and cacheline bouncing) on the b_state_lock can be actually
quite significant. See commit 6e06ae88edae ("jbd2: speedup
jbd2_journal_dirty_metadata()") for a bit more description.
> However, This makes the code that checks jh->b_transaction messy and
> causes a data-race in jh->b_* variables. Since locking b_state_lock
> earlier is not enough to significantly affect performance and most of the
> functions defined in transaction.c lock b_state_lock before
> reading/writing jh->b_*.
Well, the code is written with the expectation that b_transaction, b_jlist,
and b_modified fields can be changing underneath. Since this was
implemented 10 years ago, we didn't really bother with properly annotating
this but the code as is is safe. I agree the data_race() annotation for the
b_modified check was missed when adding data_race() annotations 4 years
ago so that should be fixed.
Honza
> Thereforce, I think it would be appropriate to modify
> jbd2_journal_dirty_metadata() to lock b_state_lock earlier as well.
>
> Reported-by: syzbot+de24c3fe3c4091051710@...kaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=de24c3fe3c4091051710
> Fixes: 6e06ae88edae ("jbd2: speedup jbd2_journal_dirty_metadata()")
> Signed-off-by: Jeongjun Park <aha310510@...il.com>
> ---
> fs/jbd2/transaction.c | 49 +++++++++++++++----------------------------
> 1 file changed, 17 insertions(+), 32 deletions(-)
>
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index cbc4785462f5..7e6dbf37396f 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1496,41 +1496,25 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
> jbd2_debug(5, "journal_head %p\n", jh);
> JBUFFER_TRACE(jh, "entry");
>
> - /*
> - * This and the following assertions are unreliable since we may see jh
> - * in inconsistent state unless we grab bh_state lock. But this is
> - * crucial to catch bugs so let's do a reliable check until the
> - * lockless handling is fully proven.
> - */
> - if (data_race(jh->b_transaction != transaction &&
> - jh->b_next_transaction != transaction)) {
> - spin_lock(&jh->b_state_lock);
> - J_ASSERT_JH(jh, jh->b_transaction == transaction ||
> - jh->b_next_transaction == transaction);
> - spin_unlock(&jh->b_state_lock);
> - }
> + spin_lock(&jh->b_state_lock);
> +
> + J_ASSERT_JH(jh, jh->b_transaction == transaction ||
> + jh->b_next_transaction == transaction);
> +
> if (jh->b_modified == 1) {
> /* If it's in our transaction it must be in BJ_Metadata list. */
> - if (data_race(jh->b_transaction == transaction &&
> - jh->b_jlist != BJ_Metadata)) {
> - spin_lock(&jh->b_state_lock);
> - if (jh->b_transaction == transaction &&
> - jh->b_jlist != BJ_Metadata)
> - pr_err("JBD2: assertion failure: h_type=%u "
> - "h_line_no=%u block_no=%llu jlist=%u\n",
> - handle->h_type, handle->h_line_no,
> - (unsigned long long) bh->b_blocknr,
> - jh->b_jlist);
> - J_ASSERT_JH(jh, jh->b_transaction != transaction ||
> - jh->b_jlist == BJ_Metadata);
> - spin_unlock(&jh->b_state_lock);
> - }
> - goto out;
> + if (jh->b_transaction == transaction &&
> + jh->b_jlist != BJ_Metadata)
> + pr_err("JBD2: assertion failure: h_type=%u "
> + "h_line_no=%u block_no=%llu jlist=%u\n",
> + handle->h_type, handle->h_line_no,
> + (unsigned long long) bh->b_blocknr,
> + jh->b_jlist);
> + J_ASSERT_JH(jh, jh->b_transaction != transaction ||
> + jh->b_jlist == BJ_Metadata);
> + goto out_unlock_bh;
> }
>
> - journal = transaction->t_journal;
> - spin_lock(&jh->b_state_lock);
> -
> if (is_handle_aborted(handle)) {
> /*
> * Check journal aborting with @jh->b_state_lock locked,
> @@ -1543,6 +1527,8 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
> goto out_unlock_bh;
> }
>
> + journal = transaction->t_journal;
> +
> if (jh->b_modified == 0) {
> /*
> * This buffer's got modified and becoming part
> @@ -1628,7 +1614,6 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
> spin_unlock(&journal->j_list_lock);
> out_unlock_bh:
> spin_unlock(&jh->b_state_lock);
> -out:
> JBUFFER_TRACE(jh, "exit");
> return ret;
> }
> --
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists