Message-ID: <d673f289-2385-4949-ac80-f3a502d4deb2@lkcamp.dev>
Date: Mon, 26 Aug 2024 01:22:54 -0300
From: Vinicius Peixoto <vpeixoto@...amp.dev>
To: syzbot+8512f3dbd96253ffbe27@...kaller.appspotmail.com
Cc: jack@...e.com, jlbec@...lplan.org, joseph.qi@...ux.alibaba.com,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org, mark@...heh.com,
ocfs2-devel@...ts.linux.dev, syzkaller-bugs@...glegroups.com, tytso@....edu,
~lkcamp/discussion@...ts.sr.ht
Subject: Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail
Hi all,
I noticed this report from syzbot when going through the preliminary
tasks for the Linux Kernel Mentorship Program, and thought I'd take a
stab at solving it. I apologize in advance for any mistakes as I'm still
very new to kernel development. Either way, here's my analysis:
From what I can tell by looking at the reproducer from syzbot, it is
trying to mount a file filled with bogus data as an ocfs2 disk, and this
is triggering an assertion in jbd2_cleanup_journal_tail, which in turn
causes a panic.
The problematic call stack goes roughly like this:
mount_bdev
  -> ocfs2_mount_volume
    -> ocfs2_check_volume
      -> ocfs2_journal_load
        -> jbd2_journal_load
          -> journal_reset (fails)
Since the disk data is bogus, journal_reset fails with -EINVAL ("JBD2:
Journal too short (blocks 2-1024)"); this leaves journal->j_head == 0.
However, jbd2_journal_load clears the JBD2_ABORT flag right before
calling journal_reset, so the journal is not marked as aborted after the
failure. This leads to a problem later when ocfs2_mount_volume tries to
flush the journal as part of the cleanup when aborting the mount
operation:
  -> ocfs2_mount_volume (error; goto out_system_inodes)
    -> ocfs2_journal_shutdown
      -> jbd2_journal_flush
        -> jbd2_cleanup_journal_tail (J_ASSERT fails)
This failure happens because of the following code:
    if (is_journal_aborted(journal))
        return -EIO;

    if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
        return 1;
    J_ASSERT(blocknr != 0);
Since JBD2_ABORT was cleared in jbd2_journal_load earlier, we get past
the is_journal_aborted check and enter jbd2_journal_get_log_tail, which
sets *blocknr = journal->j_head (which is 0). The J_ASSERT then fails,
causing a panic.
I confirmed that setting the JBD2_ABORT flag in journal_reset before
returning -EINVAL fixes the problem:
 static int journal_reset(journal_t *journal)
         journal_fail_superblock(journal);
+        journal->j_flags |= JBD2_ABORT;
         return -EINVAL;
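To make the sequence easier to follow, here is a small userspace model of it. This is only a sketch and not kernel code: the struct, the MODEL_ABORT bit, the enum, and all model_* functions are hypothetical stand-ins, and it assumes, as described above, that the tail lookup reports journal->j_head as the block number.

```c
#include <assert.h>   /* so callers can assert() on the results */
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit for this model; stands in for JBD2_ABORT. */
#define MODEL_ABORT 0x002UL

/* Hypothetical, heavily reduced stand-in for journal_t. */
struct model_journal {
	unsigned long flags;
	unsigned long head;	/* block number; 0 means it was never set up */
};

enum tail_result { TAIL_EIO, TAIL_ASSERT_WOULD_FIRE, TAIL_OK };

/* Models journal_reset() rejecting a too-short journal. */
static int model_reset(struct model_journal *j, bool set_abort_on_failure)
{
	if (set_abort_on_failure)
		j->flags |= MODEL_ABORT;	/* the proposed one-line fix */
	return -EINVAL;
}

/* Models jbd2_journal_load(): clear the abort flag, then reset. */
static int model_load(struct model_journal *j, bool set_abort_on_failure)
{
	j->flags &= ~MODEL_ABORT;
	return model_reset(j, set_abort_on_failure);
}

/* Models the checks at the top of jbd2_cleanup_journal_tail(). */
static enum tail_result model_cleanup_tail(const struct model_journal *j)
{
	unsigned long blocknr;

	if (j->flags & MODEL_ABORT)
		return TAIL_EIO;		/* aborted journal: bail out early */

	blocknr = j->head;			/* assumed result of the tail lookup */
	if (blocknr == 0)
		return TAIL_ASSERT_WOULD_FIRE;	/* J_ASSERT(blocknr != 0) */
	return TAIL_OK;
}

/* Failed mount followed by the cleanup-time flush. */
static enum tail_result model_failed_mount(bool set_abort_on_failure)
{
	struct model_journal j = { .flags = 0, .head = 0 };

	(void)model_load(&j, set_abort_on_failure);	/* always fails here */
	return model_cleanup_tail(&j);
}
```

In this model, model_failed_mount(false) reaches the point where the assertion would fire, while model_failed_mount(true) returns early through the -EIO path. It says nothing about whether setting the flag in journal_reset is the right place for the fix in the real code.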
You can find a proper patch file + the syzbot re-test result in [1].
However, I'm not entirely sure whether this is the correct decision, and
I wanted to confirm that this is an appropriate solution before sending
a proper patch to the mailing list.
Thanks in advance,
Vinicius
[1] https://syzkaller.appspot.com/bug?extid=8512f3dbd96253ffbe27