[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <db4da8db-d501-4181-994b-c25845908161@suse.com>
Date: Wed, 15 Jan 2025 13:00:23 +0800
From: Heming Zhao <heming.zhao@...e.com>
To: Jan Kara <jack@...e.cz>, Theodore Ts'o <tytso@....edu>, li.kai4@....com
Cc: jack@...e.com, linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller@...glegroups.com, Joseph Qi <joseph.qi@...ux.alibaba.com>,
ocfs2-devel@...ts.linux.dev, Liebes Wang <wanghaichi0403@...il.com>,
syzbot <syzbot+96ee12698391289383dd@...kaller.appspotmail.com>
Subject: Re: WARNING in jbd2_journal_update_sb_log_tail
Hello Jan,
On 1/15/25 09:32, Liebes Wang wrote:
> The bisection log shows the first cause commit is a09decff5c32060639a685581c380f51b14e1fc2:
> a09decff5c32 jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
>
> The full bisection log is attached. Hope this helps.
This bisearch commit a09decff5c32 appears to be the root cause
of this issue. It fixed one issue but introduced another.
Syzbot tested the patch with calling jbd2_journal_wipe() with 'write=1'.
The Syzbot test result [1] shows that the same WARN_ON() is triggered
in a subsequent routine – the classic whack-a-mole!
Back to commit a09decff5c32, it opened a door to allow jbd2 to update
sb regardless of whether the value of sb items are correct.
To fix a09decff5c32, it seems that jbd2 needs to add more sanity check
codes in a sub-routine of jbd2_journal_load().
btw, in my view, this is a jbd2 issue not ocfs2/ext4 issue.
[1]: https://lore.kernel.org/ocfs2-devel/04a9ad29-51de-4b50-a5bb-56f91817639d@suse.com/T/#m86d01f83d808868bb5e6548d30f79b4f9f889b13
-- Heming
>
> Heming Zhao <heming.zhao@...e.com <mailto:heming.zhao@...e.com>> 于2025年1月14日周二 22:51写道:
>
> Hi Ted,
>
> On 1/14/25 21:38, Theodore Ts'o wrote:
> > On Tue, Jan 14, 2025 at 02:25:21PM +0800, Heming Zhao wrote:
> >>
> >> The root cause appears to be that the jbd2 bypass recovery logic
> >> is incorrect.
> >
> > Heming, thanks for taking a look.
> >
> > I'm not convinced the root cause is what you've stated. When
> > jbd2_journal_wipe() calls jbd2_mark_journal_empty(), s_start gets set
> > to zero:
>
> Actually, ocfs2 calls jbd2_journal_wipe() with 'write=0' (hard coded),
> so jbd2_mark_journal_empty() isn't called during the ocfs2 mount
> phase. This means the following deduction won't apply in this case.
>
> -- Heming
>
> >
> > sb->s_start = cpu_to_be32(0);
> >
> > This then gets checked in jbd2_journal_recovery:
> >
> > if (!sb->s_start) {
> > jbd2_debug(1, "No recovery required, last transaction %d, head block %u\n",
> > be32_to_cpu(sb->s_sequence), be32_to_cpu(sb->s_head));
> > journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
> > journal->j_head = be32_to_cpu(sb->s_head);
> > return 0;
> > }
> >
> > I suspect that there is something else wrong with jbd2's superblock,
> > since this normally works in the absence of malicious fs image
> > fuzzing, such that when jbd2_journal_load() calls reset_journal()
> > after jbd2_journal_recover() correctly bypasses recovery, the WARN_ON
> > gets triggered.
> >
> > I'd suggest that you enable jbd2 debugging so we can see all of the
> > jbd2_debug() message to understand what might be going on.
> >
> > By the way, given that this is only a WARN_ON, and it involves
> > malicious image fuzzing, this is probably a valid jbd2 bug, but it's
> > not actually a security bug. Sure, someone silly enough to pick up a
> > maliciously corrupted USB thumb drive dropped in a parking lot and
> > insert it into their desktop, and the distribution is silly enoough to
> > allow automount, the worse that can happen is that the system to
> > reboot if the system is configured to panic on a WARNING. So feel
> > free to prioritize your investigation appropriately. :-)
> >
> > Cheers,
> >
> > - Ted
>
Powered by blists - more mailing lists