linux-kernel - Re: WARNING in jbd2_journal_update_sb_log

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2htuqhxolgddrwm5ka7ad4axu3gz7nprqnmspdtkwld7mzch3r@tqhjlxfidxsb>
Date: Wed, 15 Jan 2025 18:53:41 +0100
From: Jan Kara <jack@...e.cz>
To: Heming Zhao <heming.zhao@...e.com>
Cc: Jan Kara <jack@...e.cz>, Theodore Ts'o <tytso@....edu>, 
	li.kai4@....com, jack@...e.com, linux-ext4@...r.kernel.org, 
	linux-kernel@...r.kernel.org, syzkaller@...glegroups.com, 
	Joseph Qi <joseph.qi@...ux.alibaba.com>, ocfs2-devel@...ts.linux.dev, 
	Liebes Wang <wanghaichi0403@...il.com>, syzbot <syzbot+96ee12698391289383dd@...kaller.appspotmail.com>
Subject: Re: WARNING in jbd2_journal_update_sb_log_tail

On Wed 15-01-25 13:00:23, Heming Zhao wrote:
> Hello Jan,
> 
> On 1/15/25 09:32, Liebes Wang wrote:
> > The bisection log shows the first cause commit is a09decff5c32060639a685581c380f51b14e1fc2:
> > a09decff5c32 jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
> > 
> > The full bisection log is attached. Hope this helps.
> 
> This bisearch commit a09decff5c32 appears to be the root cause
> of this issue. It fixed one issue but introduced another.
> 
> Syzbot tested the patch with calling jbd2_journal_wipe() with 'write=1'.
> The Syzbot test result [1] shows that the same WARN_ON() is triggered
> in a subsequent routine – the classic whack-a-mole!
> 
> Back to commit a09decff5c32, it opened a door to allow jbd2 to update
> sb regardless of whether the value of sb items are correct.
> 
> To fix a09decff5c32, it seems that jbd2 needs to add more sanity check
> codes in a sub-routine of jbd2_journal_load().
> 
> btw, in my view, this is a jbd2 issue not ocfs2/ext4 issue.
> 
> [1]: https://lore.kernel.org/ocfs2-devel/04a9ad29-51de-4b50-a5bb-56f91817639d@suse.com/T/#m86d01f83d808868bb5e6548d30f79b4f9f889b13

Thanks for debugging this! So I'm not 100% convinced this is only jbd2 bug
because jbd2_journal_recover() was never intended to be called after
jbd2_journal_skip_recovery() (called from jbd2_journal_wipe()). You're
supposed to call either jbd2_journal_wipe() or jbd2_journal_recover() but
not both. So IMO this needs fixing in OCFS2 code. That being said you've
also pointed at one bug in jbd2 code - the WARN_ON(!sb->s_sequence) in
jbd2_journal_update_sb_log_tail() is indeed wrong. We were inconsistent
inside jbd2 whether TID 0 is considered valid or not and relatively
recently we've decided to accept TID 0 as valid but this place was left
out. I'll send a fix for that.

								Honza

> > Heming Zhao <heming.zhao@...e.com <mailto:heming.zhao@...e.com>> 于2025年1月14日周二 22:51写道：
> > 
> >     Hi Ted,
> > 
> >     On 1/14/25 21:38, Theodore Ts'o wrote:
> >      > On Tue, Jan 14, 2025 at 02:25:21PM +0800, Heming Zhao wrote:
> >      >>
> >      >> The root cause appears to be that the jbd2 bypass recovery logic
> >      >> is incorrect.
> >      >
> >      > Heming, thanks for taking a look.
> >      >
> >      > I'm not convinced the root cause is what you've stated.  When
> >      > jbd2_journal_wipe() calls jbd2_mark_journal_empty(), s_start gets set
> >      > to zero:
> > 
> >     Actually, ocfs2 calls jbd2_journal_wipe() with 'write=0' (hard coded),
> >     so jbd2_mark_journal_empty() isn't called during the ocfs2 mount
> >     phase. This means the following deduction won't apply in this case.
> > 
> >     -- Heming
> > 
> >      >
> >      >       sb->s_start    = cpu_to_be32(0);
> >      >
> >      > This then gets checked in jbd2_journal_recovery:
> >      >
> >      >       if (!sb->s_start) {
> >      >               jbd2_debug(1, "No recovery required, last transaction %d, head block %u\n",
> >      >                         be32_to_cpu(sb->s_sequence), be32_to_cpu(sb->s_head));
> >      >               journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
> >      >               journal->j_head = be32_to_cpu(sb->s_head);
> >      >               return 0;
> >      >       }
> >      >
> >      > I suspect that there is something else wrong with jbd2's superblock,
> >      > since this normally works in the absence of malicious fs image
> >      > fuzzing, such that when jbd2_journal_load() calls reset_journal()
> >      > after jbd2_journal_recover() correctly bypasses recovery, the WARN_ON
> >      > gets triggered.
> >      >
> >      > I'd suggest that you enable jbd2 debugging so we can see all of the
> >      > jbd2_debug() message to understand what might be going on.
> >      >
> >      > By the way, given that this is only a WARN_ON, and it involves
> >      > malicious image fuzzing, this is probably a valid jbd2 bug, but it's
> >      > not actually a security bug.  Sure, someone silly enough to pick up a
> >      > maliciously corrupted USB thumb drive dropped in a parking lot and
> >      > insert it into their desktop, and the distribution is silly enoough to
> >      > allow automount, the worse that can happen is that the system to
> >      > reboot if the system is configured to panic on a WARNING.  So feel
> >      > free to prioritize your investigation appropriately.  :-)
> >      >
> >      > Cheers,
> >      >
> >      >                                               - Ted
> > 
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR