[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c2724c5c2c17127866e6ed5c69b8f1f24d4c2db3.camel@gmail.com>
Date: Wed, 28 Aug 2024 19:55:24 +0800
From: Julian Sun <sunjunchao2870@...il.com>
To: Jan Kara <jack@...e.cz>, Theodore Ts'o <tytso@....edu>
Cc: Vinicius Peixoto <vpeixoto@...amp.dev>,
syzbot+8512f3dbd96253ffbe27@...kaller.appspotmail.com, jack@...e.com,
jlbec@...lplan.org, joseph.qi@...ux.alibaba.com,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org, mark@...heh.com,
ocfs2-devel@...ts.linux.dev, syzkaller-bugs@...glegroups.com,
~lkcamp/discussion@...ts.sr.ht
Subject: Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in
jbd2_cleanup_journal_tail
On Wed, 2024-08-28 at 12:54 +0200, Jan Kara wrote:
> On Mon 26-08-24 09:32:08, Theodore Ts'o wrote:
> > On Mon, Aug 26, 2024 at 01:22:54AM -0300, Vinicius Peixoto wrote:
> > > Since the disk data is bogus, journal_reset fails with -EINVAL
> > > ("JBD2:
> > > Journal too short (blocks 2-1024)"); this leaves journal->j_head
> > > == NULL.
> > > However, jbd2_journal_load clears the JBD2_ABORT flag right
> > > before calling
> > > journal_reset. This leads to a problem later when
> > > ofcs2_mount_volume tries
> > > to flush the journal as part of the cleanup when aborting the
> > > mount
> > > operation:
> > >
> > > -> ofcs2_mount_volume (error; goto out_system_inodes)
> > > -> ofcs2_journal_shutdown
> > > -> jbd2_journal_flush
> > > -> jbd2_cleanup_journal_tail (J_ASSERT fails)
> > > ...
> > >
> > > I confirmed that setting the JBD2_ABORT flag in journal_reset
> > > before
> > > returning -EINVAL fixes the problem:
> > >
> > > static int journal_reset(journal_t *journal)
> > > journal_fail_superblock(journal);
> > > + journal->j_flags |= JBD2_ABORT;
> > > return -EINVAL;
> > >
> > > You can find a proper patch file + the syzbot re-test result in
> > > [1].
> > > However, I'm not entirely sure whether this is the correct
> > > decision, and I
> > > wanted to confirm that this is an appropriate solution before
> > > sending a
> > > proper patch to the mailing list.
> >
> > The reason why this isn't an issue with ext4 is because the normal
> > "right" way to do this is if jbd2_journal_load() returns an error,
> > is
> > to call jbd2_journal_destroy() to release the data structures with
> > the
> > journal --- and then don't try to use the journal afterwards.
>
> Yep. OCFS2 guys are actually looking into fixing this in OCFS2
> (
> https://lore.kernel.org/ocfs2-devel/20240819131120.746077-1-sunjunchao2870@gmail.com
> /)
> Not sure what's the status though. Julian, did you send v2 of your
> fix?
Yeah, I have already submitted the v2[1] of the patch and it is
awaiting review.
[1]:
https://lore.kernel.org/ocfs2-devel/20240823083150.17590-1-sunjunchao2870@gmail.com/
>
> Honza
>
> > The weird thing is that there are two code paths in ocfs2 that
> > calls
> > jbd2_journal_load(). One is in ocfs2_replay_journal() which does
> > what
> > ext4 does. The other is ocfs2_load_journal() which does *not* do
> > this, and this is the one which you saw in the syzkaller
> > reproducer.
> > It looks like one codepath is used to replay the ocfs2 for some
> > other
> > node, and the is to load (and presumably later, replay) the journal
> > for the mounting node.
> >
> > There are also some other things which look very confusing, such as
> > the
> > fact that ocfs2_journal_shutdown calls igrab:
> >
> > /* need to inc inode use count - jbd2_journal_destroy will
> > iput. */
> > if (!igrab(inode))
> > BUG();
> >
> > ... which is *weird*. Normaly the journal inode refcount is
> > expected
> > to be bumped before calling jbd2_journal_init_inode() --- which
> > normally is done by an iget() function, and is needed to make sure
> > the
> > journal inode isn't released out from under the jbd2 layer. It
> > looks
> > like ocfs2 uses the journal inode for other stuff so get the
> > journal
> > inode without calling something like iget(). Which is OK, I guess,
> > but it means that there are a whole bunch of places where it has to
> > call !BUG_ON(igrab(journal->j_inode) before calling
> > jbd2_journal_destroy(). It would seem like the *right* thing to do
> > is
> > to bump the refcount in ocfs2_journal_init(), and if for some
> > reason
> > the igrab fails, it can just return an error early, instead of
> > having
> > to resort to BUG_ON() later, and if you don't realize that you have
> > to
> > do this weird igrab() before calling jbd2_journal_destroy(), you'll
> > end up leaking the journal inode.
> >
> > Anyway, I think the right thing to do is to restructure how ocfs2
> > calls the jbd2 journal, but that's going to require a more careful
> > examination of the code paths. Your patch is a bit of a blunt
> > force
> > hack that should be harmless, but it's probably not the best way to
> > fix the problem, although doing it "right" would be more
> > complicated....
> >
> > - Ted
Thanks,
--
Julian Sun <sunjunchao2870@...il.com>
Powered by blists - more mailing lists