linux-kernel - Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal


Message-ID: <d673f289-2385-4949-ac80-f3a502d4deb2@lkcamp.dev>
Date: Mon, 26 Aug 2024 01:22:54 -0300
From: Vinicius Peixoto <vpeixoto@...amp.dev>
To: syzbot+8512f3dbd96253ffbe27@...kaller.appspotmail.com
Cc: jack@...e.com, jlbec@...lplan.org, joseph.qi@...ux.alibaba.com,
 linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org, mark@...heh.com,
 ocfs2-devel@...ts.linux.dev, syzkaller-bugs@...glegroups.com, tytso@....edu,
 ~lkcamp/discussion@...ts.sr.ht
Subject: Re: [syzbot] [ext4?] [ocfs2?] kernel BUG in jbd2_cleanup_journal_tail

Hi all,

I noticed this report from syzbot when going through the preliminary 
tasks for the Linux Kernel Mentorship Program, and thought I'd take a 
stab at solving it. I apologize in advance for any mistakes as I'm still 
very new to kernel development. Either way, here's my analysis:

From what I can tell by looking at the reproducer from syzbot, it is 
trying to mount a file filled with bogus data as an ocfs2 disk, and this 
is triggering an assertion in jbd2_cleanup_journal_tail, which in turn 
causes a panic.

The problematic call stack goes roughly like this:

mount_bdev
   -> ocfs2_mount_volume
     -> ocfs2_check_volume
       -> ocfs2_journal_load
         -> jbd2_journal_load
           -> journal_reset (fails)

Since the disk data is bogus, journal_reset fails with -EINVAL ("JBD2: 
Journal too short (blocks 2-1024)"), which leaves journal->j_head == 0. 
However, jbd2_journal_load clears the JBD2_ABORT flag right before 
calling journal_reset, so the journal is not marked as aborted even 
though the reset failed. This leads to a problem later, when 
ocfs2_mount_volume tries to flush the journal as part of the cleanup 
after the mount operation is aborted:

   -> ocfs2_mount_volume (error; goto out_system_inodes)
     -> ocfs2_journal_shutdown
       -> jbd2_journal_flush
         -> jbd2_cleanup_journal_tail (J_ASSERT fails)

This failure happens because of the following code:

         if (is_journal_aborted(journal))
                 return -EIO;

         if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
                 return 1;
         J_ASSERT(blocknr != 0);

Since JBD2_ABORT was cleared in jbd2_journal_load earlier, the 
is_journal_aborted check does not return early and we enter 
jbd2_journal_get_log_tail, which sets *blocknr = journal->j_head 
(which is 0). The J_ASSERT that follows then fails, causing a panic.

I confirmed that setting the JBD2_ABORT flag in journal_reset before 
returning -EINVAL fixes the problem:

         static int journal_reset(journal_t *journal)
                         journal_fail_superblock(journal);
         +               journal->j_flags |= JBD2_ABORT;
                         return -EINVAL;
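
To make the reasoning above easier to follow, here is a minimal user-space 
sketch of the state involved. It is not kernel code: every mock_* name, the 
MOCK_JBD2_ABORT value and the MOCK_ASSERT_TRIPPED marker are hypothetical 
stand-ins, and the tail lookup is simplified to "report j_head". It only 
models the order of the checks quoted earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; the flag value is arbitrary. */
#define MOCK_JBD2_ABORT 0x1UL
enum { MOCK_ASSERT_TRIPPED = -1000 }; /* marks where J_ASSERT would fire */

struct mock_journal {
	unsigned long j_flags;
	unsigned long j_head; /* stays 0 when the reset fails */
};

static bool mock_is_journal_aborted(const struct mock_journal *j)
{
	return (j->j_flags & MOCK_JBD2_ABORT) != 0;
}

/* Same order of checks as the quoted snippet, with the tail lookup
 * reduced to "blocknr = j_head" (an assumption of this sketch). */
static int mock_cleanup_journal_tail(const struct mock_journal *j)
{
	unsigned long blocknr;

	if (mock_is_journal_aborted(j))
		return -EIO;

	blocknr = j->j_head;
	if (blocknr == 0)
		return MOCK_ASSERT_TRIPPED; /* J_ASSERT(blocknr != 0) */
	return 0;
}

/* Models the load path: the abort flag is cleared, then the reset fails
 * and leaves j_head at 0. set_abort models the proposed one-line change. */
static void mock_failed_reset(struct mock_journal *j, bool set_abort)
{
	j->j_flags &= ~MOCK_JBD2_ABORT;
	j->j_head = 0;
	if (set_abort)
		j->j_flags |= MOCK_JBD2_ABORT;
}

/* Walks both scenarios; returns 0 if they behave as described. */
static int mock_demo(void)
{
	struct mock_journal j = { .j_flags = 0, .j_head = 5 };

	mock_failed_reset(&j, false); /* current behaviour */
	assert(mock_cleanup_journal_tail(&j) == MOCK_ASSERT_TRIPPED);

	mock_failed_reset(&j, true);  /* with the abort flag set */
	assert(mock_cleanup_journal_tail(&j) == -EIO);
	return 0;
}
```

In this model, setting the abort flag on the failure path makes the cleanup 
return -EIO before the assertion is ever reached, which matches the effect 
of the change above.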

You can find a proper patch file + the syzbot re-test result in [1]. 
However, I'm not entirely sure whether this is the correct decision, and 
I wanted to confirm that this is an appropriate solution before sending 
a proper patch to the mailing list.

Thanks in advance,
Vinicius

[1] https://syzkaller.appspot.com/bug?extid=8512f3dbd96253ffbe27