linux-ext4 - Re: kernel BUG at fs/ext4/inode.c:1914


Message-ID: <Y45eV/nA2tj8C94W@mit.edu>
Date:   Mon, 5 Dec 2022 16:10:47 -0500
From:   "Theodore Ts'o" <tytso@....edu>
To:     Ivan Zahariev <famzah@...soft.com>
Cc:     linux-ext4@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: kernel BUG at fs/ext4/inode.c:1914 - page_buffers()

On Wed, Nov 23, 2022 at 04:48:01PM +0200, Ivan Zahariev wrote:
> Hello,
> 
> Starting with kernel 5.15 for the past eight months we have a total of 12
> kernel panics at a fleet of 1000 KVM/Qemu machines which look the following
> way:
> 
>     kernel BUG at fs/ext4/inode.c:1914
> 
> Switching from kernel 4.14 to 5.15 almost immediately triggered the problem.
> Therefore we are very confident that userland activity is more or less the
> same and is not the root cause. The kernel function which triggers the BUG
> is __ext4_journalled_writepage(). In 5.15 the code for
> __ext4_journalled_writepage() in "fs/ext4/inode.c" is the same as the
> current kernel "master". The line where the BUG is triggered is:
> 
>     struct buffer_head *page_bufs = page_buffers(page)
	...
> 
> Back to the problem! 99% of the difference between 4.14 and the latest
> kernel for __ext4_journalled_writepage() in "fs/ext4/inode.c" comes from the
> following commit: https://github.com/torvalds/linux/commit/5c48a7df91499e371ef725895b2e2d21a126e227
> 
> Is it safe that we revert this patch on the latest 5.15 kernel, so that we
> can confirm if this resolves the issue for us?

No, it's not safe to revert this patch; it fixes a potential
use-after-free bug identified by Syzkaller (and use-after-frees can
sometimes be leveraged into security vulnerabilities).

I have not tested this change yet, but looking at the code and the
change made in that patch, this *might* be a possible fix:

	size = i_size_read(inode);
-	if (page->mapping != mapping || page_offset(page) > size) {
+	if (page->mapping != mapping || page_offset(page) >= size) {
		/* The page got truncated from under us */
		ext4_journal_stop(handle);
		ret = 0;
		goto out;
	}
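
To illustrate the boundary case that the one-character change is aimed
at, here is a minimal userspace sketch. It is not kernel code: the
helper names and SKETCH_PAGE_SIZE are hypothetical stand-ins, and it
only models the comparison itself. The assumption is that valid file
bytes occupy offsets [0, size), so a page whose offset equals the file
size holds no valid data.

```c
#include <stdbool.h>

/* Hypothetical page size for the sketch; real kernels use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096LL

/* Models the existing check: treat the page as truncated only if it
 * starts strictly past i_size. */
static bool truncated_old(long long page_off, long long size)
{
	return page_off > size;
}

/* Models the proposed check: also treat the page as truncated when it
 * starts exactly at i_size, since no byte at or past i_size is valid. */
static bool truncated_new(long long page_off, long long size)
{
	return page_off >= size;
}
```

With a file of exactly SKETCH_PAGE_SIZE bytes, the page at offset
SKETCH_PAGE_SIZE is skipped by truncated_new() but not by
truncated_old(); pages at offset 0 and at 2 * SKETCH_PAGE_SIZE are
handled identically by both. A file truncated to zero length is the
same situation for the page at offset 0.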

Is it fair to say that your workload is using data=journal and is
frequently truncating files that might have been recently modified
(hence triggering the race between truncate and journalled writepages)?

I wonder if you could come up with a more reliable reproducer so we
can test a particular patch.

Thanks,

					- Ted