lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230112150708.y2ws5q3wu2xxow3p@quack3>
Date:   Thu, 12 Jan 2023 16:07:08 +0100
From:   Jan Kara <jack@...e.cz>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Ivan Zahariev <famzah@...soft.com>, linux-ext4@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: kernel BUG at fs/ext4/inode.c:1914 - page_buffers()

On Mon 05-12-22 16:10:47, Theodore Ts'o wrote:
> On Wed, Nov 23, 2022 at 04:48:01PM +0200, Ivan Zahariev wrote:
> > Hello,
> > 
> > Starting with kernel 5.15 for the past eight months we have a total of 12
> > kernel panics at a fleet of 1000 KVM/Qemu machines which look the following
> > way:
> > 
> >     kernel BUG at fs/ext4/inode.c:1914
> > 
> > Switching from kernel 4.14 to 5.15 almost immediately triggered the problem.
> > Therefore we are very confident that userland activity is more or less the
> > same and is not the root cause. The kernel function which triggers the BUG
> > is __ext4_journalled_writepage(). In 5.15 the code for
> > __ext4_journalled_writepage() in "fs/ext4/inode.c" is the same as the
> > current kernel "master". The line where the BUG is triggered is:
> > 
> >     struct buffer_head *page_bufs = page_buffers(page)
> 	...
> > 
> > Back to the problem! 99% of the difference between 4.14 and the latest
> > kernel for __ext4_journalled_writepage() in "fs/ext4/inode.c" comes from the
> > following commit: https://github.com/torvalds/linux/commit/5c48a7df91499e371ef725895b2e2d21a126e227
> > 
> > Is it safe that we revert this patch on the latest 5.15 kernel, so that we
> > can confirm if this resolves the issue for us?
> 
> No, it's not safe to revert this patch; this fixes a potential
> use-after-free bug identified by Syzkaller (and use-after-frees can
> sometimes be leveraged into security volunerabilities.
> 
> I have not tested this change yet, but looking at the code and the
> change made in patch yet, this *might* be a possible fix:
> 
> 	size = i_size_read(inode);
> -	if (page->mapping != mapping || page_offset(page) > size) {
> +	if (page->mapping != mapping || page_offset(page) >= size) {
> 		/* The page got truncated from under us */
> 		ext4_journal_stop(handle);
> 		ret = 0;
> 		goto out;
> 	}
> 
> Is it fair to say that your workload is using data=journaled and is
> frequently truncating that might have been recently modified (hence
> triggering the race between truncate and journalled writepages)?
> 
> I wonder if you could come up with a more reliable reproducer so we
> can test a particular patch.

So after a bit of thought I agree that the commit 5c48a7df91499 ("ext4: fix
an use-after-free issue about data=journal writeback mode") is broken. The
problem is when we unlock the page in __ext4_journalled_writepage() anybody
else can come, writeout the page, and reclaim page buffers (due to memory
pressure). Previously, bh references were preventing the buffer reclaim to
happen but now there's nothing to prevent it.

My rewrite of data=journal writeback path fixes this problem as a
side-effect but perhaps we need a quickfix for stable kernels? Something
like attached patch?

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

View attachment "0001-ext4-Fix-crash-in-__ext4_journalled_writepage.patch" of type "text/x-patch" (3223 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ