[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251203223300.GB71988@macsyma.lan>
Date: Wed, 3 Dec 2025 17:33:00 -0500
From: "Theodore Tso" <tytso@....edu>
To: Matthew Wilcox <willy@...radead.org>
Cc: Deepanshu Kartikey <kartikey406@...il.com>,
Zhang Yi <yi.zhang@...weicloud.com>, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org,
syzbot+b0a0670332b6b3230a0a@...kaller.appspotmail.com,
adilger.kernel@...ger.ca, djwong@...nel.org
Subject: Re: [PATCH v2] ext4: check folio uptodate state in
ext4_page_mkwrite()
On Wed, Dec 03, 2025 at 09:35:29PM +0000, Matthew Wilcox wrote:
> You snipped out all the context when adding me to the cc, and I'm on
> holiday until after Plumbers, so I'm disinclined to go looking for
> context.
Sorry about that. A quick summary. Deepanshu was attempting to
address a Syzbot complaint[1].
[1] https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a
The TL;DR summary is that the syzbot complaint involved a maliciously
corrupted file system which resulted file system getting detected when
delayed allocation writeback attempts to do a block allocation. Error
handling calls mpage_release_unused_pages(invalidate=true), which
clears the uptodate flag via folio_clear_uptodate().
Because syzbot mounts the file system using errors=continue (which is
the worst case; we're not panic'ing the kernel or forcing the file
system to be read-only), we now have a situation where we have a folio
which can be mapped, but !uptodate, but the file system can still be
subject to changes.
In the syzkaller reproducer, the potential malware might call
ftruncate on the file, and this results ext4_truncate() calling
ext4_block_truncate_page() which thentries to zero the page tail,
which triggers a write fault, resulting in ext4_page_mkwrite() on a
page/folio which is not uptodate. It then tries to mark the folio
dirty, mapped but !uptdate, and then __folio_mark_dirty() triggers:
WARN_ON_ONCE(!folio_test_uptodate()).
Since in Syzkaller assumes users are stupid enough to panic on warn,
this is an urgent security issue because it's a denial of service
attack which is CVE worthy --- where the system admiinstrator is
stupid enough to allow an untrusted user to mount an untrusted,
maliciously crafted file system, instead of using fuse2fs. The
security people thinkt his is super-duper important. Personally, I
don't think it's all that urgent, so by all means, don't feel obliged
to think about this while on vacation. :-)
Anyway, that's the context. Deepanshu has a proposed fix here[2]
which puts a folio_lock() into every single write page fault for ext4:
+ folio_lock(folio);
+ if (!folio_test_uptodate(folio)) {
+ folio_unlock(folio);
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+ folio_unlock(folio);
[2] https://lore.kernel.org/r/20251122015742.362444-1-kartikey406@gmail.com
This seems.... unfortunate to me, so the first question is, "is
locking the folio really necessary"? (I suspect the answer is no),
and two, should this check be done in the mm layer calling
page_mkwrite(), or in ext4_page_mkwrite()?
Presumably, this might happen for other file systems, with either
syzkaller coming up with this rather implausible scenario of really
stupid, unfortunately system adminsitrator choices --- or in real
life, if we do have some system adminisrtators making really stupid,
unfortunate life choices. So maybe we should this check should be
done above the file system layer?
- Ted
>
> On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote:
> > My main concern with your patch is folio_lock() is *incredibly*
> > heavyweight and is going to be a real scalability concern if we need
> > to take it every single time we need to make a page writeable.
> >
> > So could we perhaps do something like this? So the first question is
> > do we need to take the lock at all? I'm not sure we need to worry
> > about the case where the page is not uptodate because we're racing
> > with the page being brought into memory; if we that could happen under
> > normal circumstances we would be triggering the warning even without
> > these situations such as a delayed allocaiton write failing due to a
> > corrupted file system image. So can we just do this?
> >
> > if (!folio_test_uptodate(folio)) {
> > ret = VM_FAULT_SIGBUS;
> > goto out;
> > }
> >
> > If it is legitmate that ext4_page_mkwrite() could be called while the
> > page is still being read in (and again, I don't think it is), then we
> > could do something like this:
> >
> > if (!folio_test_uptodate(folio)) {
> > folio_lock(folio);
> > if (!folio_test_uptodate(folio)) {
> > folio_unlock(folio);
> > ret = VM_FAULT_SIGBUS;
> > goto out;
> > }
> > folio_unlock(folio);
> > }
> >
> > Matthew, as the page cache maintainer, do we actually need this extra
> > rigamarole. Or can we just skip taking the lock before checking to
> > see if the folio is uptodate in ext4_page_mkwrite()?
> >
> > - Ted
Powered by blists - more mailing lists