[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADhLXY4_yYdGQCYxq3=gQ6ZTJ7y_=dGsEBqdJ4g7JizX+ocVYA@mail.gmail.com>
Date: Thu, 4 Dec 2025 15:24:50 +0530
From: Deepanshu Kartikey <kartikey406@...il.com>
To: Theodore Tso <tytso@....edu>
Cc: Matthew Wilcox <willy@...radead.org>, Zhang Yi <yi.zhang@...weicloud.com>,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
syzbot+b0a0670332b6b3230a0a@...kaller.appspotmail.com,
adilger.kernel@...ger.ca, djwong@...nel.org
Subject: Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite()
On Thu, Dec 4, 2025 at 4:04 AM Theodore Tso <tytso@....edu> wrote:
>
> On Wed, Dec 03, 2025 at 09:35:29PM +0000, Matthew Wilcox wrote:
> > You snipped out all the context when adding me to the cc, and I'm on
> > holiday until after Plumbers, so I'm disinclined to go looking for
> > context.
>
> Sorry about that. A quick summary. Deepanshu was attempting to
> address a Syzbot complaint[1].
>
> [1] https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a
>
> The TL;DR summary is that the syzbot complaint involved a maliciously
> corrupted file system which resulted file system getting detected when
> delayed allocation writeback attempts to do a block allocation. Error
> handling calls mpage_release_unused_pages(invalidate=true), which
> clears the uptodate flag via folio_clear_uptodate().
>
> Because syzbot mounts the file system using errors=continue (which is
> the worst case; we're not panic'ing the kernel or forcing the file
> system to be read-only), we now have a situation where we have a folio
> which can be mapped, but !uptodate, but the file system can still be
> subject to changes.
>
> In the syzkaller reproducer, the potential malware might call
> ftruncate on the file, and this results ext4_truncate() calling
> ext4_block_truncate_page() which thentries to zero the page tail,
> which triggers a write fault, resulting in ext4_page_mkwrite() on a
> page/folio which is not uptodate. It then tries to mark the folio
> dirty, mapped but !uptdate, and then __folio_mark_dirty() triggers:
> WARN_ON_ONCE(!folio_test_uptodate()).
>
> Since in Syzkaller assumes users are stupid enough to panic on warn,
> this is an urgent security issue because it's a denial of service
> attack which is CVE worthy --- where the system admiinstrator is
> stupid enough to allow an untrusted user to mount an untrusted,
> maliciously crafted file system, instead of using fuse2fs. The
> security people thinkt his is super-duper important. Personally, I
> don't think it's all that urgent, so by all means, don't feel obliged
> to think about this while on vacation. :-)
>
> Anyway, that's the context. Deepanshu has a proposed fix here[2]
> which puts a folio_lock() into every single write page fault for ext4:
>
> + folio_lock(folio);
> + if (!folio_test_uptodate(folio)) {
> + folio_unlock(folio);
> + ret = VM_FAULT_SIGBUS;
> + goto out;
> + }
> + folio_unlock(folio);
>
> [2] https://lore.kernel.org/r/20251122015742.362444-1-kartikey406@gmail.com
>
> This seems.... unfortunate to me, so the first question is, "is
> locking the folio really necessary"? (I suspect the answer is no),
> and two, should this check be done in the mm layer calling
> page_mkwrite(), or in ext4_page_mkwrite()?
>
> Presumably, this might happen for other file systems, with either
> syzkaller coming up with this rather implausible scenario of really
> stupid, unfortunately system adminsitrator choices --- or in real
> life, if we do have some system adminisrtators making really stupid,
> unfortunate life choices. So maybe we should this check should be
> done above the file system layer?
>
> - Ted
>
>
>
>
>
>
>
> >
> > On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote:
> > > My main concern with your patch is folio_lock() is *incredibly*
> > > heavyweight and is going to be a real scalability concern if we need
> > > to take it every single time we need to make a page writeable.
> > >
> > > So could we perhaps do something like this? So the first question is
> > > do we need to take the lock at all? I'm not sure we need to worry
> > > about the case where the page is not uptodate because we're racing
> > > with the page being brought into memory; if we that could happen under
> > > normal circumstances we would be triggering the warning even without
> > > these situations such as a delayed allocaiton write failing due to a
> > > corrupted file system image. So can we just do this?
> > >
> > > if (!folio_test_uptodate(folio)) {
> > > ret = VM_FAULT_SIGBUS;
> > > goto out;
> > > }
> > >
> > > If it is legitmate that ext4_page_mkwrite() could be called while the
> > > page is still being read in (and again, I don't think it is), then we
> > > could do something like this:
> > >
> > > if (!folio_test_uptodate(folio)) {
> > > folio_lock(folio);
> > > if (!folio_test_uptodate(folio)) {
> > > folio_unlock(folio);
> > > ret = VM_FAULT_SIGBUS;
> > > goto out;
> > > }
> > > folio_unlock(folio);
> > > }
> > >
> > > Matthew, as the page cache maintainer, do we actually need this extra
> > > rigamarole. Or can we just skip taking the lock before checking to
> > > see if the folio is uptodate in ext4_page_mkwrite()?
> > >
> > > - Ted
Hi Ted,
Thank you for the detailed summary and context.
Based on Matthew's earlier feedback that we need to "prevent !uptodate
folios from being referenced by the page tables," I believe the
correct fix is not in ext4_page_mkwrite() at all, but rather in
mpage_release_unused_pages().
When we invalidate folios due to writeback failure, we should also
unmap them from page tables:
static void mpage_release_unused_pages(..., bool invalidate)
{
// ... existing code ...
if (invalidate) {
if (folio_mapped(folio))
folio_clear_dirty_for_io(folio);
// Unmap from page tables before invalidating
if (folio_mapped(folio))
unmap_mapping_folio(folio);
block_invalidate_folio(folio, 0, folio_size(folio));
folio_clear_uptodate(folio);
}
}
This way:
1. No performance impact on normal operations (only in error path)
2. No folio_lock() needed in ext4_page_mkwrite()
3. Prevents stale PTEs from referencing invalidated folios
4. Any subsequent access triggers a new page fault instead
To address your question about whether this should be in MM layer:
other filesystems could hit this same issue if they use delayed
allocation and invalidate folios on writeback failure. However,
since the invalidation is happening in ext4-specific code
(mpage_release_unused_pages), fixing it there seems appropriate.
If this pattern appears in other filesystems, perhaps the fix
could be moved to a common helper.
I'll send v3 with this approach. Does this look correct?
Best regards,
Deepanshu
Powered by blists - more mailing lists