linux-kernel - Re: [PATCH v2] ext4: check folio uptodate state in ext4_page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADhLXY4_yYdGQCYxq3=gQ6ZTJ7y_=dGsEBqdJ4g7JizX+ocVYA@mail.gmail.com>
Date: Thu, 4 Dec 2025 15:24:50 +0530
From: Deepanshu Kartikey <kartikey406@...il.com>
To: Theodore Tso <tytso@....edu>
Cc: Matthew Wilcox <willy@...radead.org>, Zhang Yi <yi.zhang@...weicloud.com>, 
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org, 
	syzbot+b0a0670332b6b3230a0a@...kaller.appspotmail.com, 
	adilger.kernel@...ger.ca, djwong@...nel.org
Subject: Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite()

On Thu, Dec 4, 2025 at 4:04 AM Theodore Tso <tytso@....edu> wrote:
>
> On Wed, Dec 03, 2025 at 09:35:29PM +0000, Matthew Wilcox wrote:
> > You snipped out all the context when adding me to the cc, and I'm on
> > holiday until after Plumbers, so I'm disinclined to go looking for
> > context.
>
> Sorry about that.  A quick summary.  Deepanshu was attempting to
> address a Syzbot complaint[1].
>
> [1] https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a
>
> The TL;DR summary is that the syzbot complaint involved a maliciously
> corrupted file system which resulted file system getting detected when
> delayed allocation writeback attempts to do a block allocation.  Error
> handling calls mpage_release_unused_pages(invalidate=true), which
> clears the uptodate flag via folio_clear_uptodate().
>
> Because syzbot mounts the file system using errors=continue (which is
> the worst case; we're not panic'ing the kernel or forcing the file
> system to be read-only), we now have a situation where we have a folio
> which can be mapped, but !uptodate, but the file system can still be
> subject to changes.
>
> In the syzkaller reproducer, the potential malware might call
> ftruncate on the file, and this results ext4_truncate() calling
> ext4_block_truncate_page() which thentries to zero the page tail,
> which triggers a write fault, resulting in ext4_page_mkwrite() on a
> page/folio which is not uptodate.  It then tries to mark the folio
> dirty, mapped but !uptdate, and then __folio_mark_dirty() triggers:
> WARN_ON_ONCE(!folio_test_uptodate()).
>
> Since in Syzkaller assumes users are stupid enough to panic on warn,
> this is an urgent security issue because it's a denial of service
> attack which is CVE worthy --- where the system admiinstrator is
> stupid enough to allow an untrusted user to mount an untrusted,
> maliciously crafted file system, instead of using fuse2fs.  The
> security people thinkt his is super-duper important.  Personally, I
> don't think it's all that urgent, so by all means, don't feel obliged
> to think about this while on vacation.  :-)
>
> Anyway, that's the context.  Deepanshu has a proposed fix here[2]
> which puts a folio_lock() into every single write page fault for ext4:
>
> +       folio_lock(folio);
> +       if (!folio_test_uptodate(folio)) {
> +               folio_unlock(folio);
> +               ret = VM_FAULT_SIGBUS;
> +               goto out;
> +       }
> +       folio_unlock(folio);
>
> [2] https://lore.kernel.org/r/20251122015742.362444-1-kartikey406@gmail.com
>
> This seems.... unfortunate to me, so the first question is, "is
> locking the folio really necessary"?  (I suspect the answer is no),
> and two, should this check be done in the mm layer calling
> page_mkwrite(), or in ext4_page_mkwrite()?
>
> Presumably, this might happen for other file systems, with either
> syzkaller coming up with this rather implausible scenario of really
> stupid, unfortunately system adminsitrator choices --- or in real
> life, if we do have some system adminisrtators making really stupid,
> unfortunate life choices.  So maybe we should this check should be
> done above the file system layer?
>
>                                                 - Ted
>
>
>
>
>
>
>
> >
> > On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote:
> > > My main concern with your patch is folio_lock() is *incredibly*
> > > heavyweight and is going to be a real scalability concern if we need
> > > to take it every single time we need to make a page writeable.
> > >
> > > So could we perhaps do something like this?  So the first question is
> > > do we need to take the lock at all?  I'm not sure we need to worry
> > > about the case where the page is not uptodate because we're racing
> > > with the page being brought into memory; if we that could happen under
> > > normal circumstances we would be triggering the warning even without
> > > these situations such as a delayed allocaiton write failing due to a
> > > corrupted file system image.   So can we just do this?
> > >
> > >     if (!folio_test_uptodate(folio)) {
> > >             ret = VM_FAULT_SIGBUS;
> > >             goto out;
> > >     }
> > >
> > > If it is legitmate that ext4_page_mkwrite() could be called while the
> > > page is still being read in (and again, I don't think it is), then we
> > > could do something like this:
> > >
> > >     if (!folio_test_uptodate(folio)) {
> > >             folio_lock(folio);
> > >             if (!folio_test_uptodate(folio)) {
> > >                     folio_unlock(folio);
> > >                     ret = VM_FAULT_SIGBUS;
> > >                     goto out;
> > >             }
> > >             folio_unlock(folio);
> > >     }
> > >
> > > Matthew, as the page cache maintainer, do we actually need this extra
> > > rigamarole.  Or can we just skip taking the lock before checking to
> > > see if the folio is uptodate in ext4_page_mkwrite()?
> > >
> > >                                                     - Ted


Hi Ted,

Thank you for the detailed summary and context.

Based on Matthew's earlier feedback that we need to "prevent !uptodate
folios from being referenced by the page tables," I believe the
correct fix is not in ext4_page_mkwrite() at all, but rather in
mpage_release_unused_pages().

When we invalidate folios due to writeback failure, we should also
unmap them from page tables:

  static void mpage_release_unused_pages(..., bool invalidate)
  {
      // ... existing code ...

      if (invalidate) {
          if (folio_mapped(folio))
              folio_clear_dirty_for_io(folio);

          // Unmap from page tables before invalidating
          if (folio_mapped(folio))
              unmap_mapping_folio(folio);

          block_invalidate_folio(folio, 0, folio_size(folio));
          folio_clear_uptodate(folio);
      }
  }

This way:
1. No performance impact on normal operations (only in error path)
2. No folio_lock() needed in ext4_page_mkwrite()
3. Prevents stale PTEs from referencing invalidated folios
4. Any subsequent access triggers a new page fault instead

To address your question about whether this should be in MM layer:
other filesystems could hit this same issue if they use delayed
allocation and invalidate folios on writeback failure. However,
since the invalidation is happening in ext4-specific code
(mpage_release_unused_pages), fixing it there seems appropriate.
If this pattern appears in other filesystems, perhaps the fix
could be moved to a common helper.

I'll send v3 with this approach. Does this look correct?

Best regards,
Deepanshu