linux-kernel - Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJnrk1bOE+k7xdwr3umTDxfB1wPnGgYcAFJ3JFXqQLPjyy4Qhg@mail.gmail.com>
Date: Mon, 29 Dec 2025 16:58:31 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: Sasha Levin <sashal@...nel.org>
Cc: Matthew Wilcox <willy@...radead.org>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate
 and folio_end_read

On Wed, Dec 24, 2025 at 1:21 PM Sasha Levin <sashal@...nel.org> wrote:
>
> On Wed, Dec 24, 2025 at 05:27:03PM +0000, Matthew Wilcox wrote:
> >
> > WARNING: fs/iomap/buffered-io.c:254 at ifs_free+0x130/0x148, CPU#0: msync04/406
> >
> >That's this one:
> >
> >        WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
> >                        folio_test_uptodate(folio));
> >
> >which would be fully explained by fuse calling folio_clear_uptodate()
> >in fuse_send_write_pages().  I have come to believe that allowing
> >filesystems to call folio_clear_uptodate() is just dangerous.  It
> >causes assertions to fire all over the place (eg if the page is mapped
> >into memory, the MM contains assertions that it must be uptodate).
> >
> >So I think the first step is simply to delete the folio_clear_uptodate()
> >calls in fuse:

Hmm... this fuse_perform_write() call path is for writethrough. In
writethrough, fuse first writes the data to the page cache and then to
the server. I think because we're doing the writes in that order (eg
first to the page cache, then the server), the clear uptodate is
needed if the server write is a short write or an error since we can't
revert the page cache data back to its original content (eg we want to
write 2 KB starting at offset 0, the folio representing that in the
page cache is uptodate, we retrieve that folio and write 2 KB to it,
then when we try writing it to the server, the server can only write
out 1 KB, where now there's a discrepancy between the page cache
contents and the disk contents, where we're unable to make these
consistent by undoing the page cache write for the chunk between 1 KB
and 2 KB). If we could switch the ordering and write it to the server
first and then to the page cache, then we could get rid of the clear
uptodates, but to switch this ordering requires a bigger change where
we'd need to add support for copying out data from a userspace iter to
the server (currently, only copying out data from folios are
supported). I'm happy to work on this though if you think we should
try our best to fully eradicate folio_clear_uptodate() from fuse.

There's also another folio_clear_uptodate() call in
fuse_try_move_folio() in fuse/dev.c when the server gifts pages to the
kernel through vmsplice. This one I think is needed else
folio_end_read() will xor uptodate state of an already uptodate folio
(commit 76a51ac ("fuse: clear PG_uptodate when using a stolen page")
says a bit more about this).

> [snip]
>
> Here's the log of a run with the change you've provided applied: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.18-rc7-13807-g26a15474eb13/testrun/30620754/suite/log-parser-test/test/exception-warning-fsiomapbuffered-io-at-ifs_free/log

Hmm, I think this WARN_ON_ONCE is getting triggered from the
folio_mark_uptodate() call in fuse_fill_write_pages().

This is happening because iomap integration hasn't (yet) been added to
the fuse writethrough path, as it's not necessary / urgent (whereas
for buffered writes, it is in order for fuse to use large folios). imo
updating the folio uptodate/dirty state but not the bitmap is
logically fine as the worst outcome from this is that we miss being
able to skip some extra read calls that we could saved if we did add
the iomap bitmap integration. However, I didn't realize there's a
WARN_ON_ONCE checking the ifs uptodate bitmap state (but curiously no
WARN_ON_ONCE checking the ifs dirty bitmap state).

With that said, I think it makes sense to either a) do the
iomap_set_range_uptodate() / iomap_clear_folio_uptodate() bitmap
updating you proposed as a fix for this WARN_ON_ONCE for now to
unblock things, until iomap integration gets added to the fuse
writethrough path, which I'll now prioritize, or b) remove that
warning. The warning does seem otherwise useful though so it seems
like we should probably just go with a).

Thanks,
Joanne

>
> --
> Thanks,
> Sasha