Message-ID: <2dyj6zrxbd2wjnor2wswis5p5z7brtfgzjnhbexhjsd3kqnvx2@y6i2wnvr6gdr>
Date: Wed, 22 Oct 2025 08:38:30 +0100
From: Pedro Falcato <pfalcato@...e.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Kiryl Shutsemau <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>,
Matthew Wilcox <willy@...radead.org>, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, Kiryl Shutsemau <kas@...nel.org>
Subject: Re: [PATCH] mm/filemap: Implement fast short reads
On Tue, Oct 21, 2025 at 09:13:28PM -1000, Linus Torvalds wrote:
> On Tue, 21 Oct 2025 at 21:08, Pedro Falcato <pfalcato@...e.de> wrote:
> >
> > I think we may still have a problematic (rare, possibly theoretical) race here where:
> >
> > T0                          | T1                                  | T3
> > filemap_read_fast_rcu()     |                                     |
> >   folio = xas_load(&xas);   |                                     |
> >   /* ... */                 | /* truncate or reclaim frees folio, |
> >                             |    bumps delete seq */              | folio_alloc() from e.g secretmem
> >                             |                                     | set_direct_map_invalid_noflush(!!)
> >   memcpy_from_file_folio()  |                                     |
> >
> > We may have to use copy_from_kernel_nofault() here? Or is something else stopping this from happening?
>
> Explain how the sequence count doesn't catch this?
>
> We read the sequence count before we do the xas_load(), and we verify
> it after we've done the memcpy_from_folio.
>
> The whole *point* is that the copy itself is not race-free. That's
> *why* we do the sequence count.
>
> And only after the sequence count has been verified do we then copy
> the result to user space.
>
> So the "maybe this buffer content is garbage" happens, but it only
> happens in the temporary kernel on-stack buffer, not visibly to the
> user.
The problem isn't that the contents might be garbage, but that the direct map
may be swept from under us, as we don't have a reference to the folio. So the
folio can be transparently freed under us (as designed), but some user can
call fun stuff like set_direct_map_invalid_noflush() and we're not handling
any "oopsie we faulted reading the folio" here. The sequence count doesn't
help here, because we, uhh, faulted. Does this make sense?

TL;DR I don't think it's safe to touch the direct map of folios we don't own
without the seatbelt of a copy_from_kernel_nofault or so.
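
To make the ordering concrete, here is a rough userspace model of the argument. It is only a sketch: it is single-threaded, every name in it (model_folio, model_copy_nofault, model_read_fast, the race hook) is made up for illustration, and none of it is the code from the patch. The "mapped" flag stands in for the folio being present in the direct map, and model_copy_nofault() only mimics the effect of copy_from_kernel_nofault(), which reports an error instead of faulting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: all names here are hypothetical, not kernel API. */
struct model_folio {
	unsigned int seq;	/* stands in for the delete sequence count */
	bool mapped;		/* stands in for "present in the direct map" */
	char data[64];
};

/* Mimics the effect of copy_from_kernel_nofault(): fail, do not fault. */
static int model_copy_nofault(char *dst, const struct model_folio *f,
			      size_t len)
{
	if (!f->mapped)
		return -1;	/* the real helper would return an error code */
	memcpy(dst, f->data, len);
	return 0;
}

/* Models T1 + T3: the folio is freed (seq bumped), then reused unmapped. */
static void model_free_and_unmap(struct model_folio *f)
{
	f->seq++;
	f->mapped = false;
}

/*
 * Fast read: sample seq, copy into a bounce buffer, then recheck seq.
 * The optional race hook injects the concurrent free between the lookup
 * and the copy. Returns 0 on success, -1 for "take the slow path".
 */
static int model_read_fast(struct model_folio *f, char *bounce, size_t len,
			   void (*race)(struct model_folio *))
{
	unsigned int seq;

	assert(f && bounce);
	if (len > sizeof(f->data))
		return -1;

	seq = f->seq;		/* read the sequence count first */
	if (race)
		race(f);	/* folio goes away under us */

	/* A plain memcpy() here would touch unmapped memory before the
	 * recheck below ever runs; the nofault copy turns that into an
	 * error we can handle. */
	if (model_copy_nofault(bounce, f, len))
		return -1;

	if (f->seq != seq)	/* contents may be garbage: discard them */
		return -1;
	return 0;
}
```

In this model the sequence recheck only ever runs after the copy has completed, so it can reject stale data but cannot prevent the access itself; the failing copy is what catches the unmapped case.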
--
Pedro