[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wiWmTpQwz5FZ_=At_Tw+Nm_5Fcy-9is_jXCMo9T0mshZQ@mail.gmail.com>
Date: Mon, 27 Oct 2025 08:50:53 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Kiryl Shutsemau <kirill@...temov.name>, David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Yang Shi <shy828301@...il.com>, Dave Chinner <david@...morbit.com>,
Suren Baghdasaryan <surenb@...gle.com>, linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/filemap: Implement fast short reads
On Mon, 27 Oct 2025 at 03:49, Hugh Dickins <hughd@...gle.com> wrote:
>
> This makes a fundamental change to speculative page cache assumptions.
Yes, but I'm a bit surprised by people who find that scary.
The page cache does *much* more scary things elsewhere, particularly
the whole folio_try_get() dance (in filemap_get_entry() and other
places).
I suspect you ignore that just because it's been that way forever, so
you're comfortable with it.
I'd argue that that is much *much* more subtle because it means that
somebody may be incrementing the page count of a page that has already
been re-allocated by somebody else.
Talk about cognitive load: that code makes you think that "hey, the
tryget means that if it has been released, we don't get a ref to it",
because that's how many of our *other* speculative RCU accesses do in
fact work.
But that's not how the page cache works, exactly because freeing isn't
actually RCU-delayed.
So while the code visually follows the exact same pattern as some
other "look up speculatively under RCU, skip if it's not there any
more", it actually does exactly the same thing as the "copy data under
RCU, then check later if it was ok". Except it does "increment
refcount under RCU, then check later if it was actually valid".
That said, I wonder if we might not consider making page cache freeing
be RCU-delayed. This has come up before (exactly *because* of that
"folio_try_get()").
Because while I am pretty sure that filemap_get_entry() is fine (and a
number of other core users), I'm not convinced that some of the other
users of folio_try_get() are necessarily aware of just how subtle that
thing is.
Anyway, I'm certainly not going to push that patch very hard.
But I do think that a "3x performance improvement on a case that is
known to be an issue for at least one real-world customer" shouldn't
be called "a niche case". I've seen *way* more niche than that.
(I do think RCU-freeing folios would potentially be an interesting
thing to look into, but I don't think the patch under discussion is
necessarily the reason to do so).
Linus
Powered by blists - more mailing lists