Message-ID: <341d1aed-13ad-41ee-ad30-487c5baec399@kernel.org>
Date: Mon, 1 Dec 2025 17:23:13 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
catalin.marinas@....com, will@...nel.org
Cc: lorenzo.stoakes@...cle.com, ryan.roberts@....com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
mhocko@...e.com, riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com,
willy@...radead.org, baohua@...nel.org, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] support batched checks of the references for large
folios
On 11/25/25 01:56, Baolin Wang wrote:
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, on the Arm architecture, which supports contiguous PTEs, there is already
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient. We can extend this to perform batched operations
> for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>
> By supporting batched checking of the young flags and flushing TLB entries,
> I observed a 33% performance improvement in my file-backed folio reclaim tests.
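For readers skimming the thread, here is a minimal user-space sketch of the difference the cover letter describes. It uses a heavily simplified, hypothetical model: each PTE is a uint64_t whose bit 0 stands in for the young flag, and fake_flush_tlb() only counts invalidations. None of these names (FAKE_PTE_YOUNG, fake_flush_tlb, referenced_sequential, referenced_batched) are kernel APIs, and this is not the patch's implementation. It only illustrates why clearing the young flags across the whole folio and then issuing one ranged flush can replace one flush per young PTE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE model: bit 0 stands in for the hardware "young"
 * (accessed) flag. This is NOT the kernel's pte_t layout. */
#define FAKE_PTE_YOUNG ((uint64_t)1 << 0)

/* Counts simulated TLB invalidations; a stand-in for real flush cost. */
unsigned long tlb_flush_calls;

/* Hypothetical stand-in for flushing the TLB for PTEs [first, first + nr). */
void fake_flush_tlb(size_t first, size_t nr)
{
	(void)first;
	(void)nr;
	tlb_flush_calls++;
}

/* Sequential scheme: test-and-clear each PTE on its own and flush the
 * entry of every PTE that was young. Returns the number of young PTEs. */
size_t referenced_sequential(uint64_t *ptes, size_t nr)
{
	size_t young = 0;

	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i] & FAKE_PTE_YOUNG) {
			ptes[i] &= ~FAKE_PTE_YOUNG;
			fake_flush_tlb(i, 1);
			young++;
		}
	}
	return young;
}

/* Batched scheme: clear the young flags for every PTE of the folio
 * first, then issue at most one ranged flush covering all of them. */
size_t referenced_batched(uint64_t *ptes, size_t nr)
{
	size_t young = 0;

	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i] & FAKE_PTE_YOUNG) {
			ptes[i] &= ~FAKE_PTE_YOUNG;
			young++;
		}
	}
	if (young)
		fake_flush_tlb(0, nr);
	return young;
}
```

With an assumed 512-entry folio (2 MiB of 4 KiB pages) in which every PTE is young, this model issues 512 flushes in the sequential variant and one in the batched variant; a folio with no young PTEs triggers no flush in either. Real flush costs and the actual arm64 contiguous-PTE code paths are not modeled.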
Can you point at the benchmark or briefly explain what it does? What
exactly are we measuring that improves by 33%?
--
Cheers
David