Message-ID: <CAGsJ_4wNiKfKpTPKUNim6-X=KiMxhp_9sPh+hkh8gB8AR2ue7g@mail.gmail.com>
Date: Tue, 25 Nov 2025 17:29:53 +0800
From: Barry Song <21cnbao@...il.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, david@...nel.org, catalin.marinas@....com,
will@...nel.org, lorenzo.stoakes@...cle.com, ryan.roberts@....com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
mhocko@...e.com, riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com,
willy@...radead.org, linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] support batched checks of the references for large folios
Hi Baolin,
On Tue, Nov 25, 2025 at 8:57 AM Baolin Wang
<baolin.wang@...ux.alibaba.com> wrote:
>
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, on Arm architecture, which supports contiguous PTEs, there is already
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient. We can extend this to perform batched operations
> for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>
> By supporting batched checking of the young flags and flushing TLB entries,
> I observed a 33% performance improvement in my file-backed folios reclaim tests.
nice!
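The batching argument in the cover letter can be illustrated with a toy user-space model. This is not kernel code. `struct toy_pte`, `struct toy_mm`, `referenced_one_by_one()` and `referenced_batched()` are hypothetical names made up for this sketch, and the real folio_referenced_one() and arm64 contpte paths differ in detail. The model assumes each young PTE costs one TLB flush when handled on its own, while the batched path clears every young bit in the folio's range and then issues one ranged flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: each "PTE" only carries an access (young) bit. */
struct toy_pte {
	bool young;
};

/* Hypothetical per-mm state that just counts simulated TLB flushes. */
struct toy_mm {
	unsigned long tlb_flushes;
};

/*
 * Per-PTE strategy (toy): test-and-clear each young bit in turn and
 * flush once for every PTE that was found young.
 */
size_t referenced_one_by_one(struct toy_mm *mm, struct toy_pte *ptes, size_t nr)
{
	size_t young = 0;

	for (size_t i = 0; i < nr; i++) {
		if (ptes[i].young) {
			ptes[i].young = false;
			mm->tlb_flushes++;	/* one flush per young PTE */
			young++;
		}
	}
	return young;
}

/*
 * Batched strategy (toy): walk the whole large-folio range, clear all
 * young bits, then issue a single ranged flush if any bit was set.
 */
size_t referenced_batched(struct toy_mm *mm, struct toy_pte *ptes, size_t nr)
{
	size_t young = 0;

	assert(nr > 0);
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i].young) {
			ptes[i].young = false;
			young++;
		}
	}
	if (young)
		mm->tlb_flushes++;	/* one ranged flush for the whole batch */
	return young;
}
```

Both functions report the same number of young PTEs and leave every bit cleared. Only the flush count differs, which is the cost the cover letter says the batched path avoids.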
>
> BTW, I still noticed a hotspot in try_to_unmap() in my test. Hope Barry can
> resend the optimization patch for try_to_unmap() [1].
Thanks for waking me up. Yes, it's still on my list—I've just had a lot of
non-technical issues come up that seriously slowed my progress. Sorry for
the delay.
And I suppose we also need that for try_to_migrate().
>
> [1] https://lore.kernel.org/all/20250513084620.58231-1-21cnbao@gmail.com/
>
> Baolin Wang (2):
> arm64: mm: support batch clearing of the young flag for large folios
> mm: rmap: support batched checks of the references for large folios
>
> arch/arm64/include/asm/pgtable.h | 23 ++++++++++++-----
> arch/arm64/mm/contpte.c | 44 ++++++++++++++++++++++----------
> include/linux/mmu_notifier.h | 9 ++++---
> include/linux/pgtable.h | 19 ++++++++++++++
> mm/rmap.c | 22 ++++++++++++++--
> 5 files changed, 92 insertions(+), 25 deletions(-)
>
Thanks
Barry