Message-ID: <CAMgjq7BJ0uHCWoZgum+19dk=_tgkT1vLu0Qs=4dFEe0go1UTRA@mail.gmail.com>
Date: Wed, 26 Nov 2025 01:38:37 +0800
From: Kairui Song <ryncsn@...il.com>
To: Barry Song <21cnbao@...il.com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
david@...nel.org, catalin.marinas@....com, will@...nel.org,
lorenzo.stoakes@...cle.com, ryan.roberts@....com, Liam.Howlett@...cle.com,
vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com, willy@...radead.org,
Chris Li <chrisl@...nel.org>, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] support batched checks of the references for large folios
On Tue, Nov 25, 2025 at 6:15 PM Barry Song <21cnbao@...il.com> wrote:
>
> Hi Baolin,
>
> On Tue, Nov 25, 2025 at 8:57 AM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
> >
> > Currently, folio_referenced_one() always checks the young flag for each PTE
> > sequentially, which is inefficient for large folios. This inefficiency is
> > especially noticeable when reclaiming clean file-backed large folios, where
> > folio_referenced() is observed as a significant performance hotspot.
> >
> > Moreover, on the Arm architecture, which supports contiguous PTEs, there is
> > already an optimization to clear the young flags for PTEs within a contiguous
> > range. However, this is not sufficient. We can extend it to perform batched
> > operations on the entire large folio (which might exceed the contiguous range:
> > CONT_PTE_SIZE).
> >
> > By supporting batched checking of the young flags and batched flushing of TLB
> > entries, I observed a 33% performance improvement in my file-backed folio
> > reclaim tests.
>
> nice!
>
> >
> > BTW, I still noticed a hotspot in try_to_unmap() in my test. Hope Barry can
> > resend the optimization patch for try_to_unmap() [1].
>
> Thanks for waking me up. Yes, it's still on my list—I've just had a lot of
> non-technical issues come up that seriously slowed my progress. Sorry for
> the delay.
>
> And I suppose we also need that for try_to_migrate().
>
> >
> > [1] https://lore.kernel.org/all/20250513084620.58231-1-21cnbao@gmail.com/
Hi Barry, Baolin.
About the try_to_unmap() part, I also noticed that patch and the
limitation called out in it: "We only support batched swap_duplicate()
for unmapping". I guess one reason is add_swap_count_continuation(),
right? That limitation will be removed by swap table phase 3, which can
be previewed here:
https://lore.kernel.org/linux-mm/20250514201729.48420-28-ryncsn@gmail.com/
I think we will be able to handle that much more easily by then. Sorry
it is taking a while to land upstream though.
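
For context, the reason batching is awkward today is that each swap
entry's count has to be duplicated one by one, and any of them can fall
back to add_swap_count_continuation() and fail with -ENOMEM. A rough
sketch of the per-entry pattern (illustrative only, not the actual rmap
code; the helper name is made up):

static int dup_swap_entries(swp_entry_t entry, int nr)
{
	int i, err;

	for (i = 0; i < nr; i++, entry.val++) {
		/*
		 * swap_duplicate() may internally need to allocate a
		 * swap count continuation and can fail with -ENOMEM,
		 * so the batch cannot simply be committed in one step.
		 */
		err = swap_duplicate(entry);
		if (err)
			return err;
	}
	return 0;
}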
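
On the folio_referenced() side of this series, the idea is roughly the
following: test and clear the Accessed bit across all PTEs mapping the
large folio, then do a single ranged TLB flush instead of one per PTE.
A minimal sketch of the concept (not Baolin's actual patch; the helper
name is made up):

static bool folio_test_clear_young_batched(struct vm_area_struct *vma,
					   unsigned long addr, pte_t *ptep,
					   unsigned int nr)
{
	unsigned long start = addr;
	bool young = false;
	unsigned int i;

	for (i = 0; i < nr; i++, addr += PAGE_SIZE, ptep++)
		/* Test and clear the Accessed bit of one PTE. */
		young |= ptep_test_and_clear_young(vma, addr, ptep);

	if (young)
		/* One ranged flush covering the whole folio mapping. */
		flush_tlb_range(vma, start, addr);

	return young;
}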