Message-ID: <46d0b6bf-db7e-418f-a497-983db4d4d786@kernel.org>
Date: Mon, 9 Feb 2026 10:38:02 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org,
catalin.marinas@....com, will@...nel.org
Cc: lorenzo.stoakes@...cle.com, ryan.roberts@....com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
mhocko@...e.com, riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com,
willy@...radead.org, baohua@...nel.org, dev.jain@....com,
linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios

On 12/26/25 07:07, Baolin Wang wrote:
> Similar to folio_referenced_one(), we can apply batched unmapping for file
> large folios to optimize the performance of file folios reclamation.
>
> Barry previously implemented batched unmapping for lazyfree anonymous large
> folios[1] and did not further optimize anonymous large folios or file-backed
> large folios at that stage. As for file-backed large folios, the batched
> unmapping support is relatively straightforward, as we only need to clear
> the consecutive (present) PTE entries for file-backed large folios.
>
> Performance testing:
> Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
> reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
> 75% performance improvement on my Arm64 32-core server (and 50%+ improvement
> on my X86 machine) with this patch.
>
> W/o patch:
> real 0m1.018s
> user 0m0.000s
> sys 0m1.018s
>
> W/ patch:
> real 0m0.249s
> user 0m0.000s
> sys 0m0.249s
>
> [1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
> Reviewed-by: Ryan Roberts <ryan.roberts@....com>
> Acked-by: Barry Song <baohua@...nel.org>
> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
> ---
> mm/rmap.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 985ab0b085ba..e1d16003c514 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> end_addr = pmd_addr_end(addr, vma->vm_end);
> max_nr = (end_addr - addr) >> PAGE_SHIFT;
>
> - /* We only support lazyfree batching for now ... */
> - if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> + /* We only support lazyfree or file folios batching for now ... */
> + if (folio_test_anon(folio) && folio_test_swapbacked(folio))
> return 1;

Right, the anon folio handling would require a bit more work in the

	} else if (folio_test_anon(folio)) {

branch.

Do you intend to tackle that one as well?

I'll reply to the fixup.
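
For reference, the effect of the changed early-return check in the hunk above can be sketched as a standalone truth table. This is only an illustration, not kernel code: struct fake_folio, old_check_bails(), new_check_bails() and check_truth_table() are hypothetical names, and the two bools merely model what folio_test_anon() and folio_test_swapbacked() would report. "Bails" means the check hits "return 1"; not bailing only means the function goes on to the batching logic, which may still end up with a batch of one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two folio flags the check looks at. */
struct fake_folio {
	bool anon;        /* models folio_test_anon() */
	bool swapbacked;  /* models folio_test_swapbacked() */
};

/* Check before the patch: true means "return 1", i.e. no batching. */
bool old_check_bails(struct fake_folio f)
{
	return !f.anon || f.swapbacked;
}

/* Check after the patch: only anon + swapbacked returns 1 early. */
bool new_check_bails(struct fake_folio f)
{
	return f.anon && f.swapbacked;
}

/* Walk all four flag combinations and assert what each check decides. */
void check_truth_table(void)
{
	struct fake_folio file            = { .anon = false, .swapbacked = false };
	struct fake_folio file_swapbacked = { .anon = false, .swapbacked = true  };
	struct fake_folio lazyfree        = { .anon = true,  .swapbacked = false };
	struct fake_folio anon_swapbacked = { .anon = true,  .swapbacked = true  };

	/* Non-anon folios: excluded before the patch, no longer excluded after. */
	assert(old_check_bails(file) && !new_check_bails(file));
	assert(old_check_bails(file_swapbacked) && !new_check_bails(file_swapbacked));

	/* Lazyfree (anon, not swapbacked): allowed through by both checks. */
	assert(!old_check_bails(lazyfree) && !new_check_bails(lazyfree));

	/* Anon + swapbacked: still returns 1 with both checks. */
	assert(old_check_bails(anon_swapbacked) && new_check_bails(anon_swapbacked));
}
```

The last case is the one the question above is about: anon swapbacked folios are the only combination the new check still excludes.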
--
Cheers,
David