Message-ID: <20260107014601.dxvq6b7ljgxwg7iu@master>
Date: Wed, 7 Jan 2026 01:46:01 +0000
From: Wei Yang <richard.weiyang@...il.com>
To: Barry Song <21cnbao@...il.com>
Cc: Wei Yang <richard.weiyang@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
akpm@...ux-foundation.org, david@...nel.org,
catalin.marinas@....com, will@...nel.org,
lorenzo.stoakes@...cle.com, ryan.roberts@....com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, riel@...riel.com,
harry.yoo@...cle.com, jannh@...gle.com, willy@...radead.org,
dev.jain@....com, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios

On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
>On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@...il.com> wrote:
>>
>> On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
>> >Similar to folio_referenced_one(), we can apply batched unmapping for file
>> >large folios to optimize the performance of file folios reclamation.
>> >
>> >Barry previously implemented batched unmapping for lazyfree anonymous large
>> >folios[1] and did not further optimize anonymous large folios or file-backed
>> >large folios at that stage. As for file-backed large folios, the batched
>> >unmapping support is relatively straightforward, as we only need to clear
>> >the consecutive (present) PTE entries for file-backed large folios.
>> >
>> >Performance testing:
>> >Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
>> >reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
>> >75% performance improvement on my Arm64 32-core server (and 50%+ improvement
>> >on my X86 machine) with this patch.
>> >
>> >W/o patch:
>> >real 0m1.018s
>> >user 0m0.000s
>> >sys 0m1.018s
>> >
>> >W/ patch:
>> >real 0m0.249s
>> >user 0m0.000s
>> >sys 0m0.249s
>> >
>> >[1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
>> >Reviewed-by: Ryan Roberts <ryan.roberts@....com>
>> >Acked-by: Barry Song <baohua@...nel.org>
>> >Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> >---
>> > mm/rmap.c | 7 ++++---
>> > 1 file changed, 4 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/mm/rmap.c b/mm/rmap.c
>> >index 985ab0b085ba..e1d16003c514 100644
>> >--- a/mm/rmap.c
>> >+++ b/mm/rmap.c
>> >@@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> > 	end_addr = pmd_addr_end(addr, vma->vm_end);
>> > 	max_nr = (end_addr - addr) >> PAGE_SHIFT;
>> >
>> >-	/* We only support lazyfree batching for now ... */
>> >-	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
>> >+	/* We only support lazyfree or file folios batching for now ... */
>> >+	if (folio_test_anon(folio) && folio_test_swapbacked(folio))
>> > 		return 1;
>> >+
>> > 	if (pte_unused(pte))
>> > 		return 1;
>> >
>> >@@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> > 			 *
>> > 			 * See Documentation/mm/mmu_notifier.rst
>> > 			 */
>> >-			dec_mm_counter(mm, mm_counter_file(folio));
>> >+			add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
>> > 		}
>> > discard:
>> > 		if (unlikely(folio_test_hugetlb(folio))) {
>> >--
>> >2.47.3
>> >
>>
>> Hi, Baolin
>>
>> While reading your patch, I came up with one small question.
>>
>> The current try_to_unmap_one() has the following structure:
>>
>> try_to_unmap_one()
>>     while (page_vma_mapped_walk(&pvmw)) {
>>         nr_pages = folio_unmap_pte_batch()
>>
>>         if (nr_pages == folio_nr_pages(folio))
>>             goto walk_done;
>>     }
>>
>> I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages(folio).
>>
>> If my understanding is correct, page_vma_mapped_walk() would start from
>> (pvmw->address + PAGE_SIZE) in the next iteration, but we have already
>> cleared up to (pvmw->address + nr_pages * PAGE_SIZE), right?
>>
>> I am not sure my understanding is correct. If it is, is there a reason not
>> to skip the cleared range?
>
>I don't quite understand your question. For nr_pages > 1 but not equal
>to folio_nr_pages(folio), page_vma_mapped_walk() will skip the
>nr_pages - 1 PTEs inside.
>
>Take a look:
>
>next_pte:
>	do {
>		pvmw->address += PAGE_SIZE;
>		if (pvmw->address >= end)
>			return not_found(pvmw);
>		/* Did we cross page table boundary? */
>		if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
>			if (pvmw->ptl) {
>				spin_unlock(pvmw->ptl);
>				pvmw->ptl = NULL;
>			}
>			pte_unmap(pvmw->pte);
>			pvmw->pte = NULL;
>			pvmw->flags |= PVMW_PGTABLE_CROSSED;
>			goto restart;
>		}
>		pvmw->pte++;
>	} while (pte_none(ptep_get(pvmw->pte)));
>
Yes, page_vma_mapped_walk() handles this today: the cleared entries are
pte_none(), so the walk steps over them one by one.

What I mean is that we could perhaps skip them directly in
try_to_unmap_one(), for example:

diff --git a/mm/rmap.c b/mm/rmap.c
index 9e5bd4834481..ea1afec7c802 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			if (nr_pages == folio_nr_pages(folio))
 				goto walk_done;
+			else {
+				pvmw.address += PAGE_SIZE * (nr_pages - 1);
+				pvmw.pte += nr_pages - 1;
+			}
 			continue;
 walk_abort:
 		ret = false;

I am not sure whether this is reasonable.
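
To make the comparison concrete, here is a minimal userspace sketch of the two ways of advancing after a batch has been cleared. It is only a toy model and not kernel code: the page table is a plain array of booleans, and the names next_present_stepwise(), next_present_skipping() and clear_batch() are hypothetical helpers invented for this illustration. It ignores locking and the page-table-boundary check; the batch is assumed to stay inside one page table, as folio_unmap_pte_batch() caps it with pmd_addr_end().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: present[i] is true when slot i holds a mapped PTE. */

/* Clear nr consecutive slots starting at idx, like a batched unmap. */
static void clear_batch(bool *present, size_t idx, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		present[idx + i] = false;
}

/*
 * Models the existing next_pte loop: advance one slot at a time and
 * test each slot until a present one is found. Returns the index of
 * the next present slot, or n if the end is reached. *checks counts
 * how many slots were tested.
 */
static size_t next_present_stepwise(const bool *present, size_t n,
				    size_t idx, size_t *checks)
{
	assert(idx < n);
	do {
		idx++;
		if (idx >= n)
			return n;
		(*checks)++;
	} while (!present[idx]);
	return idx;
}

/*
 * Models the proposed change: first jump to the last slot of the
 * cleared batch (idx + nr - 1), then run the same stepwise loop.
 */
static size_t next_present_skipping(const bool *present, size_t n,
				    size_t idx, size_t nr, size_t *checks)
{
	assert(nr >= 1 && idx + nr - 1 < n);
	return next_present_stepwise(present, n, idx + nr - 1, checks);
}
```

In this model, with eight slots where slot 4 is already empty and a batch of four is cleared at index 0, both helpers stop at index 5. The stepwise version tests five slots and the skipping version tests two, so the result is the same and the only difference is the number of none entries that get re-read.
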
--
Wei Yang
Help you, Help me