Message-ID: <5ba95609-302b-456a-a863-2bd5df51baf2@redhat.com>
Date: Wed, 25 Jun 2025 13:01:28 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>, Lance Yang <lance.yang@...ux.dev>
Cc: akpm@...ux-foundation.org, baolin.wang@...ux.alibaba.com,
chrisl@...nel.org, kasong@...cent.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-riscv@...ts.infradead.org, lorenzo.stoakes@...cle.com,
ryan.roberts@....com, v-songbaohua@...o.com, x86@...nel.org,
ying.huang@...el.com, zhengtangquan@...o.com,
Lance Yang <ioworker0@...il.com>
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large
folios during reclamation
On 25.06.25 12:57, Barry Song wrote:
>>>
>>> Note that I don't quite understand why we have to batch the whole thing
>>> or fall back to individual pages. Why can't we perform other batches that
>>> span only some PTEs? What's special about 1 PTE vs. 2 PTEs vs. all PTEs?
>>
>> That's a good point about the "all-or-nothing" batching logic ;)
>>
>> It seems the "all-or-nothing" approach is specific to the lazyfree use
>> case, which needs to unmap the entire folio for reclamation. If that's
>> not possible, it falls back to the single-page slow path.
>
> Other cases advance the PTE themselves, while try_to_unmap_one() relies
> on page_vma_mapped_walk() to advance the PTE. Unless we want to manually
> modify pvmw.pte and pvmw.address outside of page_vma_mapped_walk(), which
> to me seems like a violation of layers. :-)

Please explain to me why the following is not clearer and better:

diff --git a/mm/rmap.c b/mm/rmap.c
index 8200d705fe4ac..09e2c2f28aa58 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,31 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }

-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+		struct page_vma_mapped_walk *pvmw, enum ttu_flags flags,
+		pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned long end_addr, addr = pvmw->address;
+	unsigned int max_nr;
+
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
+
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = min_t(unsigned long, ALIGN(addr + 1, PMD_SIZE), vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;

+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
-
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			       NULL, NULL) == max_nr;
+		return 1;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			       NULL, NULL, NULL);
 }

 /*
@@ -2024,9 +2032,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else if (likely(pte_present(pteval))) {
-			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
-			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
-				nr_pages = folio_nr_pages(folio);
+			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
 			end_addr = address + nr_pages * PAGE_SIZE;
 			flush_cache_range(vma, address, end_addr);

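
To make the "single VMA and a single page table" clamp concrete, here is a minimal userspace sketch of the max_nr arithmetic from the diff. It is not kernel code: PAGE_SHIFT = 12 and a 2 MiB PMD_SIZE are assumed values (typical for x86-64 with 4 KiB pages), and ALIGN_UP(), min_ul() and max_batch_pages() are hypothetical stand-ins for the kernel's ALIGN(), min_t() and the inline computation.

```c
#include <assert.h>

/* Assumed values for illustration only: 4 KiB pages, 2 MiB per PMD entry. */
#define PAGE_SHIFT 12
#define PMD_SIZE   (1UL << 21)

/* Hypothetical stand-in for the kernel's ALIGN(): round x up to a multiple of a. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical stand-in for min_t(unsigned long, a, b). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the max_nr computation in the diff:
 * the number of pages, starting at addr, that stay inside both the
 * current PTE table (one PMD_SIZE-aligned range) and the VMA.
 */
static unsigned int max_batch_pages(unsigned long addr, unsigned long vm_end)
{
	/*
	 * addr + 1 makes an already PMD-aligned addr round up to the
	 * next boundary instead of to itself, so the result is never 0.
	 */
	unsigned long end_addr = min_ul(ALIGN_UP(addr + 1, PMD_SIZE), vm_end);

	return (unsigned int)((end_addr - addr) >> PAGE_SHIFT);
}
```

Under these assumptions, a PMD-aligned address in a large VMA allows up to 512 pages, the last page below a PMD boundary allows only 1, and a VMA that ends four pages after addr caps the batch at 4. Any batch size between 1 and that cap is then acceptable to the caller, which is the point of dropping the "== max_nr" check.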
--
Cheers,
David / dhildenb