Message-ID: <CAGsJ_4xfVnPKSPvHYWa1cPK7kZC4F6hKreny9RsKfSCyC4RHuQ@mail.gmail.com>
Date: Wed, 7 Jan 2026 15:21:08 +1300
From: Barry Song <21cnbao@...il.com>
To: Wei Yang <richard.weiyang@...il.com>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>, akpm@...ux-foundation.org, 
	david@...nel.org, catalin.marinas@....com, will@...nel.org, 
	lorenzo.stoakes@...cle.com, ryan.roberts@....com, Liam.Howlett@...cle.com, 
	vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, 
	riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com, willy@...radead.org, 
	dev.jain@....com, linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios

On Wed, Jan 7, 2026 at 2:46 PM Wei Yang <richard.weiyang@...il.com> wrote:
>
> On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
> >On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@...il.com> wrote:
> >>
> >> On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
> >> >Similar to folio_referenced_one(), we can apply batched unmapping for file
> >> >large folios to optimize the performance of file folios reclamation.
> >> >
> >> >Barry previously implemented batched unmapping for lazyfree anonymous large
> >> >folios[1] and did not further optimize anonymous large folios or file-backed
> >> >large folios at that stage. As for file-backed large folios, the batched
> >> >unmapping support is relatively straightforward, as we only need to clear
> >> >the consecutive (present) PTE entries for file-backed large folios.
> >> >
> >> >Performance testing:
> >> >Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
> >> >reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
> >> >75% performance improvement on my Arm64 32-core server (and 50%+ improvement
> >> >on my X86 machine) with this patch.
> >> >
> >> >W/o patch:
> >> >real    0m1.018s
> >> >user    0m0.000s
> >> >sys     0m1.018s
> >> >
> >> >W/ patch:
> >> >real   0m0.249s
> >> >user   0m0.000s
> >> >sys    0m0.249s
> >> >
> >> >[1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
> >> >Reviewed-by: Ryan Roberts <ryan.roberts@....com>
> >> >Acked-by: Barry Song <baohua@...nel.org>
> >> >Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
> >> >---
> >> > mm/rmap.c | 7 ++++---
> >> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >> >
> >> >diff --git a/mm/rmap.c b/mm/rmap.c
> >> >index 985ab0b085ba..e1d16003c514 100644
> >> >--- a/mm/rmap.c
> >> >+++ b/mm/rmap.c
> >> >@@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> >> >       end_addr = pmd_addr_end(addr, vma->vm_end);
> >> >       max_nr = (end_addr - addr) >> PAGE_SHIFT;
> >> >
> >> >-      /* We only support lazyfree batching for now ... */
> >> >-      if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> >> >+      /* We only support lazyfree or file folios batching for now ... */
> >> >+      if (folio_test_anon(folio) && folio_test_swapbacked(folio))
> >> >               return 1;
> >> >+
> >> >       if (pte_unused(pte))
> >> >               return 1;
> >> >
> >> >@@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >> >                        *
> >> >                        * See Documentation/mm/mmu_notifier.rst
> >> >                        */
> >> >-                      dec_mm_counter(mm, mm_counter_file(folio));
> >> >+                      add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
> >> >               }
> >> > discard:
> >> >               if (unlikely(folio_test_hugetlb(folio))) {
> >> >--
> >> >2.47.3
> >> >
> >>
> >> Hi, Baolin
> >>
> >> While reading your patch, I came up with one small question.
> >>
> >> Currently, try_to_unmap_one() has the following structure:
> >>
> >>     try_to_unmap_one()
> >>         while (page_vma_mapped_walk(&pvmw)) {
> >>             nr_pages = folio_unmap_pte_batch()
> >>
> >>             if (nr_pages == folio_nr_pages(folio))
> >>                 goto walk_done;
> >>         }
> >>
> >> I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages().
> >>
> >> If my understanding is correct, page_vma_mapped_walk() would start from
> >> (pvmw->address + PAGE_SIZE) in the next iteration, but we have already
> >> cleared up to (pvmw->address + nr_pages * PAGE_SIZE), right?
> >>
> >> I am not sure my understanding is correct. If it is, is there a reason not
> >> to skip the cleared range?
> >
> >I don’t quite understand your question. For nr_pages > 1 but not equal
> >to folio_nr_pages(), page_vma_mapped_walk() will skip the nr_pages - 1
> >cleared PTEs.
> >
> >Take a look:
> >
> >next_pte:
> >                do {
> >                        pvmw->address += PAGE_SIZE;
> >                        if (pvmw->address >= end)
> >                                return not_found(pvmw);
> >                        /* Did we cross page table boundary? */
> >                        if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
> >                                if (pvmw->ptl) {
> >                                        spin_unlock(pvmw->ptl);
> >                                        pvmw->ptl = NULL;
> >                                }
> >                                pte_unmap(pvmw->pte);
> >                                pvmw->pte = NULL;
> >                                pvmw->flags |= PVMW_PGTABLE_CROSSED;
> >                                goto restart;
> >                        }
> >                        pvmw->pte++;
> >                } while (pte_none(ptep_get(pvmw->pte)));
> >
>
> Yes, we do it in page_vma_mapped_walk() now. Since they are pte_none(), they
> will be skipped.
>
> I mean maybe we can skip it in try_to_unmap_one(), for example:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9e5bd4834481..ea1afec7c802 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                  */
>                 if (nr_pages == folio_nr_pages(folio))
>                         goto walk_done;
> +               else {
> +                       pvmw.address += PAGE_SIZE * (nr_pages - 1);
> +                       pvmw.pte += nr_pages - 1;
> +               }
>                 continue;
>  walk_abort:
>                 ret = false;


I feel this couples the PTE walk iteration with the unmap
operation, which does not seem right to me. It also appears
to affect only corner cases, where the batch covers part of
the folio rather than all of it.
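
To make the trade-off concrete, here is a minimal userspace sketch of the
next_pte loop quoted above. It is not kernel code: struct model_walk,
model_unmap_batch(), model_next_pte() and MODEL_NR_PTES are hypothetical
names invented for illustration, and a bool array stands in for the page
table (false plays the role of pte_none()). It only shows that the walker
already steps over a cleared batch by itself, at the cost of examining
each cleared entry once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES 16	/* hypothetical: size of the modelled table */

/* Hypothetical stand-in for the walk state: true = present, false = none. */
struct model_walk {
	bool pte[MODEL_NR_PTES];
	size_t idx;		/* plays the role of pvmw->address / pvmw->pte */
};

/* Mark every entry present and start the walk at entry 0. */
static void model_init(struct model_walk *w)
{
	for (size_t i = 0; i < MODEL_NR_PTES; i++)
		w->pte[i] = true;
	w->idx = 0;
}

/* Clear nr consecutive entries starting at the current one (the batch). */
static void model_unmap_batch(struct model_walk *w, size_t nr)
{
	assert(w->idx + nr <= MODEL_NR_PTES);
	for (size_t i = 0; i < nr; i++)
		w->pte[w->idx + i] = false;
}

/*
 * Modelled on the quoted do/while loop: advance one entry at a time and
 * keep going while the entry is none. Returns false once the end of the
 * table is reached. *steps reports how many entries were examined.
 */
static bool model_next_pte(struct model_walk *w, size_t *steps)
{
	*steps = 0;
	do {
		w->idx++;
		if (w->idx >= MODEL_NR_PTES)
			return false;
		(*steps)++;
	} while (!w->pte[w->idx]);
	return true;
}
```

In this model, after model_unmap_batch(&w, 4) from entry 0, a single
model_next_pte() call lands on entry 4 having examined four entries
(three cleared ones plus the next present one). Advancing the position
by nr_pages - 1 in the caller, as in the proposed diff, would save only
those extra checks.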

>
> Not sure this is reasonable.
>

Thanks
Barry
