Message-ID: <cd626b19-2636-477e-ab01-26380b12d4b1@linux.alibaba.com>
Date: Wed, 7 Jan 2026 10:29:18 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Barry Song <21cnbao@...il.com>, Wei Yang <richard.weiyang@...il.com>
Cc: akpm@...ux-foundation.org, david@...nel.org, catalin.marinas@....com,
will@...nel.org, lorenzo.stoakes@...cle.com, ryan.roberts@....com,
Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
mhocko@...e.com, riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com,
willy@...radead.org, dev.jain@....com, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large
folios
On 1/7/26 10:21 AM, Barry Song wrote:
> On Wed, Jan 7, 2026 at 2:46 PM Wei Yang <richard.weiyang@...il.com> wrote:
>>
>> On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
>>> On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@...il.com> wrote:
>>>>
>>>> On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
>>>>> Similar to folio_referenced_one(), we can apply batched unmapping to file
>>>>> large folios to optimize the performance of file folio reclamation.
>>>>>
>>>>> Barry previously implemented batched unmapping for lazyfree anonymous large
>>>>> folios[1] and did not further optimize (swap-backed) anonymous large folios
>>>>> or file-backed large folios at that stage. For file-backed large folios, the
>>>>> batched unmapping support is relatively straightforward: we only need to
>>>>> clear the consecutive (present) PTE entries.
>>>>>
>>>>> Performance testing:
>>>>> Allocate 10G of clean file-backed folios via mmap() in a memory cgroup, then
>>>>> reclaim 8G of file-backed folios through the memory.reclaim interface. I can
>>>>> observe a 75% performance improvement on my Arm64 32-core server (and a 50%+
>>>>> improvement on my x86 machine) with this patch.
>>>>>
>>>>> W/o patch:
>>>>> real 0m1.018s
>>>>> user 0m0.000s
>>>>> sys 0m1.018s
>>>>>
>>>>> W/ patch:
>>>>> real 0m0.249s
>>>>> user 0m0.000s
>>>>> sys 0m0.249s
>>>>>
>>>>> [1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
>>>>> Reviewed-by: Ryan Roberts <ryan.roberts@....com>
>>>>> Acked-by: Barry Song <baohua@...nel.org>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>>> ---
>>>>> mm/rmap.c | 7 ++++---
>>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 985ab0b085ba..e1d16003c514 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>>>>>          end_addr = pmd_addr_end(addr, vma->vm_end);
>>>>>          max_nr = (end_addr - addr) >> PAGE_SHIFT;
>>>>>
>>>>> -        /* We only support lazyfree batching for now ... */
>>>>> -        if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
>>>>> +        /* We only support lazyfree or file folios batching for now ... */
>>>>> +        if (folio_test_anon(folio) && folio_test_swapbacked(folio))
>>>>>                  return 1;
>>>>> +
>>>>>          if (pte_unused(pte))
>>>>>                  return 1;
>>>>>
>>>>> @@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>                   *
>>>>>                   * See Documentation/mm/mmu_notifier.rst
>>>>>                   */
>>>>> -                dec_mm_counter(mm, mm_counter_file(folio));
>>>>> +                add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
>>>>>          }
>>>>>  discard:
>>>>>          if (unlikely(folio_test_hugetlb(folio))) {
>>>>> --
>>>>> 2.47.3
>>>>>
>>>>
>>>> Hi, Baolin
>>>>
>>>> When reading your patch, I came up with one small question.
>>>>
>>>> The current try_to_unmap_one() has the following structure:
>>>>
>>>> try_to_unmap_one()
>>>>     while (page_vma_mapped_walk(&pvmw)) {
>>>>             nr_pages = folio_unmap_pte_batch()
>>>>
>>>>             if (nr_pages == folio_nr_pages(folio))
>>>>                     goto walk_done;
>>>>     }
>>>>
>>>> I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages().
>>>>
>>>> If my understanding is correct, page_vma_mapped_walk() would start from
>>>> (pvmw->address + PAGE_SIZE) in the next iteration, but we have already
>>>> cleared up to (pvmw->address + nr_pages * PAGE_SIZE), right?
>>>>
>>>> I am not sure my understanding is correct; if it is, is there a reason
>>>> not to skip the cleared range?
>>>
>>> I don’t quite understand your question. For nr_pages > 1 but not equal
>>> to folio_nr_pages(), page_vma_mapped_walk() will skip the remaining
>>> nr_pages - 1 PTEs internally.
>>>
>>> take a look:
>>>
>>> next_pte:
>>>         do {
>>>                 pvmw->address += PAGE_SIZE;
>>>                 if (pvmw->address >= end)
>>>                         return not_found(pvmw);
>>>                 /* Did we cross page table boundary? */
>>>                 if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
>>>                         if (pvmw->ptl) {
>>>                                 spin_unlock(pvmw->ptl);
>>>                                 pvmw->ptl = NULL;
>>>                         }
>>>                         pte_unmap(pvmw->pte);
>>>                         pvmw->pte = NULL;
>>>                         pvmw->flags |= PVMW_PGTABLE_CROSSED;
>>>                         goto restart;
>>>                 }
>>>                 pvmw->pte++;
>>>         } while (pte_none(ptep_get(pvmw->pte)));
>>>
>>
>> Yes, we do it in page_vma_mapped_walk() now. Since they are pte_none(), they
>> will be skipped.
>>
>> I mean maybe we could skip them directly in try_to_unmap_one(), for example:
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 9e5bd4834481..ea1afec7c802 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>                   */
>>                  if (nr_pages == folio_nr_pages(folio))
>>                          goto walk_done;
>> +                else {
>> +                        pvmw.address += PAGE_SIZE * (nr_pages - 1);
>> +                        pvmw.pte += nr_pages - 1;
>> +                }
>>                  continue;
>>  walk_abort:
>>                  ret = false;
>
>
> I feel this couples the PTE walk iteration with the unmap
> operation, which does not seem right to me. It also appears
> to affect only corner cases.
Agreed. There are likely no performance gains to be had, so I also prefer
to leave it as is.
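
To make the cost concrete, here is a minimal userspace model of the walk
above (NOT kernel code: the int array stands in for one PTE page table
and all names are invented for this sketch). It "unmaps" a batch of
nr_pages consecutive entries and then lets a pte_none()-style skip loop,
like the one in page_vma_mapped_walk(), advance past the cleared range:

#include <stdbool.h>
#include <stdio.h>

#define NR_PTES 16

static bool pte_none(int pte)
{
        return pte == 0;
}

int main(void)
{
        int ptes[NR_PTES];
        int nr_pages = 4;       /* batch size, smaller than "folio_nr_pages" */
        int start = 2;          /* entry where the batch begins */
        int pos, i, checks = 0;

        for (i = 0; i < NR_PTES; i++)
                ptes[i] = 1;    /* all entries initially present */

        /* batched unmap: clear nr_pages consecutive entries */
        for (i = start; i < start + nr_pages; i++)
                ptes[i] = 0;

        /* walker: advance one entry at a time, skipping cleared entries */
        pos = start;
        do {
                pos++;
                checks++;
        } while (pos < NR_PTES && pte_none(ptes[pos]));

        printf("walker resumed at entry %d after %d pte_none() checks\n",
               pos, checks);
        return 0;
}

With nr_pages = 4 this prints "walker resumed at entry 6 after 4
pte_none() checks", i.e. the explicit pvmw.pte advance would only save
nr_pages - 1 cheap pte_none() reads per batch, which matches the
expectation that there is no measurable gain.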