Message-ID: <1a985416-c8c5-429f-a83a-3c66be939439@linux.alibaba.com>
Date: Thu, 15 May 2025 11:40:59 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
David Hildenbrand <david@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
Kairui Song <kasong@...cent.com>, Chris Li <chrisl@...nel.org>,
Baoquan He <bhe@...hat.com>, Dan Schatzberg <schatzberg.dan@...il.com>,
Kaixiong Yu <yukaixiong@...wei.com>, Fan Ni <fan.ni@...sung.com>,
Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH RFC] mm: make try_to_unmap_one support batched unmap for
anon large folios
On 2025/5/15 09:35, Barry Song wrote:
> On Wed, May 14, 2025 at 8:11 PM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/5/13 16:46, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@...o.com>
>>>
>>> My commit 354dffd29575c ("mm: support batched unmap for lazyfree large
>>> folios during reclamation") introduced support for unmapping entire
>>> lazyfree anonymous large folios at once, instead of one page at a time.
>>> This patch extends that support to generic (non-lazyfree) anonymous
>>> large folios.
>>>
>>> Handling __folio_try_share_anon_rmap() and swap_duplicate() becomes
>>> extremely complex—if not outright impractical—for non-exclusive
>>> anonymous folios. As a result, this patch limits support to exclusive
>>> large folios. Fortunately, most anonymous folios are exclusive in
>>> practice, so this restriction should be acceptable in the majority of
>>> cases.
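
To make sure I read the exclusivity restriction correctly, here is a rough
sketch of the resulting behaviour. The helper name below is made up purely for
illustration; the actual gate from the patch is quoted further down in this
mail.

    /*
     * Rough sketch only: folio_ptes_fully_mapped_and_exclusive() is a
     * made-up placeholder for the check the patch performs
     * (can_batch_unmap_folio_ptes() in the snippet quoted below). Only a
     * fully PTE-mapped, exclusive anon folio is unmapped in one go;
     * everything else stays on the existing per-page path, where
     * swap_duplicate() and __folio_try_share_anon_rmap() still run once
     * per PTE.
     */
    if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
        folio_ptes_fully_mapped_and_exclusive(folio, address, pvmw.pte))
        nr_pages = folio_nr_pages(folio);   /* batched unmap */
    else
        nr_pages = 1;                       /* per-page unmap */
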
>>>
>>> SPARC is currently the only architecture that implements
>>> arch_unmap_one(), which also needs to be batched for consistency.
>>> However, this is not yet supported, so the platform is excluded for
>>> now.
>>>
>>> The following micro-benchmark measures the time taken to perform
>>> MADV_PAGEOUT on 256MB of 64KiB anonymous large folios.
>>>
>>> #define _GNU_SOURCE
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>> #include <string.h>
>>> #include <time.h>
>>> #include <unistd.h>
>>> #include <errno.h>
>>>
>>> #define SIZE_MB 256
>>> #define SIZE_BYTES (SIZE_MB * 1024 * 1024)
>>>
>>> int main() {
>>>     /* Map 256MB of private anonymous memory. */
>>>     void *addr = mmap(NULL, SIZE_BYTES, PROT_READ | PROT_WRITE,
>>>                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>     if (addr == MAP_FAILED) {
>>>         perror("mmap failed");
>>>         return 1;
>>>     }
>>>
>>>     /* Touch every byte so the whole range is faulted in. */
>>>     memset(addr, 0, SIZE_BYTES);
>>>
>>>     struct timespec start, end;
>>>     clock_gettime(CLOCK_MONOTONIC, &start);
>>>
>>>     /* Time how long it takes to reclaim the range via MADV_PAGEOUT. */
>>>     if (madvise(addr, SIZE_BYTES, MADV_PAGEOUT) != 0) {
>>>         perror("madvise(MADV_PAGEOUT) failed");
>>>         munmap(addr, SIZE_BYTES);
>>>         return 1;
>>>     }
>>>
>>>     clock_gettime(CLOCK_MONOTONIC, &end);
>>>
>>>     long duration_ns = (end.tv_sec - start.tv_sec) * 1e9 +
>>>                        (end.tv_nsec - start.tv_nsec);
>>>     printf("madvise(MADV_PAGEOUT) took %ld ns (%.3f ms)\n",
>>>            duration_ns, duration_ns / 1e6);
>>>
>>>     munmap(addr, SIZE_BYTES);
>>>     return 0;
>>> }
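
A reproduction note, in case it is useful: the program itself never asks for
large folios, so I assume the 64KiB numbers were collected with that mTHP size
enabled system-wide (e.g. writing "always" to
/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled). If the per-size
policy is "madvise" instead, a hint right after mmap(), before the memset()
that faults the range in, should be enough:

    /* only needed when the 64KiB mTHP policy is "madvise" rather than "always" */
    madvise(addr, SIZE_BYTES, MADV_HUGEPAGE);
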
>>>
>>> w/o patch:
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1337334000 ns (1337.334 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1340471008 ns (1340.471 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1385718992 ns (1385.719 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1366070000 ns (1366.070 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 1347834992 ns (1347.835 ms)
>>>
>>> w/patch:
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 698178000 ns (698.178 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 708570000 ns (708.570 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 693884000 ns (693.884 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 693366000 ns (693.366 ms)
>>> ~ # ./a.out
>>> madvise(MADV_PAGEOUT) took 690790000 ns (690.790 ms)
>>>
>>> We found that the time to reclaim this memory was reduced by half.
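
For reference, averaging the five runs gives roughly 1355 ms without the patch
and 697 ms with it, i.e. about a 1.9x speedup (a ~49% reduction), which matches
the "reduced by half" summary above.
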
>>
>> Do you have some performance numbers for the base page?
>
> We verified that the batched path is only taken when folio_test_large(folio)
> is true; for a base page that check fails, so nr_pages remains 1 for each
> folio:
>
>     if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
>         can_batch_unmap_folio_ptes(address, folio, pvmw.pte,
>                                    anon_exclusive))
>         nr_pages = folio_nr_pages(folio);
>
> I didn't expect any noticeable performance change for base pages, but testing
> shows the patch appears to make them slightly faster—likely due to test noise or
> jitter.
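
I assume the base-page runs were done either with the 64KiB mTHP size disabled
or with the benchmark tweaked to force base pages, e.g. (hypothetical tweak,
not part of the program above):

    /* force order-0 pages: call right after mmap(), before memset() faults the range in */
    madvise(addr, SIZE_BYTES, MADV_NOHUGEPAGE);
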
>
> W/o patch:
>
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5686488000 ns (5686.488 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5628330992 ns (5628.331 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5771742992 ns (5771.743 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5672108000 ns (5672.108 ms)
>
>
> W/ patch:
>
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5481578000 ns (5481.578 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5425394992 ns (5425.395 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5522109008 ns (5522.109 ms)
> ~ # ./a.out
> madvise(MADV_PAGEOUT) took 5506832000 ns (5506.832 ms)
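
Averaging these four runs gives roughly 5690 ms without the patch and 5484 ms
with it, a difference of about 3.6%, which is indeed small enough to be
run-to-run noise as you say.
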
Thanks. My expectation is also that batching the unmap of large folios should
not affect the performance of base pages, but it would be best to state this
clearly in the commit message.