[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3fe90c96-da4d-4240-bd58-0bed5fe7cf5f@lucifer.local>
Date: Thu, 8 May 2025 19:35:52 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
jannh@...gle.com, pfalcato@...e.de, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, david@...hat.com, peterx@...hat.com,
ryan.roberts@....com, mingo@...nel.org, libang.li@...group.com,
maobibo@...ngson.cn, zhengqi.arch@...edance.com, baohua@...nel.org,
anshuman.khandual@....com, willy@...radead.org, ioworker0@...il.com,
yang@...amperecomputing.com, baolin.wang@...ux.alibaba.com,
ziy@...dia.com, hughd@...gle.com
Subject: Re: [PATCH v2 0/2] Optimize mremap() for large folios
Dev - a general comment here - but let's slow things down a little please
:)
The mprotect() version of this is still outstanding fixes and likely will
need quite a bit of checking before we can ensure it's stabilised.
And now we have this mremap() series as well which also has had quite a few
quite significant issues that have needed addressing.
So can we try to focus on one at a time, and really try to nail down the
series before moving on to the next?
We also have outstanding review on the v1, which has now been split, which
does happen sometimes but perhaps suggests that it'd work better if you
waited a couple days or such to ensure things are settled before sending a
new version when there's quite a bit of feedback?
This isn't a criticism really, sorry I don't mean to sound negative or such
- but this is more a process thing so we reviewers can keep up with things,
keep things rolling, and ensure you get your changes merged asap :)
Thanks, Lorenzo
On Wed, May 07, 2025 at 11:32:54AM +0530, Dev Jain wrote:
> Currently move_ptes() iterates through ptes one by one. If the underlying
> folio mapped by the ptes is large, we can process those ptes in a batch
> using folio_pte_batch(), thus clearing and setting the PTEs in one go.
> For arm64 specifically, this results in a 16x reduction in the number of
> ptep_get() calls (since on a contig block, ptep_get() on arm64 will iterate
> through all 16 entries to collect a/d bits), and we also elide extra TLBIs
> through get_and_clear_full_ptes, replacing ptep_get_and_clear.
>
> Mapping 512K of memory, memsetting it, remapping it to src + 512K, and
> munmapping it 10,000 times, the average execution time reduces from 1.9 to
> 1.2 seconds, giving a 37% performance optimization, on Apple M3 (arm64).
>
> Test program for reference:
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <string.h>
> #include <errno.h>
>
> #define SIZE (1UL << 20) // 512 KB
>
> int main(void) {
> void *new_addr, *addr;
>
> for (int i = 0; i < 10000; ++i) {
> addr = mmap((void *)(1UL << 30), SIZE, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> if (addr == MAP_FAILED) {
> perror("mmap");
> return 1;
> }
> memset(addr, 0xAA, SIZE);
>
> new_addr = mremap(addr, SIZE, SIZE, MREMAP_MAYMOVE | MREMAP_FIXED, addr + SIZE);
> if (new_addr != (addr + SIZE)) {
> perror("mremap");
> return 1;
> }
> munmap(new_addr, SIZE);
> }
>
> }
>
> v1->v2:
> - Expand patch descriptions, move pte declarations to a new line,
> reduce indentation in patch 2 by introducing mremap_folio_pte_batch(),
> fix loop iteration (Lorenzo)
> - Merge patch 2 and 3 (Anshuman, Lorenzo)
> - Fix maybe_contiguous_pte_pfns (Willy)
>
> Dev Jain (2):
> mm: Call pointers to ptes as ptep
> mm: Optimize mremap() by PTE batching
>
> include/linux/pgtable.h | 29 ++++++++++++++++++++++
> mm/mremap.c | 54 +++++++++++++++++++++++++++++------------
> 2 files changed, 68 insertions(+), 15 deletions(-)
>
> --
> 2.30.2
>
Powered by blists - more mailing lists