Message-ID: <jnv4n23bydfkn6kz4cpgtkrryfme2zjybp2frc2v734gnn32zj@3oboohu3bqiy>
Date: Thu, 12 Jun 2025 13:13:52 +0100
From: Pedro Falcato <pfalcato@...e.de>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com,
lorenzo.stoakes@...cle.com, vbabka@...e.cz, jannh@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, david@...hat.com, peterx@...hat.com, ryan.roberts@....com,
mingo@...nel.org, libang.li@...group.com, maobibo@...ngson.cn,
zhengqi.arch@...edance.com, baohua@...nel.org, anshuman.khandual@....com,
willy@...radead.org, ioworker0@...il.com, yang@...amperecomputing.com,
baolin.wang@...ux.alibaba.com, ziy@...dia.com, hughd@...gle.com
Subject: Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching
On Tue, Jun 10, 2025 at 09:20:43AM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> are painted with the contig bit, then ptep_get() will iterate through all 16
> entries to collect a/d bits. Hence this optimization will result in a 16x
> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> will eventually call contpte_try_unfold() on every contig block, thus
> flushing the TLB for the complete large folio range. Instead, use
> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> do them on the starting and ending contig block.
>
> For split folios, there will be no pte batching; nr_ptes will be 1. For
> pagetable splitting, the ptes will still point to the same large folio;
> for arm64, this results in the optimization described above, and for other
> arches (including the general case), a minor improvement is expected due to
> a reduction in the number of function calls.
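
To make the batching idea above concrete, here is a toy user-space model,
not kernel code. Everything in it (struct toy_pte, toy_get(), toy_batch(),
toy_move()) is hypothetical and only loosely mirrors ptep_get(),
folio_pte_batch() and move_ptes(). It counts how often the getter is called
when entries are moved one at a time versus once per run of entries that map
consecutive pages of the same folio. The toy batch helper peeks at raw
entries without being counted; the cost of the real helper is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page table entry: a pfn plus an owning folio. */
struct toy_pte {
	unsigned long pfn;
	int folio_id;
};

static size_t get_calls; /* number of toy_get() calls in the last move */

/* Counted read of one entry (loosely plays the role of ptep_get()). */
static struct toy_pte toy_get(const struct toy_pte *p)
{
	get_calls++;
	return *p;
}

/*
 * Length of the run starting at p[0] whose entries belong to the same
 * folio and have consecutive pfns, capped at max. Assumed behaviour for
 * this sketch only.
 */
static size_t toy_batch(const struct toy_pte *p, size_t max)
{
	size_t n = 1;

	assert(max > 0);
	while (n < max && p[n].folio_id == p[0].folio_id &&
	       p[n].pfn == p[0].pfn + n)
		n++;
	return n;
}

/*
 * Copy nr entries from src to dst. With batched != 0, read only the first
 * entry of each run and derive the rest from it. Returns the number of
 * toy_get() calls made.
 */
static size_t toy_move(const struct toy_pte *src, struct toy_pte *dst,
		       size_t nr, int batched)
{
	get_calls = 0;
	for (size_t i = 0; i < nr; ) {
		struct toy_pte first = toy_get(&src[i]);
		size_t n = batched ? toy_batch(&src[i], nr - i) : 1;

		for (size_t k = 0; k < n; k++) {
			dst[i + k] = first;
			dst[i + k].pfn = first.pfn + k;
		}
		i += n;
	}
	return get_calls;
}
```

In this model, two 16-entry folios laid out back to back need 32 getter
calls unbatched and 2 batched, while entries that each belong to a different
folio always give a batch length of 1, matching the split-folio case
described in the commit message.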
>
> Signed-off-by: Dev Jain <dev.jain@....com>
Reviewed-by: Pedro Falcato <pfalcato@...e.de>
--
Pedro