linux-kernel - Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching


Message-ID: <jnv4n23bydfkn6kz4cpgtkrryfme2zjybp2frc2v734gnn32zj@3oboohu3bqiy>
Date: Thu, 12 Jun 2025 13:13:52 +0100
From: Pedro Falcato <pfalcato@...e.de>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, Liam.Howlett@...cle.com, 
	lorenzo.stoakes@...cle.com, vbabka@...e.cz, jannh@...gle.com, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, david@...hat.com, peterx@...hat.com, ryan.roberts@....com, 
	mingo@...nel.org, libang.li@...group.com, maobibo@...ngson.cn, 
	zhengqi.arch@...edance.com, baohua@...nel.org, anshuman.khandual@....com, 
	willy@...radead.org, ioworker0@...il.com, yang@...amperecomputing.com, 
	baolin.wang@...ux.alibaba.com, ziy@...dia.com, hughd@...gle.com
Subject: Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching

On Tue, Jun 10, 2025 at 09:20:43AM +0530, Dev Jain wrote:
> Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> are painted with the contig bit, then ptep_get() will iterate through all 16
> entries to collect a/d bits. Hence this optimization will result in a 16x
> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> will eventually call contpte_try_unfold() on every contig block, thus
> flushing the TLB for the complete large folio range. Instead, use
> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> do them on the starting and ending contig block.
> 
> For split folios, there will be no pte batching; nr_ptes will be 1. For
> pagetable splitting, the ptes will still point to the same large folio;
> for arm64, this results in the optimization described above, and for other
> arches (including the general case), a minor improvement is expected due to
> a reduction in the number of function calls.
> 
> Signed-off-by: Dev Jain <dev.jain@....com>
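
For readers following along, the batching idea in the quoted commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct toy_pte, toy_pte_batch() and toy_move_ptes() are hypothetical names, not kernel APIs, and the real folio_pte_batch() has a different signature and also takes the folio and PTE flags into account. The sketch only shows why handling a run of consecutive entries per loop iteration reduces the number of per-entry calls.

```c
#include <assert.h>
#include <stddef.h>

/* Toy PTE: a hypothetical user-space model, not the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	int present;
};

/*
 * Count how many entries starting at ptes[0] are present and map
 * consecutive PFNs, capped at max_nr. This loosely models the role a
 * helper like folio_pte_batch() plays; it is not that function.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr;

	assert(max_nr > 0 && ptes[0].present);

	for (nr = 1; nr < max_nr; nr++) {
		if (!ptes[nr].present || ptes[nr].pfn != ptes[0].pfn + nr)
			break;
	}
	return nr;
}

/*
 * Move nr entries from src to dst one batch at a time, clearing the
 * source entries. Returns the number of batches processed, which
 * stands in for the number of get/clear/set call sequences.
 */
static size_t toy_move_ptes(struct toy_pte *src, struct toy_pte *dst, size_t nr)
{
	size_t i = 0, batches = 0;

	while (i < nr) {
		size_t batch, j;

		/* Nothing to move for an empty entry. */
		if (!src[i].present) {
			i++;
			continue;
		}

		batch = toy_pte_batch(&src[i], nr - i);
		for (j = 0; j < batch; j++) {
			dst[i + j] = src[i + j];
			src[i + j].present = 0;
			src[i + j].pfn = 0;
		}
		i += batch;
		batches++;
	}
	return batches;
}
```

In this model, a run of 16 present entries with consecutive PFNs is handled as one batch rather than 16 separate iterations, while entries that do not form a run fall back to a batch of 1, which mirrors the split-folio case described above.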

Reviewed-by: Pedro Falcato <pfalcato@...e.de>

-- 
Pedro