Message-Id: <20250806055111.1519608-1-liqiang01@kylinos.cn>
Date: Wed,  6 Aug 2025 13:51:11 +0800
From: Li Qiang <liqiang01@...inos.cn>
To: akpm@...ux-foundation.org,
	david@...hat.com
Cc: linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com,
	vbabka@...e.cz,
	rppt@...nel.org,
	surenb@...gle.com,
	mhocko@...e.com
Subject: Re: [PATCH] mm: memory: Force-inline PTE/PMD zapping functions for performance

On Tue, 5 Aug 2025 14:35:22, Lorenzo Stoakes wrote:
> I'm not sure, actual workloads would be best but presumably you don't have
> one where you've noticed a demonstrable difference otherwise you'd have
> mentioned...
> 
> At any rate I've come around on this series, and think this is probably
> reasonable, but I would like to see what increasing max-inline-insns-single
> does first?

Thank you for your suggestions. I'll pay closer attention 
to email formatting in future communications.

Regarding the performance tests on x86_64 architecture:

Parameter Observation:
With --param max-inline-insns-single=400 (matching arm64's
default) and without my patch applied, the compiler already
inlines the critical functions on its own.

Benchmark Results:

Metric                  Baseline        With Patch              max-inline-insns-single=400
UnixBench Score         1824            1835 (+0.6%)            1840 (+0.9%)
vmlinux Size (bytes)    35,379,608      35,379,786 (+0.0005%)   35,529,641 (+0.4%)

Key Findings:

The patch provides a measurable performance gain (+0.6%) with a
negligible size impact (+178 bytes, ~0.0005%). While
max-inline-insns-single=400 yields slightly better performance
(+0.9%), it incurs a much larger size penalty (+0.4%).

Conclusion:
The patch achieves a better performance/size trade-off than
globally raising the inline threshold. The targeted approach
(selective __always_inline) appears more efficient for this
specific optimization.
