Message-ID: <0668d246-ccbb-4a74-96d8-c13bf180053f@suse.cz>
Date: Thu, 7 Aug 2025 12:25:41 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Li Qiang <liqiang01@...inos.cn>, akpm@...ux-foundation.org,
 david@...hat.com
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, rppt@...nel.org,
 surenb@...gle.com, mhocko@...e.com, Nadav Amit <nadav.amit@...il.com>
Subject: Re: [PATCH] mm: memory: Force-inline PTE/PMD zapping functions for
 performance
On 8/6/25 07:51, Li Qiang wrote:
> Tue, 5 Aug 2025 14:35:22, Lorenzo Stoakes wrote:
>> I'm not sure, actual workloads would be best but presumably you don't have
>> one where you've noticed a demonstrable difference otherwise you'd have
>> mentioned...
>> 
>> At any rate I've come around on this series, and think this is probably
>> reasonable, but I would like to see what increasing max-inline-insns-single
>> does first?
> 
> Thank you for your suggestions. I'll pay closer attention 
> to email formatting in future communications.
> 
> Regarding the performance tests on x86_64 architecture:
> 
> Parameter Observation:
> When setting max-inline-insns-single=400 (matching arm64's 
> default value) without applying my patch, the compiler 
> automatically inlines the critical functions.
> 
> Benchmark Results:
> 
> Configuration           Baseline        With Patch               max-inline-insns-single=400
> UnixBench Score          1824            1835 (+0.6%)             1840 (+0.9%)
> vmlinux Size (bytes)     35,379,608      35,379,786 (+0.005%)     35,529,641 (+0.4%)
> 
> Key Findings:
> 
> The patch provides a significant performance gain (0.6%) with
> minimal size impact (a 0.005% increase). While
> max-inline-insns-single=400 yields slightly better performance
> (0.9%), it incurs a larger size penalty (a 0.4% increase).
> 
> Conclusion:
> The patch achieves a better performance/size trade-off 
> compared to globally adjusting the inline threshold. The targeted 
> approach (selective __always_inline) appears more efficient for 
> this specific optimization.
Another attempt on my openSUSE Tumbleweed system, gcc 15.1.1:
add/remove: 1/0 grow/shrink: 4/7 up/down: 1069/-520 (549)
Function                                     old     new   delta
unmap_page_range                            6493    7424    +931
add_mm_counter                                 -     112    +112
finish_fault                                1101    1117     +16
do_swap_page                                6523    6531      +8
remap_pfn_range_internal                    1358    1360      +2
pte_to_swp_entry                             123     122      -1
pte_move_swp_offset                          219     218      -1
restore_exclusive_pte                        356     325     -31
__handle_mm_fault                           3988    3949     -39
do_wp_page                                  3926    3810    -116
copy_page_range                             8051    7930    -121
swap_pte_batch                               817     606    -211
Total: Before=66483, After=67032, chg +0.83%
The functions changed by your patch were already being inlined, and yet this
force inlining apparently shifted some heuristics so that completely
unrelated functions changed as well.
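For illustration, a minimal standalone sketch of the effect I mean (the
names are made up, not the actual mm/memory.c helpers from your patch):
the attribute overrides gcc's size-based cost model for that one callee,
the caller grows, and the bigger caller then feeds back into gcc's
inlining decisions for everything else around it:

/* Standalone sketch; mirrors how the kernel defines __always_inline. */
#define __always_inline inline __attribute__((__always_inline__))

/* Forced inline: gcc must expand this into every caller regardless of
 * its inline-insns budget. */
static __always_inline void zap_one_entry(unsigned long *entry)
{
        *entry = 0;     /* stands in for the real PTE-clearing work */
}

/* The caller gets bigger, which can change gcc's decisions for other
 * (unannotated) functions it calls or is called from. */
void zap_range(unsigned long *entries, unsigned long n)
{
        for (unsigned long i = 0; i < n; i++)
                zap_one_entry(&entries[i]);
}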
So that just shows how fragile these kinds of attempts to hand-hold gcc into
a specific desired behavior are, and I'd be wary of them.