Message-ID: <0668d246-ccbb-4a74-96d8-c13bf180053f@suse.cz>
Date: Thu, 7 Aug 2025 12:25:41 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Li Qiang <liqiang01@...inos.cn>, akpm@...ux-foundation.org,
david@...hat.com
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, rppt@...nel.org,
surenb@...gle.com, mhocko@...e.com, Nadav Amit <nadav.amit@...il.com>
Subject: Re: [PATCH] mm: memory: Force-inline PTE/PMD zapping functions for
performance
On 8/6/25 07:51, Li Qiang wrote:
> Tue, 5 Aug 2025 14:35:22, Lorenzo Stoakes wrote:
>> I'm not sure, actual workloads would be best but presumably you don't have
>> one where you've noticed a demonstrable difference otherwise you'd have
>> mentioned...
>>
>> At any rate I've come around on this series, and think this is probably
>> reasonable, but I would like to see what increasing max-inline-insns-single
>> does first?
>
> Thank you for your suggestions. I'll pay closer attention
> to email formatting in future communications.
>
> Regarding the performance tests on x86_64 architecture:
>
> Parameter Observation:
> When setting max-inline-insns-single=400 (matching arm64's
> default value) without applying my patch, the compiler
> automatically inlines the critical functions.
>
> Benchmark Results:
>
> Metric                Baseline     With patch             max-inline-insns-single=400
> UnixBench score       1824         1835 (+0.6%)           1840 (+0.9%)
> vmlinux size (bytes)  35,379,608   35,379,786 (+0.0005%)  35,529,641 (+0.4%)
>
> Key Findings:
>
> The patch provides a significant performance gain (+0.6%) with
> minimal size impact (a 0.0005% increase). While
> max-inline-insns-single=400 yields slightly better
> performance (+0.9%), it incurs a larger size penalty (a 0.4% increase).
>
> Conclusion:
> The patch achieves a better performance/size trade-off
> compared to globally adjusting the inline threshold. The targeted
> approach (selective __always_inline) appears more efficient for
> this specific optimization.
Another attempt, on my openSUSE Tumbleweed system with gcc 15.1.1:
add/remove: 1/0 grow/shrink: 4/7 up/down: 1069/-520 (549)
Function                     old     new   delta
unmap_page_range            6493    7424    +931
add_mm_counter                 -     112    +112
finish_fault                1101    1117     +16
do_swap_page                6523    6531      +8
remap_pfn_range_internal    1358    1360      +2
pte_to_swp_entry             123     122      -1
pte_move_swp_offset          219     218      -1
restore_exclusive_pte        356     325     -31
__handle_mm_fault           3988    3949     -39
do_wp_page                  3926    3810    -116
copy_page_range             8051    7930    -121
swap_pte_batch               817     606    -211
Total: Before=66483, After=67032, chg +0.83%
The functions changed by your patch were already inlined, and yet forcing
their inlining apparently changed some heuristics so that completely
unrelated functions changed as well.
So that just shows how fragile such attempts to hand-hold gcc into a
specific desired behavior are, and I'd be wary of them.