[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z9tGh0Fa96gACmpQ@gmail.com>
Date: Wed, 19 Mar 2025 23:34:47 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
Juergen Gross <jgross@...e.com>,
Stefano Stabellini <sstabellini@...nel.org>,
"Ahmed S . Darwish" <darwi@...utronix.de>,
Andrew Cooper <andrew.cooper3@...rix.com>,
John Ogness <john.ogness@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] compiler/gcc: Make asm() templates asm __inline__() by
default
* Uros Bizjak <ubizjak@...il.com> wrote:
> On Tue, Mar 18, 2025 at 9:11 PM Ingo Molnar <mingo@...nel.org> wrote:
>
> > #ifdef CONFIG_CC_HAS_ASM_INLINE
> > # define asm_inline __asm__ __inline
> > # define asm(...) asm_inline(__VA_ARGS__)
> > #else
> > # define asm_inline asm
> > #endif
> >
> > And I fixed up the places where this isn't syntactically correct:
> >
> > 35 files changed, 82 insertions(+), 79 deletions(-)
> >
> > I haven't looked at code generation much yet, but text size changes are
> > minimal:
> >
> > text data bss dec hex filename
> > 29429076 7931870 1401196 38762142 24f769e vmlinux.before
> > 29429631 7931870 1401200 38762701 24f78cd vmlinux.after
> >
> > Which is promising, assuming I haven't messed up anywhere.
>
> Please use bloat-o-meter, it is more precise.
Here's the bloat-o-meter output between vanilla and patched vmlinux:
add/remove: 6/20 grow/shrink: 43/13 up/down: 4245/-2812 (1433)
Function old new delta
__ia32_sys_pidfd_send_signal 21 818 +797
__x64_sys_pidfd_send_signal 22 818 +796
nl80211_send_wiphy 11189 11867 +678
icl_update_topdown_event 473 691 +218
intel_joiner_adjust_timings.isra - 145 +145
deactivate_locked_super 148 249 +101
tcp_v4_send_synack 301 389 +88
kill_anon_super 53 137 +84
xa_destroy 291 371 +80
__xa_set_mark 135 209 +74
ip_fraglist_prepare 204 269 +65
store_hwp_dynamic_boost 136 200 +64
csum_partial 239 302 +63
ip_fraglist_init 155 217 +62
store_max_perf_pct 252 311 +59
ip_frag_next 377 434 +57
ip_do_fragment 1644 1701 +57
__ip_local_out 283 340 +57
store_min_perf_pct 273 329 +56
__udp4_lib_rcv 2917 2970 +53
freeze_super 1124 1175 +51
super_wake - 47 +47
ieee80211_parse_tx_radiotap 1267 1311 +44
ic_bootp_recv 1333 1371 +38
vlv_compute_watermarks 2267 2299 +32
intel_atomic_commit_tail 5423 5455 +32
_g4x_compute_pipe_wm 397 429 +32
__ieee80211_xmit_fast 2611 2643 +32
intel_format_info_is_yuv_semiplanar 43 72 +29
skl_main_to_aux_plane 113 136 +23
ip_auto_config 4029 4050 +21
__memcpy_flushcache 369 386 +17
validate_beacon_head 240 256 +16
tcp_v4_rcv 4850 4866 +16
intel_enable_transcoder 1443 1459 +16
intel_cx0pll_state_verify 1859 1875 +16
intel_cx0_phy_check_hdmi_link_rate 181 197 +16
inet_gro_receive 569 585 +16
__pfx_super_wake - 16 +16
__pfx_intel_joiner_adjust_timings.isra - 16 +16
__pfx_intel_cx0_read.constprop - 16 +16
intel_fb_is_gen12_ccs_aux_plane.isra 95 107 +12
_intel_modeset_primary_pipes 79 91 +12
intel_cx0_read.constprop - 10 +10
intel_fb_is_ccs_aux_plane 110 117 +7
intel_crtc_readout_derived_state 513 516 +3
wq_update_node_max_active 538 540 +2
intel_set_cdclk_pre_plane_update 838 840 +2
intel_sseu_subslice_total 67 68 +1
intel_crtc_compute_config 890 889 -1
_intel_modeset_secondary_pipes 50 47 -3
intel_mtl_pll_enable 7472 7460 -12
bxt_set_cdclk 1328 1316 -12
vfs_get_tree 205 189 -16
intel_fb_modifier_to_tiling 186 170 -16
__pfx_nl80211_send_iftype_data 16 - -16
__pfx_lane_mask_to_lane 16 - -16
__pfx_kill_super_notify.part 16 - -16
__pfx_ip_fast_csum 16 - -16
__pfx_intel_pstate_update_policies 16 - -16
__pfx_intel_joiner_adjust_timings 16 - -16
__pfx_format_is_yuv_semiplanar.part.isra 16 - -16
__pfx_cdclk_divider 16 - -16
__pfx___icl_update_topdown_event 16 - -16
__pfx___do_sys_pidfd_send_signal 16 - -16
intel_c20_sram_read.constprop 173 149 -24
xas_split_alloc 302 270 -32
nl80211_parse_sched_scan 3311 3279 -32
kill_litter_super 70 31 -39
format_is_yuv_semiplanar.part.isra 41 - -41
lane_mask_to_lane 44 - -44
ip_fast_csum 48 - -48
intel_cx0pll_readout_hw_state 1085 1037 -48
intel_pstate_update_policies 66 - -66
cdclk_divider 69 - -69
kill_super_notify.part 111 - -111
__icl_update_topdown_event 112 - -112
intel_joiner_adjust_timings 140 - -140
intel_fill_fb_info 2998 2813 -185
intel_tile_width_bytes 747 531 -216
nl80211_send_iftype_data 574 - -574
__do_sys_pidfd_send_signal 811 - -811
Total: Before=22547058, After=22548491, chg +0.01%
A lot fewer functions are affected than I expected from such a
large-scope change.
> Actually, functions with the most impact (x86 locking functions and
> __arch_hweight) were recently converted to asm_inline, so besides
> __untagged_addr, the remaining have very little impact, if at all
> (c.f. amd_clear_divider() ). There is also no need to convert asm()
> without directives inside.
I did the test with Linus-vanilla (81e4f8d68c66) to maximize the
potential effect, which doesn't have those changes yet.
See tip:WIP.x86/core.
> My proposal would be to convert the remaining few cases (the remaining
> asms involving ALTERNATIVE and exceptions) "by hand" to asm_inline()
> and stick a rule in checkpatch to use asm_inline() in the code
> involving asm(), like we have the rule with asm volatile.
>
> I don't think redefining an important C keyword is a good approach, it
> obfuscates its meaning too much. And as has been shown by Ingo's
> experiment, there is a substantial effort to fix false positives.
> Instead of fixing these, we can trivially convert the remaining cases
> to asm_volatile() as well, without obfuscating asm(). Checkpatch can
> take care of future cases.
This would work for me too. The cross-arch impact and churn seems
substantial.
Thanks,
Ingo
Powered by blists - more mailing lists