[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250311150745.4492-1-ubizjak@gmail.com>
Date: Tue, 11 Mar 2025 16:07:28 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Uros Bizjak <ubizjak@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: [PATCH v2] x86/hweight: Fix and improve __arch_hweight{32,64}() assembly
a) Use ASM_CALL_CONSTRAINT to prevent inline asm that includes call
instruction from being scheduled before the frame pointer gets set
up by the containing function. This unconstrained scheduling might
cause objtool to print a "call without frame pointer save/setup"
warning. Current versions of compilers don't seem to trigger this
condition, but without this constraint there's nothing to prevent
the compiler from scheduling the insn in front of frame creation.
b) Use asm_inline to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. ALTERNATIVE macro that expands to several
pseudo directives causes instruction length estimate to count
more than 20 instructions.
c) Use named operands in inline asm.
bloat-o-meter reports slight reduction of the code size
for x86_64 defconfig object file, compiled with gcc-14.2:
add/remove: 6/12 grow/shrink: 59/50 up/down: 3389/-3560 (-171)
Total: Before=22734393, After=22734222, chg -0.00%
where 29 instances of code blocks involving POPCNT now gets inlined,
resulting in the removal of several functions:
format_is_yuv_semiplanar.part.isra 41 - -41
cdclk_divider 69 - -69
intel_joiner_adjust_timings 140 - -140
nl80211_send_wowlan_tcp_caps 369 - -369
nl80211_send_iftype_data 579 - -579
__do_sys_pidfd_send_signal 809 - -809
One noticeable change is:
pcpu_page_first_chunk 1075 1060 -15
Where the compiler now inlines 4 more instances of POPCNT insns,
but still manages to compile to a function with smaller code size.
Signed-off-by: Uros Bizjak <ubizjak@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
---
v2: - Expand ASM_CALL_CONSTRANT commit message to mention that
current compilers don't schedule insn before freame creation.
- Use bloat-o-meter to assess code size changes
---
arch/x86/include/asm/arch_hweight.h | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index ba88edd0d58b..20b0633744e4 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -16,10 +16,10 @@ static __always_inline unsigned int __arch_hweight32(unsigned int w)
{
unsigned int res;
- asm (ALTERNATIVE("call __sw_hweight32", "popcntl %1, %0", X86_FEATURE_POPCNT)
- : "="REG_OUT (res)
- : REG_IN (w));
-
+ asm_inline (ALTERNATIVE("call __sw_hweight32",
+ "popcntl %[val], %[cnt]", X86_FEATURE_POPCNT)
+ : [cnt] "="REG_OUT (res), ASM_CALL_CONSTRAINT
+ : [val] REG_IN (w));
return res;
}
@@ -44,10 +44,10 @@ static __always_inline unsigned long __arch_hweight64(__u64 w)
{
unsigned long res;
- asm (ALTERNATIVE("call __sw_hweight64", "popcntq %1, %0", X86_FEATURE_POPCNT)
- : "="REG_OUT (res)
- : REG_IN (w));
-
+ asm_inline (ALTERNATIVE("call __sw_hweight64",
+ "popcntq %[val], %[cnt]", X86_FEATURE_POPCNT)
+ : [cnt] "="REG_OUT (res), ASM_CALL_CONSTRAINT
+ : [val] REG_IN (w));
return res;
}
#endif /* CONFIG_X86_32 */
--
2.42.0
Powered by blists - more mailing lists