Message-ID: <Z8lxmPmnJhBmPRvl@gmail.com>
Date: Thu, 6 Mar 2025 10:57:44 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns


* Uros Bizjak <ubizjak@...il.com> wrote:

> According to:
> 
>   https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
> 
> the use of asm pseudo directives in an asm template can confuse
> the compiler into wrongly estimating the size of the generated
> code.
> 
> The LOCK_PREFIX macro expands to several asm pseudo directives, so
> its use in atomic locking insns causes the instruction length
> estimate to be significantly off (a specially instrumented compiler
> reports the estimated length of these asm templates as 6 instructions).
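
For reference, LOCK_PREFIX expands to roughly the following (a sketch
based on arch/x86/include/asm/alternative.h; exact details vary by
kernel version). The pseudo directives merely record the location of
the lock prefix in the .smp_locks section and emit a single prefix
byte into .text, yet GCC's size heuristic counts each line of the
template as a potential full-size instruction:

	/* Sketch of the kernel's LOCK_PREFIX definition */
	#define LOCK_PREFIX_HERE			\
		".pushsection .smp_locks,\"a\"\n"	\
		".balign 4\n"				\
		".long 671f - .\n" /* offset */		\
		".popsection\n"				\
		"671:"

	#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "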
> 
> This incorrect estimate in turn causes suboptimal inlining decisions,
> suboptimal instruction scheduling and suboptimal code block alignments
> for functions that use these locking primitives.
> 
> Use asm_inline instead:
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
> 
> which is a feature that makes GCC pretend some inline assembler code
> is tiny (where it would otherwise think it is huge), instead of plain
> asm.
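
For context, asm_inline is defined along these lines (a sketch based on
include/linux/compiler_types.h; it falls back to plain asm on compilers
without "asm inline" support):

	/* Sketch of the kernel's asm_inline definition */
	#ifdef CONFIG_CC_HAS_ASM_INLINE
	#define asm_inline asm __inline
	#else
	#define asm_inline asm
	#endif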
> 
> For code size estimation, the size of the asm is then taken as
> the minimum size of one instruction, ignoring how many instructions
> the compiler thinks it contains.
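
As a concrete illustration, here is the kind of change the patch makes,
sketched on arch_atomic_add() from arch/x86/include/asm/atomic.h (before
the patch, the statement below used plain asm):

	/* Sketch: with asm_inline, GCC costs the LOCK_PREFIX template
	 * as one minimum-size instruction instead of ~6 instructions,
	 * which no longer penalizes inlining of callers. */
	static __always_inline void arch_atomic_add(int i, atomic_t *v)
	{
		asm_inline volatile(LOCK_PREFIX "addl %1,%0"
			     : "+m" (v->counter)
			     : "ir" (i) : "memory");
	}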
> 
> The code size of the resulting x86_64 defconfig object file increases
> by 33,264 bytes, representing a 0.12% code size increase:
> 
>    text    data     bss     dec     hex filename
> 27450107        4633332  814148 32897587        1f5fa33 vmlinux-old.o
> 27483371        4633784  814148 32931303        1f67de7 vmlinux-new.o
> 
> mainly due to different inlining decisions of the -O2 build.
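
[Arithmetic check: the growth is in the text segment, 27483371 - 27450107
= 33264 bytes, i.e. 33264 / 27450107 ≈ 0.12%.]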

So my request here would not be more benchmark figures (I don't think 
it's realistic to expect contributors to be able to measure much of an 
effect with this type of change, let alone be certain that what a 
macro- or micro-benchmark measures is causally connected with the 
patch); rather, I'd like to ask for some qualitative analysis on the 
code generation side:

 - A +0.12% code size increase is a lot, especially if it's under the 
   default build flags of the kernel. Where does the extra code come 
   from?

 - Is there any effect on Clang? Are its inlining decisions around 
   these asm() statements comparable, better or worse?

A couple of concrete examples would go a long way:

 - "Function XXX was inlined 3 times before the patch, and it was 
    inlined 30 times after the patch. I have reviewed two such inlining 
    locations, and they have added more code to unlikely or 
    failure-handling branches collected near the function epilogue, 
    while the fast-path of the function became more optimal."

Or you might end up finding:

 - "Function YYY was inlined 3x more frequently after the patch, but 
    the inlining decision increased register pressure and created less 
    optimal code in the fast-path, increasing code size and likely also 
    decreasing fast-path performance."

Obviously we'd be sad about the second case, but it's well within the 
spectrum of possibilities when we look at "+0.12% object code size 
increase".

What we cannot do is throw up our hands and claim "-O2 trades 
performance for size, and thus this patch improves performance". 
We don't know that for sure, and 30 years of kernel development 
have created a love-hate relationship and a fair level of distrust 
between kernel developers and compiler inlining decisions, 
especially around x86 asm() statements ...

So these are roughly the high level requirements around such patches.
Does this make sense?

Thanks,

	Ingo
