Message-ID: <CAFULd4b-YJnC1LFrvqLXTTsZQqchGQar=q3vUmeN-c8Kcrd51A@mail.gmail.com>
Date: Thu, 6 Mar 2025 14:56:35 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	"H. Peter Anvin" <hpa@...or.com>, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns

On Thu, Mar 6, 2025 at 10:57 AM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@...il.com> wrote:
>
> > According to:
> >
> >   https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
> >
> > the usage of asm pseudo directives in the asm template can confuse
> > the compiler into wrongly estimating the size of the generated
> > code.
> >
> > The LOCK_PREFIX macro expands to several asm pseudo directives, so
> > its usage in atomic locking insns causes the instruction length
> > estimate to be significantly off (the specially instrumented compiler
> > reports the estimated length of these asm templates to be 6
> > instructions long).
> >
> > This incorrect estimate further causes suboptimal inlining decisions,
> > suboptimal instruction scheduling and suboptimal code block alignments
> > for functions that use these locking primitives.
> >
> > Use asm_inline instead of just asm:
> >
> >   https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
> >
> > which is a feature that makes GCC pretend some inline assembler code
> > is tiny (while it would otherwise think it is huge).
> >
> > For code size estimation, the size of the asm is then taken as
> > the minimum size of one instruction, ignoring how many instructions
> > the compiler thinks it is.
> >
> > The code size of the resulting x86_64 defconfig object file increases
> > by 33.264 kbytes, representing 1.2% code size increase:
> >
> >     text     data     bss      dec     hex filename
> > 27450107  4633332  814148 32897587 1f5fa33 vmlinux-old.o
> > 27483371  4633784  814148 32931303 1f67de7 vmlinux-new.o
> >
> > mainly due to different inlining decisions in the -O2 build.
>
> So my request here would be not more benchmark figures (I don't think
> it's a realistic expectation for contributors to be able to measure
> much of an effect with such a type of change, let alone be certain
> what a macro or micro-benchmark measures is causally connected with
> the patch), but I'd like to ask for some qualitative analysis on the
> code generation side:
>
>  - +1.2% code size increase is a lot, especially if it's under the
>    default build flags of the kernel. Where does the extra code come
>    from?
>
>  - Is there any effect on Clang? Are its inlining decisions around
>    these asm() statements comparable, worse/better?

FTR, clang recognizes "asm inline", but there was practically no
difference in code size (18 bytes of text):

    text     data     bss      dec     hex filename
27577163  4503078  807732 32887973 1f5d4a5 vmlinux-clang-patched.o
27577181  4503078  807732 32887991 1f5d4b7 vmlinux-clang-unpatched.o
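
For readers unfamiliar with the qualifier, here is a minimal sketch of
what switching one locking primitive from asm to "asm inline" could look
like. The names demo_asm_inline and demo_atomic_add are hypothetical and
only illustrate the idea; this is not the kernel's actual code. The
kernel's LOCK_PREFIX expands to several assembler directives, which this
sketch leaves out in favor of a bare "lock" prefix. The compiler version
check is a conservative assumption, and non-x86 targets fall back to a
compiler builtin.

```c
/*
 * Hypothetical stand-in for an asm_inline-style macro: use the "inline"
 * qualifier where the compiler is assumed to support it, plain asm
 * otherwise. The version thresholds are a conservative guess.
 */
#if (defined(__clang__) && __clang_major__ >= 11) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9)
#define demo_asm_inline __asm__ __inline
#else
#define demo_asm_inline __asm__
#endif

/* Hypothetical helper: atomically add i to *counter. */
static inline void demo_atomic_add(int i, int *counter)
{
#if defined(__x86_64__) || defined(__i386__)
	/*
	 * With the "inline" qualifier the compiler treats this statement
	 * as minimal in size for its inlining heuristics, regardless of
	 * how many lines the template contains.
	 */
	demo_asm_inline volatile("lock addl %1, %0"
				 : "+m" (*counter)
				 : "ir" (i)
				 : "memory");
#else
	/* Non-x86 fallback so the sketch still builds elsewhere. */
	__atomic_fetch_add(counter, i, __ATOMIC_SEQ_CST);
#endif
}
```

Calling demo_atomic_add(2, &counter) on a counter holding 40 leaves 42 in
it. The qualifier only changes the size the compiler assumes for the
statement, not the instructions that are emitted.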

Uros.