Message-ID: <CAFULd4ZSS+gLeqFs4hNSoEsN-T-UUOGVzLX9OjR07maMrP+CHQ@mail.gmail.com>
Date: Fri, 28 Feb 2025 14:13:50 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: x86@...nel.org, linux-kernel@...r.kernel.org
Cc: Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns

On Fri, Feb 28, 2025 at 1:38 PM Uros Bizjak <ubizjak@...il.com> wrote:
>
> According to:
>
> https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
>
> the use of asm pseudo directives in the asm template can cause
> the compiler to misestimate the size of the generated code.
>
> The LOCK_PREFIX macro expands to several asm pseudo directives, so
> its use in atomic locking insns makes the instruction length estimate
> significantly wrong (a specially instrumented compiler reports
> the estimated length of these asm templates as 6 instructions).
>
> This incorrect estimate further causes suboptimal inlining decisions,
> suboptimal instruction scheduling and suboptimal code block alignments
> for functions that use these locking primitives.
>
> Use asm_inline instead:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
>
> which is a GCC feature that, unlike plain asm, makes the compiler
> treat inline assembler code as tiny even when it would otherwise
> estimate it as huge.
>
> For code size estimation, the size of the asm is then taken as
> the minimum size of one instruction, regardless of how many
> instructions the compiler thinks it contains.
>
> The code size of the resulting x86_64 defconfig object file increases
> by 33.264 kbytes, representing a 0.12% code size increase:
>
>     text    data    bss      dec     hex filename
> 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
> 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
>
> mainly due to different inlining decisions in the -O2 build.

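To illustrate the size estimation issue described above, here is a minimal user-space sketch, not the kernel code. The names demo_asm_inline, DEMO_LOCK_PREFIX, demo_atomic_add and the .demo_locks section are hypothetical, and the directive sequence is only assumed to resemble what a lock prefix macro with pseudo directives expands to. The compiler counts the lines of the template when estimating size, so the multi-line template looks like several instructions unless the inline qualifier is used:

```c
/*
 * Hypothetical user-space sketch, not the kernel's implementation.
 * Assumption: GCC 9 or newer accepts the "inline" asm qualifier;
 * other compilers fall back to plain asm here.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
# define demo_asm_inline __asm__ __inline__
#else
# define demo_asm_inline __asm__
#endif

#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)
/*
 * Imitates a lock prefix macro that records the location of the
 * lock prefix in a separate section. The template spans six lines,
 * but only the last one emits an instruction into the text section.
 */
# define DEMO_LOCK_PREFIX \
	".pushsection .demo_locks,\"a\"\n" \
	".balign 4\n" \
	".long 671f - .\n" \
	".popsection\n" \
	"671:\n\tlock "

static inline void demo_atomic_add(int i, int *v)
{
	/* With the inline qualifier the asm is costed as one minimal insn. */
	demo_asm_inline volatile(DEMO_LOCK_PREFIX "addl %1, %0"
				 : "+m" (*v)
				 : "ir" (i)
				 : "memory");
}
#else
/* Non-x86 or non-ELF targets: use the compiler builtin instead. */
static inline void demo_atomic_add(int i, int *v)
{
	__atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}
#endif
```

Calling demo_atomic_add(2, &x) with x == 40 leaves x == 42 on either path; the qualifier only changes the compiler's size estimate, not the generated instruction.
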
FTR, with -Os (where generated code size really matters) the x86_64
defconfig object file *decreases* by 24.388 kbytes, representing a 0.1%
code size *decrease*:

    text    data    bss      dec     hex filename
23883860 4617284 814212 29315356 1bf511c vmlinux-old.o
23859472 4615404 814212 29289088 1beea80 vmlinux-new.o

again mainly due to different inlining decisions in the -Os build.
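
The deltas can be recomputed from the text columns of the two size tables. A small sketch, with the hypothetical helper names text_delta and text_delta_percent:

```c
/*
 * Hypothetical helpers for recomputing the text size change between
 * two builds from the "text" column of size(1) output.
 */

/* Difference in bytes; negative means the new build is smaller. */
static long text_delta(long old_text, long new_text)
{
	return new_text - old_text;
}

/* Relative change in percent of the old text size. */
static double text_delta_percent(long old_text, long new_text)
{
	return 100.0 * (double)(new_text - old_text) / (double)old_text;
}
```

For the -O2 pair, text_delta(27450107, 27483371) gives 33264 bytes; for the -Os pair, text_delta(23883860, 23859472) gives -24388 bytes.
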
Uros.