linux-kernel - Re: [PATCH -tip] x86/locking/atomic: Use asm

lists.openwall.net
Open Source and information security mailing list archives



Message-ID: <20250303131234.0a2e20e4@pumpkin>
Date: Mon, 3 Mar 2025 13:12:34 +0000
From: David Laight <david.laight.linux@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
 linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, Thomas
 Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav
 Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
 Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
 locking insns

On Fri, 28 Feb 2025 14:58:47 -0800
Dave Hansen <dave.hansen@...el.com> wrote:

> On 2/28/25 14:31, Uros Bizjak wrote:
> > On Fri, Feb 28, 2025 at 5:48 PM Dave Hansen <dave.hansen@...el.com> wrote:  
> >>
> >> On 2/28/25 04:35, Uros Bizjak wrote:  
> >>> The code size of the resulting x86_64 defconfig object file increases
> >>> by 33.264 kbytes, representing a 0.12% code size increase:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 27450107        4633332  814148 32897587        1f5fa33 vmlinux-old.o
> >>> 27483371        4633784  814148 32931303        1f67de7 vmlinux-new.o  
> >>
> >> So, first of all, thank you for including some objective measurement of
> >> the impact of your patches. It's much appreciated.
> >>
> >> But I think the patches need to come with a solid theory of why they're
> >> good. The minimum bar for that, I think, is *some* kind of actual
> >> real-world performance test. I'm not picky. Just *something* that spends
> >> a lot of time in the kernel and ideally where a profile points at some
> >> of the code you're poking here.
> >>
> >> I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> >> compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
> >>
> >> But performance patches need to come with performance *numbers*.  
> > 
> > I don't consider this patch a performance patch; it is more a patch
> > that fixes a correctness issue. The compiler misestimates the number
> > of instructions in the asm template, so the patch tells the compiler
> > that everything in the template in fact results in a single
> > instruction, regardless of the assembler pseudo-ops it contains. The
> > correct estimate then allows the compiler to do its job better (e.g.
> > better scheduling, better inlining decisions, etc.).
> 
> Why does it matter if the compiler does its job better?
> 
> I'll let the other folks who maintain this code chime in if they think
> I'm off my rocker. But, *I* consider this -- and all of these, frankly
> -- performance patches.
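As a quick cross-check of the size table quoted above, the growth of the text section can be computed directly from its "text" column. This is a minimal sketch; the macro names TEXT_OLD/TEXT_NEW and the helper growth_percent() are hypothetical, and only the two byte counts come from the thread.

```c
#include <assert.h>

/* "text" column (bytes) from the size output quoted in this thread.
 * The macro names are hypothetical. */
#define TEXT_OLD 27450107L /* vmlinux-old.o */
#define TEXT_NEW 27483371L /* vmlinux-new.o */

/* Growth of new_bytes over old_bytes, as a percentage of old_bytes. */
double growth_percent(long old_bytes, long new_bytes)
{
	assert(old_bytes > 0);
	return 100.0 * (double)(new_bytes - old_bytes) / (double)old_bytes;
}
```

With these numbers TEXT_NEW - TEXT_OLD is 33264 bytes, and growth_percent(TEXT_OLD, TEXT_NEW) evaluates to about 0.121, i.e. roughly 0.12% of the old text size.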

I was looking at some size changes related to a different 'trivial'
code change.
It caused gcc to make apparently unrelated inlining decisions, so some
functions grew or shrank by 100+ bytes even though the change itself
mostly added or removed only a single instruction.

I've lost the patch for this one, but if the asm block does expand to a
single instruction, telling gcc so is likely to make it decide to inline
one of the functions that uses it, increasing overall code size.
Whether that helps or hinders performance is difficult to say.
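To make the size-estimate point concrete, here is a minimal x86-64 sketch (assuming GCC 9 or later, or a Clang that accepts the `inline` asm qualifier) contrasting a plain `asm` statement with `asm inline`. GCC's documentation says it estimates an asm's size by counting the lines of its template, and that with `asm inline` the asm is treated as minimum size for inlining purposes. LOCK_PREFIX_SKETCH, inc_plain(), inc_asm_inline() and inc_both() are hypothetical names; the kernel's real LOCK_PREFIX and asm_inline macros differ in detail.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's LOCK_PREFIX: it records the
 * address of the "lock" byte in a separate section. These directives
 * emit nothing into .text, but GCC's size estimate still counts each
 * line of the template.
 */
#define LOCK_PREFIX_SKETCH                         \
	".pushsection .smp_locks_sketch,\"a\"\n\t" \
	".balign 4\n\t"                            \
	".long 671f - .\n\t"                       \
	".popsection\n"                            \
	"671:\n\tlock "

/* Plain asm: for inlining decisions GCC counts several "instructions"
 * here, one per template line, although only one is emitted. */
static inline void inc_plain(int *v)
{
	__asm__ __volatile__(LOCK_PREFIX_SKETCH "incl %0"
			     : "+m"(*v) : : "memory");
}

/* asm inline: the same template is costed at the minimum size. */
static inline void inc_asm_inline(int *v)
{
	__asm__ __inline__ __volatile__(LOCK_PREFIX_SKETCH "incl %0"
					: "+m"(*v) : : "memory");
}

/* Hypothetical caller: both helpers emit the same locked increment;
 * only the compiler's size estimate for them differs. */
int inc_both(int *v)
{
	assert(v != 0);
	inc_plain(v);
	inc_asm_inline(v);
	return *v;
}
```

For example, with `int x = 5;`, `inc_both(&x)` returns 7. Any effect on callers only shows up in gcc's inlining choices (e.g. when comparing size(1) output of larger builds), which is why, as David notes, the performance impact has to be measured rather than assumed.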

	David