lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 17 Nov 2023 08:18:19 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [PATCH] x86/smp: Use atomic_try_cmpxchg() to micro-optimize
 native_stop_other_cpus()

On 11/14/23 08:43, Uros Bizjak wrote:
> Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
> in native_stop_other_cpus(). On x86 the CMPXCHG instruction returns success
> in the ZF flag, so this change saves a compare after CMPXCHG. Together
> with a small code reorder, the generated asm code improves from:
> 
>   74:	8b 05 00 00 00 00    	mov    0x0(%rip),%eax
>   7a:	41 54                	push   %r12
>   7c:	55                   	push   %rbp
>   7d:	65 8b 2d 00 00 00 00 	mov    %gs:0x0(%rip),%ebp
>   84:	53                   	push   %rbx
>   85:	85 c0                	test   %eax,%eax
>   87:	75 71                	jne    fa <native_stop_other_cpus+0x8a>
>   89:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
>   8e:	f0 0f b1 2d 00 00 00 	lock cmpxchg %ebp,0x0(%rip)
>   95:	00
>   96:	83 f8 ff             	cmp    $0xffffffff,%eax
>   99:	75 5f                	jne    fa <native_stop_other_cpus+0x8a>
> 
> to:
> 
>   74:	8b 05 00 00 00 00    	mov    0x0(%rip),%eax
>   7a:	85 c0                	test   %eax,%eax
>   7c:	0f 85 84 00 00 00    	jne    106 <native_stop_other_cpus+0x96>
>   82:	41 54                	push   %r12
>   84:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
>   89:	55                   	push   %rbp
>   8a:	53                   	push   %rbx
>   8b:	65 8b 1d 00 00 00 00 	mov    %gs:0x0(%rip),%ebx
>   92:	f0 0f b1 1d 00 00 00 	lock cmpxchg %ebx,0x0(%rip)
>   99:	00
>   9a:	75 5e                	jne    fa <native_stop_other_cpus+0x8a>
> 
> Please note early exit and lack of CMP after CMPXCHG.

Uros, I really do appreciate that you are trying to optimize these
paths.  But the thing we have to balance is the _need_ for optimization
with the chance that this will break something.

This is about as much of a slow path as we have in the kernel.  It's
only used at shutdown, right?  That means this is one of the places in
the kernel that least needs optimization.  It can only possibly get run
once per boot.

So, the benefit is that it might make this code a few cycles faster.  In
practice, it might not even be measurably faster.

On the other hand, this is relatively untested and also makes the C code
more complicated.

Is there some other side benefit that I'm missing here?  Applying this
patch doesn't seem to have a great risk/reward ratio.

Powered by blists - more mailing lists