Message-ID: <31f27919-926f-4cbd-81fa-5a52c453feca@intel.com>
Date: Wed, 16 Jul 2025 17:25:43 +0800
From: "Guo, Wangyang" <wangyang.guo@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Paolo Bonzini <pbonzini@...hat.com>, Vitaly Kuznetsov <vkuznets@...hat.com>,
Sean Christopherson <seanjc@...gle.com>
Cc: Tianyou Li <tianyou.li@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH RESEND] x86/paravirt: add backoff mechanism to
virt_spin_lock

Any comments or suggestions on this patch? Are there any further
updates or changes needed?
BR
Wangyang
On 7/3/2025 10:23 AM, Wangyang Guo wrote:
> When multiple threads are waiting for a lock at the same time, once
> the lock owner releases it, all waiters see the lock as available and
> try to take it at once, which may cause an expensive CAS storm.
>
> Binary exponential backoff is introduced. As the number of failed
> try-lock attempts grows, it is more likely that a larger number of
> threads are competing for the same lock, so the wait time is
> increased exponentially.
>
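> As an illustration of the technique, here is a minimal userspace
> sketch using C11 atomics (lock_with_backoff() and the spin loop are
> stand-ins for illustration only, not part of this patch):
>
>   #include <stdatomic.h>
>
>   #define MAX_BACKOFF 64
>
>   static void lock_with_backoff(atomic_int *lock)
>   {
>           int backoff = 1;
>
>           for (;;) {
>                   int val = atomic_load(lock);
>                   int locked = val;       /* snapshot before the CAS */
>
>                   if (!locked &&
>                       atomic_compare_exchange_strong(lock, &val, 1))
>                           return;         /* lock acquired */
>
>                   /* Wait before retrying (stand-in for cpu_relax()). */
>                   for (volatile int i = 0; i < backoff; i++)
>                           ;
>
>                   /*
>                    * !locked: the lock looked free but the CAS lost a
>                    * race, so several waiters are contending; double
>                    * the wait, capped at MAX_BACKOFF.
>                    */
>                   if (!locked && backoff < MAX_BACKOFF)
>                           backoff <<= 1;
>           }
>   }
>
> Unlock is a plain atomic_store(lock, 0); the backoff only shapes how
> aggressively waiters retry and does not change lock semantics.
>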
> The optimization improves the SPEC CPU 2017 502.gcc_r benchmark by
> ~4% for a 288-core VM on an Intel Xeon 6 E-core platform.
>
> In the micro benchmark, the patch shows a significant performance
> gain in high-contention cases. A slight regression is found in some
> mid-contention cases because the last waiter might take longer to
> notice that the lock has been released. Low-contention scenarios are
> unchanged, as expected.
>
> Micro Bench: https://github.com/guowangy/kernel-lock-bench
> Test Platform: Xeon 8380L
> First Row: critical-section length
> First Col: number of CPU cores
> Values: throughput of backoff vs. linux-6.15; higher is better
> (e.g. 1.95 at row 8, column 8 of the first table means ~95% higher
> throughput with 8 cores and a critical-section length of 8)
>
> non-critical-length: 1
>        0     1     2     4     8    16    32    64   128
>  1  1.01  1.00  1.00  1.00  1.01  1.01  1.01  1.01  1.00
>  2  1.02  1.01  1.02  0.97  1.02  1.05  1.01  1.00  1.01
>  4  1.15  1.20  1.14  1.11  1.34  1.26  0.99  0.93  0.98
>  8  1.59  1.71  1.18  1.80  1.95  1.45  1.05  0.99  1.17
> 16  1.04  1.37  1.08  1.31  1.85  1.50  1.24  0.99  1.24
> 32  1.24  1.36  1.23  1.40  1.50  1.86  1.45  1.18  1.48
> 64  1.12  1.24  1.11  1.31  1.34  1.37  2.01  1.60  1.43
>
> non-critical-length: 32
>        0     1     2     4     8    16    32    64   128
>  1  1.00  1.00  1.00  1.00  1.00  0.99  1.00  1.00  1.01
>  2  1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.99  1.00
>  4  1.12  1.25  1.09  1.07  1.12  1.16  1.13  1.16  1.09
>  8  1.02  1.16  1.03  1.02  1.04  1.07  1.04  0.99  0.98
> 16  0.97  0.95  0.84  0.96  0.99  0.98  0.98  1.01  1.03
> 32  1.05  1.03  0.87  1.05  1.25  1.16  1.25  1.30  1.27
> 64  1.83  1.10  1.07  1.02  1.19  1.18  1.21  1.14  1.13
>
> non-critical-length: 128
>        0     1     2     4     8    16    32    64   128
>  1  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
>  2  0.99  1.02  1.00  1.00  1.00  1.00  1.00  1.00  1.00
>  4  0.98  0.99  1.00  1.00  0.99  1.04  0.99  0.99  1.02
>  8  1.08  1.08  1.08  1.07  1.15  1.12  1.03  0.94  1.00
> 16  1.00  1.00  1.00  1.01  1.01  1.01  1.36  1.06  1.02
> 32  1.07  1.08  1.07  1.07  1.09  1.10  1.22  1.36  1.25
> 64  1.03  1.04  1.04  1.06  1.13  1.18  0.82  1.02  1.14
>
> Reviewed-by: Tianyou Li <tianyou.li@...el.com>
> Reviewed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Wangyang Guo <wangyang.guo@...el.com>
> ---
> arch/x86/include/asm/qspinlock.h | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 68da67df304d..ac6e1bbd9ba4 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -87,7 +87,7 @@ DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>  #define virt_spin_lock virt_spin_lock
>  static inline bool virt_spin_lock(struct qspinlock *lock)
>  {
> -	int val;
> +	int val, locked;
>
>  	if (!static_branch_likely(&virt_spin_lock_key))
>  		return false;
> @@ -98,11 +98,33 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>  	 * horrible lock 'holder' preemption issues.
>  	 */
>
> +#define MAX_BACKOFF 64
> +	int backoff = 1;
> +
>  __retry:
>  	val = atomic_read(&lock->val);
> +	locked = val;
> +
> +	if (locked || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
> +		int spin_count = backoff;
> +
> +		while (spin_count--)
> +			cpu_relax();
> +
> +		/*
> +		 * Here !locked means the lock was tried but the CAS failed.
> +		 *
> +		 * When multiple threads are waiting for the lock, once the
> +		 * owner releases it, all waiters see the lock as available
> +		 * and try to take it at once: an expensive CAS storm.
> +		 *
> +		 * Binary exponential backoff: as failed try-lock attempts
> +		 * grow, more threads are likely competing for the same
> +		 * lock, so increase the wait time exponentially.
> +		 */
> +		if (!locked)
> +			backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
>
> -	if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
> -		cpu_relax();
>  		goto __retry;
>  	}
>
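
A quick way to sanity-check the resulting delay schedule is a
standalone snippet (not part of the patch):

  #include <stdio.h>

  #define MAX_BACKOFF 64

  int main(void)
  {
          int backoff = 1;

          /* Spin counts used by successive failed try-lock attempts. */
          for (int attempt = 1; attempt <= 10; attempt++) {
                  printf("attempt %2d: spin %d\n", attempt, backoff);
                  backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
          }
          return 0;
  }

This prints 1, 2, 4, ..., 64 and then stays at 64: the delay grows
geometrically but is bounded, so a waiter is never delayed by more
than 64 cpu_relax() iterations between retries.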