[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250916141032.GJ3245006@noisy.programming.kicks-ass.net>
Date: Tue, 16 Sep 2025 16:10:32 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: pengyu <pengyu@...inos.cn>
Cc: mingo@...hat.com, will@...nel.org, boqun.feng@...il.com,
longman@...hat.com, linux-kernel@...r.kernel.org,
Mark Rutland <mark.rutland@....com>, t.haas@...bs.de,
parri.andrea@...il.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
paulmck@...nel.org, jonas.oberhauser@...weicloud.com,
r.maseli@...bs.de, lkmm@...ts.linux.dev, stern@...land.harvard.edu
Subject: Re: [PATCH] locking/qspinlock: use xchg with _mb in slowpath for
arm64
On Tue, Sep 16, 2025 at 11:39:03AM +0800, pengyu wrote:
> From: Yu Peng <pengyu@...inos.cn>
>
> A hardlock detected on arm64: rq->lock was released, but a CPU
> blocked at mcs_node->locked and timed out.
>
> We found xchg_tail and atomic_try_cmpxchg_relaxed used _relaxed
> versions without memory barriers. Suspected insufficient coherence
> guarantees on some arm64 microarchitectures, potentially leading to
> the following issues occurred:
>
> CPU0: CPU1:
> // Set tail to CPU0
> old = xchg_tail(lock, tail);
>
> //CPU0 read tail is itself
> if ((val & _Q_TAIL_MASK) == tail)
> // CPU1 exchanges the tail
> old = xchg_tail(lock, tail)
> //assuming CPU0 not see tail change
> atomic_try_cmpxchg_relaxed(
> &lock->val, &val, _Q_LOCKED_VAL)
> //released without notifying CPU1
> goto release;
> //hardlock detected
> arch_mcs_spin_lock_contended(
> &node->locked)
>
> Therefore, xchg_tail and atomic_try_cmpxchg using _mb to replace _relaxed.
Yeah, no. We do not apply patches based on suspicion. And we most
certainly do not sprinkle #ifdef ARM64 in generic code.
There is this thread:
https://lkml.kernel.org/r/cb83e3e4-9e22-4457-bf61-5614cc4396ad@tu-bs.de
Which is also concerned with xchg_tail(). Reading back, I'm not sure
we've ever heard back from ARM on whether that extra ;si was correct or
not, Will?
Anyway, as Waiman already asked, please state your exact ARM64
microarch.
Barring the ;si, the above thread suggests that they can prove the code
correct with the below change, does that resolve your problem?
Other than that, I'm going to have to leave this to Will and co.
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index af8d122bb649..249231460ea0 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -261,14 +261,8 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
goto release;
/*
- * Ensure that the initialisation of @node is complete before we
- * publish the updated tail via xchg_tail() and potentially link
- * @node into the waitqueue via WRITE_ONCE(prev->next, node) below.
- */
- smp_wmb();
-
- /*
- * Publish the updated tail.
+ * Publish the updated tail. This is a RELEASE barrier and ensures the
+ * @node is complete before the link becomes visible.
* We have already touched the queueing cacheline; don't bother with
* pending stuff.
*
diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h
index d69958a844f7..bb9a006e76eb 100644
--- a/kernel/locking/qspinlock.h
+++ b/kernel/locking/qspinlock.h
@@ -117,7 +117,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
* We can use relaxed semantics since the caller ensures that the
* MCS node is properly initialized before updating the tail.
*/
- return (u32)xchg_relaxed(&lock->tail,
+ return (u32)xchg_release(&lock->tail,
tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
}
@@ -167,7 +167,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
* the MCS node is properly initialized before updating the
* tail.
*/
- } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+ } while (!atomic_try_cmpxchg_release(&lock->val, &old, new));
return old;
}
Powered by blists - more mailing lists