[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMmJlv8JrzyHRCxR@willie-the-truck>
Date: Tue, 16 Sep 2025 17:00:22 +0100
From: Will Deacon <will@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: pengyu <pengyu@...inos.cn>, mingo@...hat.com, boqun.feng@...il.com,
longman@...hat.com, linux-kernel@...r.kernel.org,
Mark Rutland <mark.rutland@....com>, t.haas@...bs.de,
parri.andrea@...il.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
paulmck@...nel.org, jonas.oberhauser@...weicloud.com,
r.maseli@...bs.de, lkmm@...ts.linux.dev, stern@...land.harvard.edu
Subject: Re: [PATCH] locking/qspinlock: use xchg with _mb in slowpath for
arm64
On Tue, Sep 16, 2025 at 04:10:32PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 16, 2025 at 11:39:03AM +0800, pengyu wrote:
> > From: Yu Peng <pengyu@...inos.cn>
> >
> > A hardlock detected on arm64: rq->lock was released, but a CPU
> > blocked at mcs_node->locked and timed out.
> >
> > We found xchg_tail and atomic_try_cmpxchg_relaxed used _relaxed
> > versions without memory barriers. Suspected insufficient coherence
> > guarantees on some arm64 microarchitectures, potentially leading to
> > the following issues occurred:
> >
> > CPU0: CPU1:
> > // Set tail to CPU0
> > old = xchg_tail(lock, tail);
> >
> > //CPU0 read tail is itself
> > if ((val & _Q_TAIL_MASK) == tail)
> > // CPU1 exchanges the tail
> > old = xchg_tail(lock, tail)
> > //assuming CPU0 not see tail change
> > atomic_try_cmpxchg_relaxed(
> > &lock->val, &val, _Q_LOCKED_VAL)
> > //released without notifying CPU1
> > goto release;
> > //hardlock detected
> > arch_mcs_spin_lock_contended(
> > &node->locked)
> >
> > Therefore, xchg_tail and atomic_try_cmpxchg using _mb to replace _relaxed.
>
> Yeah, no. We do not apply patches based on suspicion. And we most
> certainly do not sprinkle #ifdef ARM64 in generic code.
Absolutely.
> There is this thread:
>
> https://lkml.kernel.org/r/cb83e3e4-9e22-4457-bf61-5614cc4396ad@tu-bs.de
>
> Which is also concerned with xchg_tail(). Reading back, I'm not sure
> we've ever heard back from ARM on whether that extra ;si was correct or
> not, Will?
It's still under discussion with the Arm architects but it was _very_
close to concluding last time we met and I wouldn't worry about it for
the purposes of this report.
> Anyway, as Waiman already asked, please state your exact ARM64
> microarch.
>
> Barring the ;si, the above thread suggests that they can prove the code
> correct with the below change, does that resolve your problem?
>
> Other than that, I'm going to have to leave this to Will and co.
I'll take a look but it's light on details.
Will
Powered by blists - more mailing lists