lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMmJlv8JrzyHRCxR@willie-the-truck>
Date: Tue, 16 Sep 2025 17:00:22 +0100
From: Will Deacon <will@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: pengyu <pengyu@...inos.cn>, mingo@...hat.com, boqun.feng@...il.com,
	longman@...hat.com, linux-kernel@...r.kernel.org,
	Mark Rutland <mark.rutland@....com>, t.haas@...bs.de,
	parri.andrea@...il.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
	paulmck@...nel.org, jonas.oberhauser@...weicloud.com,
	r.maseli@...bs.de, lkmm@...ts.linux.dev, stern@...land.harvard.edu
Subject: Re: [PATCH] locking/qspinlock: use xchg with _mb in slowpath for
 arm64

On Tue, Sep 16, 2025 at 04:10:32PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 16, 2025 at 11:39:03AM +0800, pengyu wrote:
> > From: Yu Peng <pengyu@...inos.cn>
> > 
> > A hardlock detected on arm64: rq->lock was released, but a CPU
> > blocked at mcs_node->locked and timed out.
> > 
> > We found xchg_tail and atomic_try_cmpxchg_relaxed used _relaxed
> > versions without memory barriers. Suspected insufficient coherence
> > guarantees on some arm64 microarchitectures, potentially leading to
> > the following issues occurred:
> > 
> > CPU0:                                           CPU1:
> > // Set tail to CPU0
> > old = xchg_tail(lock, tail);
> > 
> > //CPU0 read tail is itself
> > if ((val & _Q_TAIL_MASK) == tail)
> >                                                 // CPU1 exchanges the tail
> >                                                 old = xchg_tail(lock, tail)
> > //assuming CPU0 not see tail change
> > atomic_try_cmpxchg_relaxed(
> > 	  &lock->val, &val, _Q_LOCKED_VAL)
> > //released without notifying CPU1
> > goto release;
> >                                                 //hardlock detected
> >                                                 arch_mcs_spin_lock_contended(
> >                                                       &node->locked)
> > 
> > Therefore, xchg_tail and atomic_try_cmpxchg using _mb to replace _relaxed.
> 
> Yeah, no. We do not apply patches based on suspicion. And we most
> certainly do not sprinkle #ifdef ARM64 in generic code.

Absolutely.

> There is this thread:
> 
>   https://lkml.kernel.org/r/cb83e3e4-9e22-4457-bf61-5614cc4396ad@tu-bs.de
> 
> Which is also concerned with xchg_tail(). Reading back, I'm not sure
> we've ever heard back from ARM on whether that extra ;si was correct or
> not, Will?

It's still under discussion with the Arm architects but it was _very_
close to concluding last time we met and I wouldn't worry about it for
the purposes of this report.

> Anyway, as Waiman already asked, please state your exact ARM64
> microarch.
> 
> Barring the ;si, the above thread suggests that they can prove the code
> correct with the below change, does that resolve your problem?
> 
> Other than that, I'm going to have to leave this to Will and co.

I'll take a look but it's light on details.

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ