Message-ID: <31861b75-02ee-495e-b839-15d7510bf7c6@kylinos.cn>
Date: Wed, 17 Sep 2025 18:51:18 +0800
From: pengyu <pengyu@...inos.cn>
To: Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, boqun.feng@...il.com, longman@...hat.com,
 linux-kernel@...r.kernel.org, Mark Rutland <mark.rutland@....com>,
 t.haas@...bs.de, parri.andrea@...il.com, j.alglave@....ac.uk,
 luc.maranget@...ia.fr, paulmck@...nel.org, jonas.oberhauser@...weicloud.com,
 r.maseli@...bs.de, lkmm@...ts.linux.dev, stern@...land.harvard.edu
Subject: Re: [PATCH] locking/qspinlock: use xchg with _mb in slowpath for
 arm64

On 9/17/25 00:00, Will Deacon wrote:
> On Tue, Sep 16, 2025 at 04:10:32PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 16, 2025 at 11:39:03AM +0800, pengyu wrote:
>>> From: Yu Peng <pengyu@...inos.cn>
>>>
>>> A hard lockup was detected on arm64: rq->lock had been released, but
>>> a CPU remained blocked on mcs_node->locked and timed out.
>>>
>>> We found that xchg_tail and atomic_try_cmpxchg_relaxed use the
>>> _relaxed versions without memory barriers. We suspect insufficient
>>> coherence guarantees on some arm64 microarchitectures, potentially
>>> leading to the following sequence:
>>>
>>> CPU0:                                           CPU1:
>>> // Set tail to CPU0
>>> old = xchg_tail(lock, tail);
>>>
>>> // CPU0 reads the tail as itself
>>> if ((val & _Q_TAIL_MASK) == tail)
>>>                                                  // CPU1 exchanges the tail
>>>                                                  old = xchg_tail(lock, tail)
>>> // assume CPU0 does not see the tail change
>>> atomic_try_cmpxchg_relaxed(
>>> 	  &lock->val, &val, _Q_LOCKED_VAL)
>>> // released without notifying CPU1
>>> goto release;
>>>                                                  // hard lockup detected
>>>                                                  arch_mcs_spin_lock_contended(
>>>                                                        &node->locked)
>>>
>>> Therefore, use _mb instead of _relaxed in xchg_tail and
>>> atomic_try_cmpxchg.
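
In concrete terms, the change proposed above amounts to roughly the
following sketch (not the actual patch hunks; xchg and
atomic_try_cmpxchg here are the fully ordered variants, which the patch
refers to as _mb):

    /* was: built on xchg_relaxed() inside xchg_tail() */
    old = (u32)xchg(&lock->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;

    /* was: atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL) */
    if (atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL))
        goto release;
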
>>
>> Yeah, no. We do not apply patches based on suspicion. And we most
>> certainly do not sprinkle #ifdef ARM64 in generic code.
> 
> Absolutely.
> 
>> There is this thread:
>>
>>    https://lkml.kernel.org/r/cb83e3e4-9e22-4457-bf61-5614cc4396ad@tu-bs.de
>>
>> Which is also concerned with xchg_tail(). Reading back, I'm not sure
>> we've ever heard back from ARM on whether that extra ;si was correct or
>> not, Will?
> 
> It's still under discussion with the Arm architects but it was _very_
> close to concluding last time we met and I wouldn't worry about it for
> the purposes of this report.
> 
>> Anyway, as Waiman already asked, please state your exact ARM64
>> microarch.
>>
>> Barring the ;si, the above thread suggests that they can prove the code
>> correct with the below change, does that resolve your problem?
>>
>> Other than that, I'm going to have to leave this to Will and co.
> 
> I'll take a look but it's light on details.
> 
> Will


Yes, this issue occurred on a 96-core Kunpeng 920 machine, and it only
affected a small number of systems that had been running for over a
year.

Vmcore Analysis:
• Panic triggered by CPU 83 detecting a hard lockup at
     queued_spin_lock_slowpath+0x1d8/0x320.

• Corresponding code (see the slowpath sketch after this list):
     arch_mcs_spin_lock_contended(&node->locked);

• The qspinlock involved was the rq lock, which showed a cleared state:
     crash> rq.lock,cpu ffffad96ff2907c0
       lock = {
         raw_lock = {
           {
             val = {
               counter = 0
             },
             {
               locked = 0 '\000',
               pending = 0 '\000'
             },
             {
               locked_pending = 0,
               tail = 0
             }
           }
         }
       },
       cpu = 50,

• CPU 83’s MCS node remained in a locked=0 state, with no previous
  node found in the qnodes list:
     crash> p qnodes:83
     per_cpu(qnodes, 83) = $292 =
      {{
         mcs = {
           next = 0x0,
           locked = 0,
           count = 1
         }
       },
     crash> p qnodes | grep 83
       [83]: ffffadd6bf7914c0
     crash> p qnodes:all | grep ffffadd6bf7914c0
     crash>

• Since rq->lock was cleared, no CPU could notify CPU 83.
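
To make the above concrete, the relevant code path looks roughly like
this (a simplified sketch of queued_spin_lock_slowpath() in
kernel/locking/qspinlock.c, abridged rather than verbatim):

    /* Publish this CPU's MCS node by swapping it into the lock's tail. */
    old = xchg_tail(lock, tail);

    if (old & _Q_TAIL_MASK) {
        /* A predecessor exists: link behind it and wait for hand-off. */
        prev = decode_tail(old);
        WRITE_ONCE(prev->next, node);

        /* CPU 83 was spinning here, with node->locked still 0. */
        arch_mcs_spin_lock_contended(&node->locked);
    }

    /* ... wait for the owner and pending bits to clear ... */

    /*
     * If this CPU is still the tail, it can claim the lock and exit
     * without ever looking at a successor.
     */
    if ((val & _Q_TAIL_MASK) == tail) {
        if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
            goto release;    /* no successor is notified */
    }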

This issue has occurred multiple times, but the root cause remains
unclear. We suspect that CPU 83 may have failed to enqueue itself,
potentially due to a failure in the xchg_tail atomic operation.

It has been noted that the _relaxed version is used in xchg_tail, and we
are uncertain whether this could lead to visibility issues, for example,
if CPU 83 modifies lock->tail but other CPUs fail to observe the
change.
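
For reference, the exchange in question looks roughly like this in
kernel/locking/qspinlock.c for the usual NR_CPUS < 16K configuration
(_Q_PENDING_BITS == 8; abridged, and the large-NR_CPUS variant uses an
atomic_cmpxchg_relaxed loop instead):

    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
        /*
         * Relaxed semantics: the caller is expected to have initialized
         * the MCS node before the tail update makes it reachable.
         */
        return (u32)xchg_relaxed(&lock->tail,
                                 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }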

We are also checking if this is related:
     https://lkml.kernel.org/r/cb83e3e4-9e22-4457-bf61-5614cc4396ad@tu-bs.de




