Message-ID: <8b89a9a8-7114-452e-bf7c-86f0cedbe01d@redhat.com>
Date: Tue, 16 Sep 2025 09:27:20 -0400
From: Waiman Long <llong@...hat.com>
To: pengyu <pengyu@...inos.cn>, peterz@...radead.org, mingo@...hat.com,
will@...nel.org, boqun.feng@...il.com
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] locking/qspinlock: use xchg with _mb in slowpath for
arm64
On 9/15/25 11:39 PM, pengyu wrote:
> From: Yu Peng <pengyu@...inos.cn>
>
> A hard lockup was detected on arm64: rq->lock had been released, but a
> CPU remained blocked on mcs_node->locked and eventually timed out.
>
> We found that xchg_tail and atomic_try_cmpxchg_relaxed use the _relaxed
> versions without memory barriers. We suspect insufficient ordering
> guarantees on some arm64 microarchitectures, potentially leading to the
> following scenario:
>
> CPU0:                                  CPU1:
> // set tail to CPU0
> old = xchg_tail(lock, tail);
>
> // CPU0 reads the tail as itself
> if ((val & _Q_TAIL_MASK) == tail)
>                                        // CPU1 exchanges the tail
>                                        old = xchg_tail(lock, tail);
> // assuming CPU0 does not see the tail change
> atomic_try_cmpxchg_relaxed(
>     &lock->val, &val, _Q_LOCKED_VAL)
> // released without notifying CPU1
> goto release;
>                                        // hard lockup detected
>                                        arch_mcs_spin_lock_contended(
>                                            &node->locked)
>
> Therefore, replace the _relaxed versions of xchg_tail and
> atomic_try_cmpxchg with the _mb (fully ordered) versions.
>
> Signed-off-by: pengyu <pengyu@...inos.cn>
The qspinlock code has been enabled on arm64 for quite a long time, and
this is the first report of its kind that we have received. How
reproducible is this hang?

Which arm64 microarchitecture shows this problem? It could be a hardware
bug.

In any case, changing a relaxed atomic op to a fully ordered version can
be expensive on arm64 in general. We need more information to be sure
that we are doing the right thing.
Cheers,
Longman