Message-ID: <20251206062106.2109014-1-stepanov.anatoly@huawei.com>
Date: Sat, 6 Dec 2025 14:21:01 +0800
From: Anatoly Stepanov <stepanov.anatoly@...wei.com>
To: <peterz@...radead.org>, <boqun.feng@...il.com>, <longman@...hat.com>,
<catalin.marinas@....com>, <will@...nel.org>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
<hpa@...or.com>, <arnd@...db.de>, <dvhart@...radead.org>,
<dave@...olabs.net>, <andrealmeid@...lia.com>
CC: <linux-kernel@...r.kernel.org>, <linux-arch@...r.kernel.org>,
<guohanjun@...wei.com>, <wangkefeng.wang@...wei.com>,
<weiyongjun1@...wei.com>, <yusongping@...wei.com>, <leijitang@...wei.com>,
<artem.kuzin@...wei.com>, <fedorov.nikita@...artners.com>,
<kang.sun@...wei.com>, Anatoly Stepanov <stepanov.anatoly@...wei.com>
Subject: [RFC PATCH v2 0/5] Introduce Hierarchical Queued NUMA-aware spinlock
[Introduction & Motivation]
Under high contention, the existing Linux kernel spinlock implementation can
become inefficient on modern NUMA systems due to frequent and expensive
cross-NUMA cache-line transfers.
This happens for the following reasons:
- on "contender enqueue", each lock contender atomically updates a shared
  lock structure, regardless of which node it runs on
- on "MCS handoff", a cross-NUMA cache-line transfer occurs when the current
  lock holder and the next queued contender are on different NUMA nodes
We introduce a new NUMA-aware spinlock in the kernel: the Hierarchical
Queued spinlock (HQ-spinlock).
Previous work on NUMA-aware spinlocks in the Linux kernel is the CNA lock:
https://lore.kernel.org/lkml/20210514200743.3026725-1-alex.kogan@oracle.com/
While CNA improves on the default qspinlock handoff-wise, it still modifies
a shared lock variable on every new contender arrival.
HQ-lock is based on a completely different design: a hybrid of a cohort lock
and a queued spinlock. This makes it bulkier, but it outperforms CNA in
high-contention cases.
Locktorture results on a "Kunpeng 920", for example:

+-------------------------+--------------+
| HQ-lock vs CNA-lock (locktorture)      |
| Kunpeng 920 (arm64), 96 cores (no MT)  |
| 2 sockets, 4 NUMA nodes                |
+-------------------------+--------------+
| Threads                 | HQ-lock gain |
+-------------------------+--------------+
| 32                      | 26%          |
| 64                      | 32%          |
| 80                      | 35%          |
| 96                      | 31%          |
+-------------------------+--------------+
All other design and implementation details can be found in the following
patches.
Anatoly Stepanov, Nikita Fedorov (5):
kernel: introduce Hierarchical Queued spinlock
hq-spinlock: proc tunables and debug stats
kernel: introduce general hq-lock support
lockref: use hq-spinlock
kernel: futex: use HQ-spinlock for hash-buckets
arch/arm64/include/asm/qspinlock.h | 37 ++
arch/x86/include/asm/qspinlock.h | 38 +-
include/asm-generic/qspinlock.h | 23 +-
include/asm-generic/qspinlock_types.h | 54 +-
include/linux/lockref.h | 2 +-
include/linux/spinlock.h | 26 +
include/linux/spinlock_types.h | 26 +
include/linux/spinlock_types_raw.h | 20 +
init/main.c | 4 +
kernel/Kconfig.locks | 29 +
kernel/futex/core.c | 2 +-
kernel/locking/hqlock_core.h | 812 ++++++++++++++++++++++++++
kernel/locking/hqlock_meta.h | 477 +++++++++++++++
kernel/locking/hqlock_proc.h | 88 +++
kernel/locking/hqlock_types.h | 118 ++++
kernel/locking/qspinlock.c | 65 ++-
kernel/locking/qspinlock.h | 4 +-
kernel/locking/spinlock_debug.c | 20 +
18 files changed, 1816 insertions(+), 29 deletions(-)
create mode 100644 arch/arm64/include/asm/qspinlock.h
create mode 100644 kernel/locking/hqlock_core.h
create mode 100644 kernel/locking/hqlock_meta.h
create mode 100644 kernel/locking/hqlock_proc.h
create mode 100644 kernel/locking/hqlock_types.h
--
2.34.1