[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd0e5627-b89c-d25b-bbf5-edc712642f85@gmail.com>
Date: Sun, 15 Sep 2024 20:44:31 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: yongli-oc <yongli-oc@...oxin.com>, peterz@...radead.org,
mingo@...hat.com, will@...nel.org, longman@...hat.com, boqun.feng@...il.com
Cc: linux-kernel@...r.kernel.org, yongli@...oxin.com, louisqi@...oxin.com,
cobechen@...oxin.com, jiangbowang@...oxin.com
Subject: Re: [PATCH 3/4] locking/osq_lock: From x_osq_lock/unlock to
numa-aware lock/unlock.
On 14. 09. 24 10:53, yongli-oc wrote:
> According to the contention level, switches from x_osq_lock to
> numa-aware osq_lock.
> The numa-aware lock is a two level osq_lock.
> The Makefile for dynamic numa-aware osq lock.
>
> Signed-off-by: yongli-oc <yongli-oc@...oxin.com>
> ---
> kernel/locking/Makefile | 1 +
> kernel/locking/numa.h | 98 ++++++++
> kernel/locking/numa_osq.h | 32 +++
> kernel/locking/x_osq_lock.c | 332 +++++++++++++++++++++++++++
> kernel/locking/zx_numa_osq.c | 433 +++++++++++++++++++++++++++++++++++
> 5 files changed, 896 insertions(+)
> create mode 100644 kernel/locking/numa.h
> create mode 100644 kernel/locking/numa_osq.h
> create mode 100644 kernel/locking/x_osq_lock.c
> create mode 100644 kernel/locking/zx_numa_osq.c
...
> + if (lock->numa_enable == OSQLOCKSTOPPING && old == OSQ_UNLOCKED_VAL)
> + old = OSQ_LOCKED_VAL;
> +
> + for (;;) {
> + if (READ_ONCE(lock->tail16) == curr &&
> + cmpxchg(&lock->tail16, curr, old) == curr) {
I would like to ask if there is any benefit to read the location two
times? cmpxchg() reads the location and skips the update when curr is
different from the value at the location by itself. Using try_cmpxchg()
can produce even more optimized code, since on x86 arch CMPXCHG also
sets ZF flag when operand 2 is equal to the value at location (and
update happens), and this flag can be used in a conditional jump.
The above condition could read:
if (try_cmpxchg(&lock->tail16, &curr, old)) {
with an optional temporary variable instead of curr, because
try_cmpxchg() updates its second argument:
+ for (;;) {
+ u16 tmp = curr;
+ if (try_cmpxchg(&lock->tail16, &tmp, old)) {
Please note that the above form could produce slightly more optimized
code also on non-x86 targets.
Uros.
Powered by blists - more mailing lists