Message-ID: <ef3e34be-a30a-402b-a806-aa6ef29b410d@redhat.com>
Date: Wed, 20 Dec 2023 11:58:25 -0500
From: Waiman Long <longman@...hat.com>
To: Zeng Heng <zengheng4@...wei.com>, boqun.feng@...il.com,
peterz@...radead.org, mingo@...hat.com, will@...nel.org, guohui@...ontech.com
Cc: linux-kernel@...r.kernel.org, xiexiuqi@...wei.com, liwei391@...wei.com
Subject: Re: [PATCH] locking/osq_lock: Avoid false sharing in
optimistic_spin_node
On 12/20/23 02:02, Zeng Heng wrote:
> Using the UnixBench test suite, we found with the perf tool that
> osq_lock() causes extremely high overhead in the File Copy items:
>
> Overhead Shared Object Symbol
> 94.25% [kernel] [k] osq_lock
> 0.74% [kernel] [k] rwsem_spin_on_owner
> 0.32% [kernel] [k] filemap_get_read_batch
>
> We analyzed this and found the following:
>
> In the prologue of osq_lock(), the `cpu` member of the percpu struct
> optimistic_spin_node is set to the local cpu id, and after that its
> value never actually changes. We can therefore regard the `cpu` member
> as effectively constant.
>
> Meanwhile, the other members of the percpu struct (next, prev and
> locked) are frequently modified by osq_lock() and osq_unlock(), which
> are called by rwsem, mutex and so on. Those writes invalidate the cache
> line holding the cpu member on other CPUs.
>
> Therefore, we can add padding to split these members into different
> cache lines, avoiding cache misses when the next CPU spins in
> vcpu_is_preempted() checking another node's cpu member.
>
> The UnixBench full-core test results based on v6.6 are below:
> Machine: Intel(R) Xeon(R) Gold 6248 CPU, 40 cores, 80 threads
> "./Run -c 80 -i 3" was run over 20 times and the results averaged.
>
> System Benchmarks Index Values Without Patch With Patch Diff
> Dhrystone 2 using register variables 185518.38 185329.56 -0.10%
> Double-Precision Whetstone 79330.46 79268.22 -0.08%
> Execl Throughput 9725.14 10390.18 6.84%
> File Copy 1024 bufsize 2000 maxblocks 1658.42 2035.55 22.74%
> File Copy 256 bufsize 500 maxblocks 1086.54 1316.96 21.21%
> File Copy 4096 bufsize 8000 maxblocks 3610.42 4152.79 15.02%
> Pipe Throughput 69325.18 69913.85 0.85%
> Pipe-based Context Switching 14026.32 14703.07 4.82%
> Process Creation 8329.94 8355.31 0.30%
> Shell Scripts (1 concurrent) 38942.41 41518.39 6.61%
> Shell Scripts (8 concurrent) 37762.35 40224.49 6.52%
> System Call Overhead 4064.44 4004.45 -1.48%
> ========
> System Benchmarks Index Score 13634.17 14560.71 6.80%
>
> Signed-off-by: Zeng Heng <zengheng4@...wei.com>
> ---
> include/linux/osq_lock.h | 6 +++++-
> kernel/locking/osq_lock.c | 8 +++++++-
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
> index 5581dbd3bd34..f33f47ec0c90 100644
> --- a/include/linux/osq_lock.h
> +++ b/include/linux/osq_lock.h
> @@ -9,7 +9,11 @@
> struct optimistic_spin_node {
> struct optimistic_spin_node *next, *prev;
> int locked; /* 1 if lock acquired */
> - int cpu; /* encoded CPU # + 1 value */
> + /*
> + * Stores an encoded CPU # + 1 value.
> + * Read mostly by other CPUs, so keep it in a separate cache line.
> + */
> + int cpu ____cacheline_aligned;
> };
>
> struct optimistic_spin_queue {
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d5610ad52b92..17618d62343f 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -96,7 +96,13 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>
> node->locked = 0;
> node->next = NULL;
> - node->cpu = curr;
> + /*
> + * The cpu member is written once here and never changes
> + * afterwards. Skipping the redundant store keeps the cache
> + * line clean for other CPUs that spin on and read it.
> + */
> + if (node->cpu != curr)
> + node->cpu = curr;
>
> /*
> * We need both ACQUIRE (pairs with corresponding RELEASE in
The contention on prev->cpu is due to the vcpu_is_preempted() call in
osq_lock(). Could you try the attached patch to see if that helps?
Thanks,
Longman
View attachment "0001-locking-osq_lock-Minimize-spinning-on-prev-cpu.patch" of type "text/x-patch" (2048 bytes)