linux-kernel - Re: rq lock contention due to commit af7f588d8f73

Message-ID: <20230329074549.GA65916@ziqianlu-desk2>
Date:   Wed, 29 Mar 2023 15:45:49 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: rq lock contention due to commit af7f588d8f73

On Tue, Mar 28, 2023 at 08:39:41AM -0400, Mathieu Desnoyers wrote:
> On 2023-03-28 02:58, Aaron Lu wrote:
> > On Mon, Mar 27, 2023 at 03:57:43PM -0400, Mathieu Desnoyers wrote:
> > > I've just resuscitated my per-runqueue concurrency ID cache patch from an older
> > > patchset, and posted it as RFC. So far it passed one round of rseq selftests. Can
> > > you test it in your environment to see if I'm on the right track ?
> > > 
> > > https://lore.kernel.org/lkml/20230327195318.137094-1-mathieu.desnoyers@efficios.com/
> > 
> > There are improvements with this patch.
> > 
> > When running the client side sysbench with nr_thread=56, the lock contention
> > is gone; with nr_thread=224 (=nr_cpu of this machine), the lock contention
> > dropped from 75% to 27%.
> 
> This is a good start!
> 
> Can you compare this with Peter's approach to modify init/Kconfig, make
> SCHED_MM_CID a bool, and set it =n in the kernel config ?
> 
> I just want to see what baseline we should compare against.
> 
> Another test we would want to try here: there is an arbitrary choice for the
> runqueue cache array size in my own patch:
> 
> kernel/sched/sched.h:
> # define RQ_CID_CACHE_SIZE    8
> 
> Can you try changing this value to 16 or 32 instead and see if it helps?

I tried 32. The short answer is: for the nr_thread=224 case, using a larger
value doesn't make an obvious difference.

Here is more detailed info.

During a 5-minute run, I captured a 5s perf profile every 30 seconds. To
keep the amount of recorded data manageable on this 224-cpu machine, I
only recorded 4 cpus of each node. Here are the results:
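
A minimal sketch of how such a capture schedule could be scripted. Only
the 5s/30s cadence and the nodeN_M.profile file names come from the
description above; the CPU lists, the start offset of the first capture
and the exact perf options are assumptions made for illustration.

```python
# Build the list of perf invocations for the sampling schedule.
# NODE_CPUS and the t=30s start are hypothetical, not taken from the run.
NODE_CPUS = {"node0": "0-3", "node1": "56-59"}  # hypothetical CPU picks
CAPTURE_SECONDS = 5      # length of each perf capture
INTERVAL_SECONDS = 30    # one capture per interval
NUM_CAPTURES = 9         # profiles _1 .. _9 per node


def capture_plan():
    """Return (start_offset_seconds, argv) for every planned capture."""
    plan = []
    for i in range(1, NUM_CAPTURES + 1):
        start = i * INTERVAL_SECONDS  # assumed: first capture at t=30s
        for node, cpus in NODE_CPUS.items():
            # -C restricts sampling to the listed cpus, -o names the output.
            argv = ["perf", "record", "-C", cpus,
                    "-o", f"{node}_{i}.profile",
                    "--", "sleep", str(CAPTURE_SECONDS)]
            plan.append((start, argv))
    return plan


if __name__ == "__main__":
    for start, argv in capture_plan():
        print(f"t={start:3d}s  {' '.join(argv)}")
```

The script only prints the commands; actually launching them at the right
offsets (for example with subprocess and time.sleep) is left out on
purpose.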

Your RFC patch, which adds the mm_cid rq cache:
node0_1.profile:    26.07%    26.06%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_2.profile:    28.38%    28.37%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_3.profile:    25.44%    25.44%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_4.profile:    16.14%    16.13%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_5.profile:    15.17%    15.16%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_6.profile:     5.23%     5.23%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_7.profile:     2.64%     2.64%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_8.profile:     2.87%     2.87%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_9.profile:     2.73%     2.73%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_1.profile:    23.78%    23.77%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_2.profile:    25.11%    25.10%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_3.profile:    21.97%    21.95%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_4.profile:    19.37%    19.35%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_5.profile:    18.85%    18.84%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_6.profile:    11.22%    11.20%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_7.profile:     1.65%     1.64%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_8.profile:     1.68%     1.67%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_9.profile:     1.57%     1.56%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath

Changing RQ_CID_CACHE_SIZE to 32:
node0_1.profile:    29.25%    29.24%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_2.profile:    26.87%    26.87%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_3.profile:    24.23%    24.23%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_4.profile:    17.31%    17.30%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_5.profile:     3.61%     3.60%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_6.profile:     2.60%     2.59%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_7.profile:     1.77%     1.77%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_8.profile:     2.14%     2.13%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_9.profile:     2.20%     2.20%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_1.profile:    27.25%    27.24%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_2.profile:    25.12%    25.11%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_3.profile:    25.27%    25.26%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_4.profile:    19.48%    19.47%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_5.profile:    10.21%    10.20%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_6.profile:     3.01%     3.00%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_7.profile:     1.47%     1.47%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_8.profile:     1.52%     1.51%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_9.profile:     1.58%     1.56%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath

One characteristic of this workload is that it does more wakeups and
task migrations in the initial ~2 minutes, which probably explains why
lock contention dropped in the later profiles.
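
To put a rough number on the early vs. late difference, the profile lines
can be parsed and averaged. The sketch below uses the node0 lines from the
RFC patch run above. Treating the first four captures as "the initial ~2
minutes" is an assumption about capture timing, and the two percentage
columns are presumably perf report's Children and Self columns.

```python
import re
from statistics import mean

# Matches one summary line in the format shown in the tables above.
LINE_RE = re.compile(
    r"^(?P<node>node\d+)_(?P<idx>\d+)\.profile:\s+"
    r"(?P<children>[\d.]+)%\s+(?P<self_pct>[\d.]+)%\s+"
    r"\[(?P<dso>[^\]]+)\]\s+\[k\]\s+(?P<symbol>\S+)$"
)

RFC_NODE0 = """\
node0_1.profile:    26.07%    26.06%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_2.profile:    28.38%    28.37%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_3.profile:    25.44%    25.44%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_4.profile:    16.14%    16.13%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_5.profile:    15.17%    15.16%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_6.profile:     5.23%     5.23%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_7.profile:     2.64%     2.64%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_8.profile:     2.87%     2.87%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_9.profile:     2.73%     2.73%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
"""


def parse_profile_line(line):
    """Return a dict for one profile line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "node": m["node"],
        "idx": int(m["idx"]),
        "children": float(m["children"]),  # assumed meaning of column 1
        "self_pct": float(m["self_pct"]),  # assumed meaning of column 2
        "symbol": m["symbol"],
    }


def early_late_means(rows, early_count=4):
    """Mean of the first percentage column for early and late captures."""
    rows = sorted(rows, key=lambda r: r["idx"])
    early = [r["children"] for r in rows[:early_count]]
    late = [r["children"] for r in rows[early_count:]]
    return mean(early), mean(late)


rows = [r for r in map(parse_profile_line, RFC_NODE0.splitlines()) if r]
early, late = early_late_means(rows)
print(f"early mean: {early:.2f}%  late mean: {late:.2f}%")
```

The early_count parameter is the knob to change if the first capture did
not start at the 30 second mark.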

For comparison, vanilla v6.3-rc4:
node0_1.profile:    71.27%    71.26%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_2.profile:    72.14%    72.13%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_3.profile:    72.68%    72.67%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_4.profile:    73.30%    73.29%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_5.profile:    77.54%    77.53%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_6.profile:    76.05%    76.04%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_7.profile:    75.08%    75.07%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_8.profile:    75.78%    75.77%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_9.profile:    75.30%    75.30%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_1.profile:    68.40%    68.40%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_2.profile:    69.19%    69.18%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_3.profile:    68.74%    68.74%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_4.profile:    59.99%    59.98%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_5.profile:    56.81%    56.80%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_6.profile:    53.46%    53.45%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_7.profile:    28.90%    28.88%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_8.profile:    27.70%    27.67%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_9.profile:    27.17%    27.14%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath

And v6.3-rc4 with CONFIG_SCHED_MM_CID turned off:
node0_1.profile:     0.09%     0.08%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_2.profile:     0.08%     0.08%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_3.profile:     0.09%     0.09%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_4.profile:     0.10%     0.10%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_5.profile:     0.07%     0.07%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_6.profile:     0.09%     0.09%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_7.profile:     0.15%     0.15%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_8.profile:     0.08%     0.08%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node0_9.profile:     0.08%     0.08%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_1.profile:     0.23%     0.22%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_2.profile:     0.28%     0.28%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_3.profile:     2.80%     2.80%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_4.profile:     4.29%     4.29%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_5.profile:     4.05%     4.05%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_6.profile:     2.93%     2.92%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_7.profile:     0.07%     0.07%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_8.profile:     0.07%     0.07%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath
node1_9.profile:     0.07%     0.06%  [kernel.vmlinux]        [k] native_queued_spin_lock_slowpath

As for the few profiles on node1 where lock contention is above 0.3%,
I've checked and those come from pkg_thermal_notify(), which should be
a separate issue.
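
For a quick side-by-side view of the four kernels, a crude summary over
the node0 profiles could look like the sketch below. The numbers are the
first percentage column of the node0 lines above; averaging over all nine
captures is only one possible way to summarize, and the label "default
cache size" assumes the RFC patch was run with RQ_CID_CACHE_SIZE left at
its value of 8.

```python
from statistics import mean

# native_queued_spin_lock_slowpath share on node0, captures 1..9.
NODE0_SLOWPATH_PCT = {
    "v6.3-rc4": [71.27, 72.14, 72.68, 73.30, 77.54, 76.05, 75.08, 75.78, 75.30],
    # assumed: RQ_CID_CACHE_SIZE left at 8 for this run
    "RFC patch, default cache size": [26.07, 28.38, 25.44, 16.14, 15.17, 5.23, 2.64, 2.87, 2.73],
    "RFC patch, RQ_CID_CACHE_SIZE=32": [29.25, 26.87, 24.23, 17.31, 3.61, 2.60, 1.77, 2.14, 2.20],
    "v6.3-rc4, SCHED_MM_CID off": [0.09, 0.08, 0.09, 0.10, 0.07, 0.09, 0.15, 0.08, 0.08],
}


def summarize(samples):
    """Map each kernel name to (mean, max) of its percentage samples."""
    return {name: (mean(vals), max(vals)) for name, vals in samples.items()}


summary = summarize(NODE0_SLOWPATH_PCT)
for name, (avg, peak) in summary.items():
    print(f"{name:34s} mean {avg:6.2f}%  max {peak:6.2f}%")
```

Since the early captures dominate the mean, the max column is printed as
well so the initial burst of contention stays visible.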

Thanks,
Aaron