Message-ID: <474f934d-6c13-6755-fa7a-6116b3159301@efficios.com>
Date: Wed, 29 Mar 2023 14:07:15 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Aaron Lu <aaron.lu@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: rq lock contention due to commit af7f588d8f73
On 2023-03-29 03:45, Aaron Lu wrote:
> On Tue, Mar 28, 2023 at 08:39:41AM -0400, Mathieu Desnoyers wrote:
>> On 2023-03-28 02:58, Aaron Lu wrote:
>>> On Mon, Mar 27, 2023 at 03:57:43PM -0400, Mathieu Desnoyers wrote:
>>>> I've just resuscitated my per-runqueue concurrency ID cache patch from an older
>>>> patchset, and posted it as RFC. So far it passed one round of rseq selftests. Can
>>>> you test it in your environment to see if I'm on the right track ?
>>>>
>>>> https://lore.kernel.org/lkml/20230327195318.137094-1-mathieu.desnoyers@efficios.com/
>>>
>>> There are improvements with this patch.
>>>
>>> When running the client side sysbench with nr_thread=56, the lock contention
>>> is gone; with nr_thread=224 (=nr_cpu of this machine), the lock contention
>>> dropped from 75% to 27%.
>>
>> This is a good start!
>>
>> Can you compare this with Peter's approach to modify init/Kconfig, make
>> SCHED_MM_CID a bool, and set it =n in the kernel config ?
>>
>> I just want to see what baseline we should compare against.
>>
>> Another test we would want to try here: there is an arbitrary choice for the
>> runqueue cache array size in my own patch:
>>
>> kernel/sched/sched.h:
>> # define RQ_CID_CACHE_SIZE 8
>>
>> Can you try changing this value to 16 or 32 instead and see if it helps?
>
> I tried 32. The short answer is: for the nr_thread=224 case, using a larger
> value doesn't show an obvious difference.
>
> Here is more detailed info.
>
> During a 5-minute run, I captured a 5s perf profile every 30 seconds. To avoid
> recording too much data, since this machine has 224 cpus, I picked 4 cpus
> from each node when doing perf record. Here are the results:
>
> Your RFC patch that did mm_cid rq cache:
> node0_1.profile: 26.07% 26.06% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_2.profile: 28.38% 28.37% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_3.profile: 25.44% 25.44% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_4.profile: 16.14% 16.13% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_5.profile: 15.17% 15.16% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_6.profile: 5.23% 5.23% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_7.profile: 2.64% 2.64% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_8.profile: 2.87% 2.87% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_9.profile: 2.73% 2.73% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_1.profile: 23.78% 23.77% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_2.profile: 25.11% 25.10% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_3.profile: 21.97% 21.95% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_4.profile: 19.37% 19.35% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_5.profile: 18.85% 18.84% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_6.profile: 11.22% 11.20% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_7.profile: 1.65% 1.64% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_8.profile: 1.68% 1.67% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_9.profile: 1.57% 1.56% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
>
> Changing RQ_CID_CACHE_SIZE to 32:
> node0_1.profile: 29.25% 29.24% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_2.profile: 26.87% 26.87% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_3.profile: 24.23% 24.23% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_4.profile: 17.31% 17.30% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_5.profile: 3.61% 3.60% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_6.profile: 2.60% 2.59% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_7.profile: 1.77% 1.77% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_8.profile: 2.14% 2.13% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node0_9.profile: 2.20% 2.20% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_1.profile: 27.25% 27.24% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_2.profile: 25.12% 25.11% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_3.profile: 25.27% 25.26% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_4.profile: 19.48% 19.47% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_5.profile: 10.21% 10.20% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_6.profile: 3.01% 3.00% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_7.profile: 1.47% 1.47% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_8.profile: 1.52% 1.51% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> node1_9.profile: 1.58% 1.56% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
>
> This workload has a characteristic that in the initial ~2 minutes, it has
> more wakeups and task migrations and that probably can explain why lock
> contention dropped in later profiles.
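
One rough way to quantify the early-versus-late difference in the profiles
above is to average the first percentage column separately for the early and
the late captures. The following is only a minimal sketch: it assumes the
per-profile lines are available as text in the format shown above, and the
helper names (parse_profile_line, split_means) and the cut-off of four "early"
profiles (roughly the first 2 minutes at one capture per 30 seconds) are
illustrative assumptions, not part of perf or of the RFC patch.

```python
import re
from statistics import mean

# Matches one summary line in the format quoted in this thread, e.g.
# "node0_1.profile: 26.07% 26.06% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath"
# Leading "> " quote markers are tolerated.
LINE_RE = re.compile(
    r"^(?:>\s*)*(?P<node>node\d+)_(?P<idx>\d+)\.profile:\s+"
    r"(?P<pct1>\d+(?:\.\d+)?)%\s+(?P<pct2>\d+(?:\.\d+)?)%\s+"
    r"\S+\s+\[\w\]\s+(?P<symbol>\S+)\s*$"
)


def parse_profile_line(line):
    """Return (node, capture index, first percentage, symbol), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return (m["node"], int(m["idx"]), float(m["pct1"]), m["symbol"])


def split_means(lines, early_count=4):
    """Average the first percentage column for early and late captures.

    early_count is an assumed cut-off: captures numbered 1..early_count are
    treated as "early", the rest as "late". Unparseable lines are skipped.
    """
    early, late = [], []
    for line in lines:
        parsed = parse_profile_line(line)
        if parsed is None:
            continue
        _, idx, pct, _ = parsed
        (early if idx <= early_count else late).append(pct)
    return (mean(early) if early else None, mean(late) if late else None)


if __name__ == "__main__":
    # Two lines copied from the RFC-patch results above, for illustration.
    sample = [
        "node0_1.profile: 26.07% 26.06% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath",
        "node0_7.profile: 2.64% 2.64% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath",
    ]
    print(split_means(sample))
```

Feeding each block of results through split_means() separately would give one
early/late pair per configuration, which makes the RQ_CID_CACHE_SIZE=8 and =32
runs easier to compare than eyeballing the individual lines.
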

Yeah, my RFC patch adds an rq lock on try-to-wake-up migrations, which I
suspect is causing this performance regression.

I've come up with a design for an alternative scheme which should be
much more lightweight locking-wise. I'll see if I can make it work and
let you know when I have something to test.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com