Message-ID: <20230616005338.GA115001@ziqianlu-dell>
Date: Fri, 16 Jun 2023 08:53:38 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: David Vernet <void@...ifault.com>
CC: Peter Zijlstra <peterz@...radead.org>,
<linux-kernel@...r.kernel.org>, <mingo@...hat.com>,
<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<rostedt@...dmis.org>, <dietmar.eggemann@....com>,
<bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
<vschneid@...hat.com>, <joshdon@...gle.com>,
<roman.gushchin@...ux.dev>, <tj@...nel.org>, <kernel-team@...a.com>
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
On Thu, Jun 15, 2023 at 06:26:05PM -0500, David Vernet wrote:
> Ok, it seems that the issue is that I wasn't creating enough netperf
> clients. I assumed that -n $(nproc) was sufficient. I was able to repro
Yes, that switch is confusing.
> the contention on my 26 core / 52 thread skylake client as well:
>
>
> Thanks for the help in getting the repro on my end.
You are welcome.
> So yes, there is certainly a scalability concern to bear in mind for
> swqueue for LLCs with a lot of cores. If you have a lot of tasks quickly
> e.g. blocking and waking on futexes in a tight loop, I expect a similar
> issue would be observed.
>
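Right, anything that blocks and wakes at a high enough frequency should
hit the same lock. For reference, a minimal futex ping-pong along these
lines (illustrative only, untested here, not from the patch) would
exercise exactly that block/wakeup pattern when run with one pair per
CPU:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
#include <pthread.h>

static atomic_int ftx;

/* FUTEX_WAIT sleeps only while ftx still equals val. */
static void fwait(int val) { syscall(SYS_futex, &ftx, FUTEX_WAIT, val, NULL, NULL, 0); }
static void fwake(void)    { syscall(SYS_futex, &ftx, FUTEX_WAKE, 1, NULL, NULL, 0); }

static void *pong(void *arg)
{
	for (;;) {
		while (atomic_load(&ftx) != 1)
			fwait(0);		/* block until pinged */
		atomic_store(&ftx, 0);
		fwake();			/* wake the pinger */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, pong, NULL);
	for (;;) {
		atomic_store(&ftx, 1);
		fwake();			/* wake pong */
		while (atomic_load(&ftx) != 0)
			fwait(1);		/* block until pong answers */
	}
}

Each iteration is one block plus one wakeup per thread, so with dozens
of such pairs inside one LLC the swqueue lock would be taken at a very
high rate.
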
> On the other hand, the issue did not occur on my 7950X. I also wasn't
Using netperf/UDP_RR?
> able to repro the contention on the Skylake if I ran with the default
> netperf workload rather than UDP_RR (even with the additional clients).
I also tried that on the Skylake with 18 cores/36 threads per LLC and
the contention is indeed much smaller than with UDP_RR:
7.30% 7.29% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
But I wouldn't say it's entirely gone. Also, consider that Skylake has
far fewer cores per LLC than later Intel servers like Icelake and
Sapphire Rapids; I expect things would be worse on those two machines.
> I didn't bother to take the mean of all of the throughput results
> between NO_SWQUEUE and SWQUEUE, but they looked roughly equal.
>
> So swqueue isn't ideal for every configuration, but I'll echo my
> sentiment from [0] that this shouldn't on its own necessarily preclude
> it from being merged given that it does help a large class of
> configurations and workloads, and it's disabled by default.
>
> [0]: https://lore.kernel.org/all/20230615000103.GC2883716@maniforge/
I was wondering: does it make sense to do some partitioning on machines
with big LLCs? E.g. convert the per-LLC swqueue into per-group swqueues,
where each group is made of ~8 CPUs of the same LLC. This would have an
effect similar to reducing the number of CPUs in a single LLC, so the
scalability issue can hopefully be avoided while the feature might still
help some workloads. I realize this isn't ideal in that wakeup happens
at LLC scale, so the group concept may not fit very well here.
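To make that concrete, a rough sketch of what I have in mind
(hypothetical names, untested, not against your actual patch; it also
assumes CPU ids within an LLC are contiguous, which real code would
have to handle properly):

/*
 * Shard the per-LLC swqueue into groups of ~8 CPUs so that enqueue
 * and dequeue contend on a per-group lock instead of a single
 * per-LLC lock.
 */
#define SWQUEUE_GROUP_NR_CPUS	8

struct swqueue {
	struct list_head	list;
	raw_spinlock_t		lock;
} ____cacheline_aligned;

static struct swqueue *cpu_swqueue(int cpu)
{
	/* sd_llc_id is the first CPU of this CPU's LLC domain span. */
	int group = (cpu - per_cpu(sd_llc_id, cpu)) / SWQUEUE_GROUP_NR_CPUS;

	/* llc_swqueues() is a made-up per-LLC swqueue array lookup. */
	return llc_swqueues(cpu) + group;
}

The dequeue side is the less clean part: an idle CPU either pulls only
from its own group's queue, which is cheap but can leave tasks waiting
in sibling groups, or falls back to scanning the other groups in the
LLC, which brings back some of the cross-CPU traffic.
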
Just a thought, feel free to ignore it if you don't think this is
feasible :-)
Thanks,
Aaron