Message-ID: <f8480c59-f677-c6a4-75bb-227de6a1fc2c@bytedance.com>
Date: Thu, 24 Feb 2022 11:19:38 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ben Segall <bsegall@...gle.com>,
Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
Abel Wu <wuyun.abel@...edance.com>
Subject: Re: [RFC PATCH 0/5] introduce sched-idle balancing
Ping :)
On 2/17/22 11:43 PM, Abel Wu Wrote:
> Current load balancing is mainly based on cpu capacity
> and task util, which makes sense from the POV of overall
> throughput. But there is still room for improvement:
> the number of overloaded cfs rqs can be reduced when
> sched-idle or idle rqs exist.
>
> A CFS runqueue is considered overloaded when there is
> more than one pullable non-idle task on it (sched-idle
> cpus are treated as idle cpus). Idle tasks are the ones
> counted in rq->cfs.idle_h_nr_running, i.e. tasks that
> are either assigned the SCHED_IDLE policy or placed
> under idle cgroups.
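>
> A minimal sketch of that notion (illustration only, not
> the patch code; the helper name is made up) could look
> like this:
>
>   /*
>    * Illustration only: a cfs rq is overloaded when it has
>    * more than one non-idle task, i.e. h_nr_running minus
>    * idle_h_nr_running exceeds one.
>    */
>   static inline bool cfs_rq_overloaded(struct rq *rq)
>   {
>           return rq->cfs.h_nr_running -
>                  rq->cfs.idle_h_nr_running > 1;
>   }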
>
> The overloaded cfs rqs can cause performance issues to
> both task types:
>
> - for latency-critical tasks like SCHED_NORMAL ones,
> the time spent waiting in the rq will increase,
> resulting in higher p99 latency, and
>
> - batch tasks may not be able to make full use
> of cpu capacity if a sched-idle rq exists, thus
> presenting poorer throughput.
>
> So in short, the goal of the sched-idle balancing is to
> let the *non-idle tasks* make full use of cpu resources.
> To achieve that, we mainly do two things:
>
> - pull non-idle tasks for sched-idle or idle rqs
> from the overloaded ones, and
>
> - prevent pulling the last non-idle task in an rq
>
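> The second point above can be illustrated with a rough,
> hypothetical sketch (not the actual patch code; a full
> check would also treat tasks in idle cgroups as idle):
>
>   /*
>    * Illustration only: never strip the source rq of its
>    * last non-idle task.
>    */
>   static bool can_pull_task(struct rq *src_rq, struct task_struct *p)
>   {
>           unsigned int non_idle = src_rq->cfs.h_nr_running -
>                                   src_rq->cfs.idle_h_nr_running;
>
>           /* Pulling an idle task never violates the rule. */
>           if (task_has_idle_policy(p))
>                   return true;
>
>           /* Keep at least one non-idle task on the source rq. */
>           return non_idle > 1;
>   }
>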
> The mask of overloaded cpus is updated in the periodic
> tick and in the idle path, on a per-LLC-domain basis.
> This cpumask is also used in SIS as a filter to improve
> idle cpu searching.
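>
> As a rough sketch of how such a per-LLC mask could be
> maintained (the overloaded_mask field is hypothetical;
> the actual data layout is defined in the patches), the
> tick and idle paths would set or clear the current cpu
> in the shared state of its LLC domain:
>
>   /* Illustration only: update the LLC-wide overloaded mask. */
>   static void update_overloaded_mask(int cpu, bool overloaded)
>   {
>           struct sched_domain_shared *sds;
>
>           rcu_read_lock();
>           sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
>           if (sds) {
>                   if (overloaded)
>                           cpumask_set_cpu(cpu, sds->overloaded_mask);
>                   else
>                           cpumask_clear_cpu(cpu, sds->overloaded_mask);
>           }
>           rcu_read_unlock();
>   }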
>
> Tests were done on an Intel Xeon E5-2650 v4 server with
> 2 NUMA nodes, each of which has 12 cores, and with SMT2
> enabled, giving 48 CPUs in total. Test results are
> listed as follows.
>
> - we used the perf messaging benchmark to measure
> throughput at different loads (groups).
>
> perf bench sched messaging -g [N] -l 40000
>
> N w/o w/ diff
> 1 2.897 2.834 -2.17%
> 3 5.156 4.904 -4.89%
> 5 7.850 7.617 -2.97%
> 10 15.140 14.574 -3.74%
> 20 29.387 27.602 -6.07%
>
> the results show an approximate 2~6% improvement.
>
> - and schbench to test latency performance in two
> scenarios: quiet and noisy. In the quiet test, we
> run schbench in a normal cpu cgroup on a quiet
> system, while the noisy test additionally runs the
> perf messaging workload inside an idle cgroup
> as noise.
>
> schbench -m 2 -t 24 -i 60 -r 60
> perf bench sched messaging -g 1 -l 4000000
>
> [quiet]
> w/o w/
> 50.0th 31 31
> 75.0th 45 45
> 90.0th 55 55
> 95.0th 62 61
> *99.0th 85 86
> 99.5th 565 318
> 99.9th 11536 10992
> max 13029 13067
>
> [noisy]
> w/o w/
> 50.0th 34 32
> 75.0th 48 45
> 90.0th 58 55
> 95.0th 65 61
> *99.0th 2364 208
> 99.5th 6696 2068
> 99.9th 12688 8816
> max 15209 14191
>
> it can be seen that the quiet test results are
> quite similar, but the p99 latency is greatly
> improved in the noisy test.
>
> Comments and tests are appreciated!
>
> Abel Wu (5):
> sched/fair: record overloaded cpus
> sched/fair: introduce sched-idle balance
> sched/fair: add stats for sched-idle balancing
> sched/fair: filter out overloaded cpus in sis
> sched/fair: favor cpu capacity for idle tasks
>
> include/linux/sched/idle.h | 1 +
> include/linux/sched/topology.h | 15 ++++
> kernel/sched/core.c | 1 +
> kernel/sched/fair.c | 187 ++++++++++++++++++++++++++++++++++++++++-
> kernel/sched/sched.h | 6 ++
> kernel/sched/stats.c | 5 +-
> kernel/sched/topology.c | 4 +-
> 7 files changed, 215 insertions(+), 4 deletions(-)
>