Message-ID: <20170424201344.GA14169@wtj.duckdns.org>
Date: Mon, 24 Apr 2017 13:13:44 -0700
From: Tejun Heo <tj@...nel.org>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
Chris Mason <clm@...com>, kernel-team@...com
Subject: [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is
in use
Hello,
We've noticed a scheduling latency spike when cgroups are in use, even when
the machine is fairly idle, with a moderate scheduling frequency and a
single level of cgroup nesting. More details are in the patch
descriptions, but here's a schbench run from the root cgroup.
# ~/schbench -m 2 -t 16 -s 10000 -c 15000 -r 30
Latency percentiles (usec)
50.0000th: 26
75.0000th: 62
90.0000th: 74
95.0000th: 86
*99.0000th: 887
99.5000th: 3692
99.9000th: 10832
min=0, max=13374
And here's one from inside a first-level cgroup.
# ~/schbench -m 2 -t 16 -s 10000 -c 15000 -r 30
Latency percentiles (usec)
50.0000th: 31
75.0000th: 65
90.0000th: 71
95.0000th: 91
*99.0000th: 7288
99.5000th: 10352
99.9000th: 12496
min=0, max=13023
The p99 latency spike was tracked down to runnable_load_avg not being
propagated through nested cfs_rqs, which left load_balance() operating on
out-of-sync load information. As a result, it picked the wrong CPU as
the load balance target often enough to significantly impact p99 latency.
This patchset fixes the issue by always propagating runnable_load_avg
so that, regardless of nesting, every cfs_rq's runnable_load_avg is
the sum of the scaled loads of all tasks queued below it.
As a side effect, this changes the load_avg behavior of sched_entities
associated with cfs_rqs. It doesn't seem wrong to me, and I can't think of
a better or cleaner way, but if there is one, please let me know.
This patchset is on top of v4.11-rc8 and contains the following two
patches.
sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity
sched/fair: Always propagate runnable_load_avg
diffstat follows.
kernel/sched/fair.c | 46 +++++++++++++++++++---------------------------
1 file changed, 19 insertions(+), 27 deletions(-)
Thanks.
--
tejun