[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 10 Oct 2017 03:59:06 -0700
From: tip-bot for Brendan Jackman <tipbot@...or.com>
To: linux-tip-commits@...r.kernel.org
Cc: efault@....de, mingo@...nel.org, pjt@...gle.com,
tglx@...utronix.de, morten.rasmussen@....com,
torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
brendan.jackman@....com, hpa@...or.com, peterz@...radead.org
Subject: [tip:sched/core] sched/fair: Force balancing on NOHZ balance if
local group has capacity
Commit-ID: 583ffd99d7657755736d831bbc182612d1d2697d
Gitweb: https://git.kernel.org/tip/583ffd99d7657755736d831bbc182612d1d2697d
Author: Brendan Jackman <brendan.jackman@....com>
AuthorDate: Thu, 5 Oct 2017 11:58:54 +0100
Committer: Ingo Molnar <mingo@...nel.org>
CommitDate: Tue, 10 Oct 2017 11:45:32 +0200
sched/fair: Force balancing on NOHZ balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.
The original commit that adds it:
fab476228ba3 ("sched: Force balancing on newidle balance if local group has capacity")
explains:
Under certain situations, such as a niced down task (i.e. nice =
-15) in the presence of nr_cpus NICE0 tasks, the niced task lands
on a sched group and kicks away other tasks because of its large
weight. This leads to sub-optimal utilization of the
machine. Even though the sched group has capacity, it does not
pull tasks because sds.this_load >> sds.max_load, and f_b_g()
returns NULL.
A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".
The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.
Signed-off-by: Brendan Jackman <brendan.jackman@....com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mike Galbraith <efault@....de>
Cc: Morten Rasmussen <morten.rasmussen@....com>
Cc: Paul Turner <pjt@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
kernel/sched/fair.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c04a425..cf3e816 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8186,8 +8186,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
if (busiest->group_type == group_imbalanced)
goto force_balance;
- /* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
- if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+ /*
+ * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+ * capacities from resulting in underutilization due to avg_load.
+ */
+ if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
busiest->group_no_capacity)
goto force_balance;
Powered by blists - more mailing lists