Message-ID: <1461740991.3622.3.camel@gmail.com>
Date: Wed, 27 Apr 2016 09:09:51 +0200
From: Mike Galbraith <umgwanakikbuti@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Brendan Gregg <brendan.d.gregg@...il.com>,
Jeff Merkey <linux.mdb@...il.com>
Subject: [patch] sched: Fix smp nice induced group scheduling load distribution woes

On Mon, 2016-04-25 at 11:18 +0200, Mike Galbraith wrote:
> On Sun, 2016-04-24 at 09:05 +0200, Mike Galbraith wrote:
> > On Sat, 2016-04-23 at 18:38 -0700, Brendan Gregg wrote:
> >
> > > The bugs they found seem real, and their analysis is great
> > > (although using visualizations to find and fix scheduler bugs
> > > isn't new), and it would be good to see these fixed. However, it
> > > would also be useful to double check how widespread these issues
> > > really are. I suspect many on this list can test these patches in
> > > different environments.
> >
> > Part of it sounded to me very much like they're meeting and
> > "fixing" SMP group fairness...
>
> Ew, NUMA boxen look like they could use a hug or two. Add a group of
> one hog to compete with a box wide kbuild, ~lose a node.

sched: Fix smp nice induced group scheduling load distribution woes

On even a modest-sized NUMA box, any load that wants to scale is
essentially reduced to SCHED_IDLE class by smp nice scaling.  Limit
niceness to prevent cramming a box-wide load into a too-small space.
Given that niceness affects latency, also give the user the option to
completely disable box-wide group fairness.
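
(That option is an ordinary sched_features toggle: assuming a
CONFIG_SCHED_DEBUG build, writing NO_SMP_NICE_GROUPS to
/sys/kernel/debug/sched_features makes calc_cfs_shares() hand every
per-CPU group entity the full tg->shares, i.e. no box-wide group
fairness at all.)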

time make -j192 modules on a 4 node NUMA box..

Before:
root cgroup
real    1m6.987s     1.00

cgroup vs 1 group of 1 hog
real    1m20.871s    1.20

cgroup vs 2 groups of 1 hog
real    1m48.803s    1.62

Each single task group receives a ~full socket because the kbuild has
become an essentially massless object that fits in practically no space
at all.  Near perfect math led directly to far from good
scaling/performance, a "Perfect is the enemy of good" poster child.
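
To make the arithmetic concrete, below is a stand-alone userspace
sketch of the calc_cfs_shares() math before and after the change.  It
is not kernel code; the numbers (192 CPUs, default tg->shares of 1024,
one nice-0 task per CPU for the kbuild, load-resolution scaling
ignored) and the helper names are illustrative assumptions only.

/* shares_sketch.c - illustrative only, not kernel code. */
#include <stdio.h>

#define MIN_SHARES		2
#define NICE_0_WEIGHT		1024	/* sched_prio_to_weight[20] */
#define NICE_MINUS_10_WEIGHT	9548	/* sched_prio_to_weight[10] */

/* pre-patch: shares = tg->shares * this CPU's load / group-wide load */
static long shares_before(long tg_shares, long cfs_rq_load, long tg_weight)
{
	long shares = tg_shares * cfs_rq_load;

	if (tg_weight)
		shares /= tg_weight;
	if (shares < MIN_SHARES)
		shares = MIN_SHARES;
	if (shares > tg_shares)
		shares = tg_shares;
	return shares;
}

/* post-patch: group-wide load is clamped, and the floor may be raised */
static long shares_after(long tg_shares, long cfs_rq_load, long tg_weight)
{
	long min_shares = MIN_SHARES;
	long shares;

	if (tg_weight > NICE_MINUS_10_WEIGHT)
		tg_weight = NICE_MINUS_10_WEIGHT;

	shares = tg_shares * cfs_rq_load;
	if (tg_weight)
		shares /= tg_weight;
	if (tg_shares > NICE_0_WEIGHT)
		min_shares = NICE_0_WEIGHT;
	if (shares < min_shares)
		shares = min_shares;
	if (shares > tg_shares)
		shares = tg_shares;
	return shares;
}

int main(void)
{
	long nr_cpus = 192;
	long kbuild_group_load = nr_cpus * NICE_0_WEIGHT; /* one task per CPU */

	printf("kbuild per-CPU entity weight, before: %ld\n",
	       shares_before(1024, NICE_0_WEIGHT, kbuild_group_load));
	printf("kbuild per-CPU entity weight, after:  %ld\n",
	       shares_after(1024, NICE_0_WEIGHT, kbuild_group_load));
	printf("single hog group entity weight:       %ld\n",
	       shares_before(1024, NICE_0_WEIGHT, NICE_0_WEIGHT));
	return 0;
}

It prints ~5 for the kbuild group's per-CPU entity before the clamp, so
a single hog entity outweighs the co-located kbuild entity ~200:1,
which is what lets each hog group claim so much of the box.  After the
clamp the kbuild entity lands around 109, while the hog group's entity
is 1024 either way.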
After "Let's just be nice enough instead" adjustment, single task
groups continued to sustain >99% utilization while competing with
the box sized kbuild.
cgroup vs 2 groups of 1 hog
real 1m8.151s 1.01 192/190=1.01
Good enough works better.. nearly perfectly in this case.

Signed-off-by: Mike Galbraith <umgwanakikbuti@...il.com>
---
 kernel/sched/fair.c     |   22 ++++++++++++++++++----
 kernel/sched/features.h |    3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -2464,17 +2464,28 @@ static inline long calc_tg_weight(struct
 
 static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 {
-	long tg_weight, load, shares;
+	long tg_weight, load, shares, min_shares = MIN_SHARES;
 
-	tg_weight = calc_tg_weight(tg, cfs_rq);
+	if (!sched_feat(SMP_NICE_GROUPS))
+		return tg->shares;
+
+	/*
+	 * Bound niceness to prevent everything that wants to scale from
+	 * essentially becoming SCHED_IDLE on multi/large socket boxen,
+	 * screwing up our ability to distribute load properly and/or
+	 * deliver acceptable latencies.
+	 */
+	tg_weight = min_t(long, calc_tg_weight(tg, cfs_rq), sched_prio_to_weight[10]);
 	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
 		shares /= tg_weight;
 
-	if (shares < MIN_SHARES)
-		shares = MIN_SHARES;
+	if (tg->shares > sched_prio_to_weight[20])
+		min_shares = sched_prio_to_weight[20];
+	if (shares < min_shares)
+		shares = min_shares;
 	if (shares > tg->shares)
 		shares = tg->shares;
 
@@ -2517,6 +2528,9 @@ static void update_cfs_shares(struct cfs
 #ifndef CONFIG_SMP
 	if (likely(se->load.weight == tg->shares))
 		return;
+#else
+	if (!sched_feat(SMP_NICE_GROUPS) && se->load.weight == tg->shares)
+		return;
 #endif
 	shares = calc_cfs_shares(cfs_rq, tg);
 
Index: linux-2.6/kernel/sched/features.h
===================================================================
--- linux-2.6.orig/kernel/sched/features.h
+++ linux-2.6/kernel/sched/features.h
@@ -69,3 +69,6 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
+#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+SCHED_FEAT(SMP_NICE_GROUPS, true)
+#endif