Date: Wed, 27 Apr 2016 09:09:51 +0200
From: Mike Galbraith <umgwanakikbuti@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Brendan Gregg <brendan.d.gregg@...il.com>,
	Jeff Merkey <linux.mdb@...il.com>
Subject: [patch] sched: Fix smp nice induced group scheduling load distribution woes

On Mon, 2016-04-25 at 11:18 +0200, Mike Galbraith wrote:
> On Sun, 2016-04-24 at 09:05 +0200, Mike Galbraith wrote:
> > On Sat, 2016-04-23 at 18:38 -0700, Brendan Gregg wrote:
> > >
> > > The bugs they found seem real, and their analysis is great (although
> > > using visualizations to find and fix scheduler bugs isn't new), and it
> > > would be good to see these fixed. However, it would also be useful to
> > > double check how widespread these issues really are. I suspect many on
> > > this list can test these patches in different environments.
> >
> > Part of it sounded to me very much like they're meeting and "fixing"
> > SMP group fairness...
>
> Ew, NUMA boxen look like they could use a hug or two. Add a group of
> one hog to compete with a box wide kbuild, ~lose a node.

sched: Fix smp nice induced group scheduling load distribution woes

On even a modest sized NUMA box, any load that wants to scale is
essentially reduced to SCHED_IDLE class by smp nice scaling. Limit
niceness to prevent cramming a box wide load into a too small space.
Given that niceness affects latency, give the user the option to
completely disable box wide group fairness as well.

time make -j192 modules on a 4 node NUMA box..

Before:
root cgroup
real    1m6.987s      1.00

cgroup vs 1 group of 1 hog
real    1m20.871s     1.20

cgroup vs 2 groups of 1 hog
real    1m48.803s     1.62

Each single task group receives a ~full socket because the kbuild has
become an essentially massless object that fits in practically no space
at all. Near perfect math led directly to far from good
scaling/performance, a "Perfect is the enemy of good" poster child.
After "Let's just be nice enough instead" adjustment, single task groups
continued to sustain >99% utilization while competing with the box sized
kbuild.

cgroup vs 2 groups of 1 hog
real    1m8.151s      1.01    192/190=1.01

Good enough works better.. nearly perfectly in this case.

Signed-off-by: Mike Galbraith <umgwanakikbuit@...il.com>
---
 kernel/sched/fair.c     |   22 ++++++++++++++++++----
 kernel/sched/features.h |    3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -2464,17 +2464,28 @@ static inline long calc_tg_weight(struct
 static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 {
-	long tg_weight, load, shares;
+	long tg_weight, load, shares, min_shares = MIN_SHARES;
 
-	tg_weight = calc_tg_weight(tg, cfs_rq);
+	if (!sched_feat(SMP_NICE_GROUPS))
+		return tg->shares;
+
+	/*
+	 * Bound niceness to prevent everything that wants to scale from
+	 * essentially becoming SCHED_IDLE on multi/large socket boxen,
+	 * screwing up our ability to distribute load properly and/or
+	 * deliver acceptable latencies.
+	 */
+	tg_weight = min_t(long, calc_tg_weight(tg, cfs_rq),
+			  sched_prio_to_weight[10]);
 	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
 		shares /= tg_weight;
 
-	if (shares < MIN_SHARES)
-		shares = MIN_SHARES;
+	if (tg->shares > sched_prio_to_weight[20])
+		min_shares = sched_prio_to_weight[20];
+	if (shares < min_shares)
+		shares = min_shares;
 	if (shares > tg->shares)
 		shares = tg->shares;
 
@@ -2517,6 +2528,9 @@ static void update_cfs_shares(struct cfs
 #ifndef CONFIG_SMP
 	if (likely(se->load.weight == tg->shares))
 		return;
+#else
+	if (!sched_feat(SMP_NICE_GROUPS) && se->load.weight == tg->shares)
+		return;
 #endif
 	shares = calc_cfs_shares(cfs_rq, tg);

Index: linux-2.6/kernel/sched/features.h
===================================================================
--- linux-2.6.orig/kernel/sched/features.h
+++ linux-2.6/kernel/sched/features.h
@@ -69,3 +69,6 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
+#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SMP)
+SCHED_FEAT(SMP_NICE_GROUPS, true)
+#endif