linux-kernel - Re: 4.3 group scheduling regression


Message-ID: <20151012021230.GK11102@intel.com>
Date:	Mon, 12 Oct 2015 10:12:31 +0800
From:	Yuyang Du <yuyang.du@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: 4.3 group scheduling regression

On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> > Good morning, Peter.
> > 
> > On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > > 
> > > > It's odd to me that things look pretty much the same good/bad tree with
> > > > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > > > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > > > the BadThing trigger.
> > >
> > > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > > you had your entire user session in 1 (auto) group and was competing
> > > against 8 manual cgroups.
> > > 
> > > So how exactly are things configured?
> >  
> > Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due 
> > to the per CPU group entity share distribution. Let me dig more.
> 
> So in the old code we had 'magic' to deal with the case where a cgroup
> was consuming less than 1 cpu's worth of runtime. For example, a single
> task running in the group.
> 
> In that scenario it might be possible that the group entity weight:
> 
> 	se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> 
> Strongly deviates from the tg->shares; you want the single task reflect
> the full group shares to the next level; due to the whole distributed
> approximation stuff.

Yeah, I thought so.
 
> I see you've deleted all that code; see the former
> __update_group_entity_contrib().
 
Probably not there; that was actually an icky way to adjust things.

> It could be that we need to bring that back. But let me think a little
> bit more on this.. I'm having a hard time waking :/

My guess is that the problem is in calc_tg_weight(), and that naughty boys
do end up more favored. What a reality...
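
To make the guess concrete, here is a minimal user-space sketch of the share
calculation, comparing the decayed load average with the instantaneous queue
weight as the per-CPU load input. It is not kernel code: the struct names, the
use_instant flag, and the numbers in the usage note are hypothetical, and the
final clamping to MIN_SHARES and tg->shares is my assumption about the part of
calc_cfs_shares() that is not shown in this mail.

```c
/* Hypothetical user-space model of the group share math; not kernel code. */
#define MIN_SHARES 2L   /* assumed lower clamp, mirroring the kernel constant */

struct group_model {
	long shares;       /* models tg->shares, e.g. 1024 */
	long tg_load_avg;  /* models tg->load_avg: sum of per-CPU contributions */
};

struct cfs_rq_model {
	long tg_load_avg_contrib; /* this CPU's last contribution to tg_load_avg */
	long load_avg;            /* decayed average load of this runqueue */
	long load_weight;         /* instantaneous weight of queued entities */
};

/*
 * Compute the group entity weight for one CPU.
 * use_instant == 0: use the decayed average as the local load.
 * use_instant != 0: use the instantaneous queue weight instead.
 */
static long calc_shares_model(const struct group_model *tg,
			      const struct cfs_rq_model *rq,
			      int use_instant)
{
	long load = use_instant ? rq->load_weight : rq->load_avg;

	/* Replace this CPU's stale contribution with its current load. */
	long tg_weight = tg->tg_load_avg - rq->tg_load_avg_contrib + load;
	long shares = tg->shares * load;

	if (tg_weight)
		shares /= tg_weight;

	/* Assumed clamping to [MIN_SHARES, tg->shares]. */
	if (shares < MIN_SHARES)
		shares = MIN_SHARES;
	if (shares > tg->shares)
		shares = tg->shares;

	return shares;
}
```

With made-up numbers (shares = 1024, tg_load_avg = 512, contrib = 256,
load_avg = 256, load_weight = 1024), the averaged input gives the group entity
a weight of 512, while the instantaneous input gives 819. That is the
direction in which a mostly idle, bouncing task would be favored.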

Mike, could you please test the following?

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..b184da0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq_load_avg(cfs_rq);
+	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
@@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq_load_avg(cfs_rq);
+	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)