Message-ID: <1329482054.2293.273.camel@twins>
Date: Fri, 17 Feb 2012 13:34:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Paul Turner <pjt@...gle.com>
Cc: linux-kernel@...r.kernel.org, Venki Pallipadi <venki@...gle.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Mike Galbraith <efault@....de>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Ben Segall <bsegall@...gle.com>, Ingo Molnar <mingo@...e.hu>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 08/14] sched: normalize tg load contributions
against runnable time
On Thu, 2012-02-16 at 00:36 +0100, Peter Zijlstra wrote:
> On Wed, 2012-02-01 at 17:38 -0800, Paul Turner wrote:
> > Entities of equal weight should receive equitable distribution of cpu time.
> > This is challenging in the case of a task_group's shares as execution may be
> > occurring on multiple cpus simultaneously.
> >
> > To handle this we divide up the shares into weights proportionate with the load
> > on each cfs_rq. This does not however, account for the fact that the sum of
> > the parts may be less than one cpu and so we need to normalize:
> > load(tg) = min(runnable_avg(tg), 1) * tg->shares
> > Where runnable_avg is the aggregate time in which the task_group had runnable
> > children.
>
>
> >  static inline void __update_group_entity_contrib(struct sched_entity *se)
> >  {
> >  	struct cfs_rq *cfs_rq = group_cfs_rq(se);
> >  	struct task_group *tg = cfs_rq->tg;
> > +	int runnable_avg;
> >
> >  	se->avg.load_avg_contrib = (cfs_rq->tg_load_contrib * tg->shares);
> >  	se->avg.load_avg_contrib /= atomic64_read(&tg->load_avg) + 1;
> > +
> > +	/*
> > +	 * Unlike a task-entity, a group entity may be using >=1 cpu globally.
> > +	 * However, in the case that it's using <1 cpu we need to form a
> > +	 * correction term so that we contribute the same load as a task of
> > +	 * equal weight. (Global runnable time is taken as a fraction over 2^12.)
> > +	 */
> > +	runnable_avg = atomic_read(&tg->runnable_avg);
> > +	if (runnable_avg < (1<<12)) {
> > +		se->avg.load_avg_contrib *= runnable_avg;
> > +		se->avg.load_avg_contrib /= (1<<12);
> > +	}
> >  }
>
> This seems weird, and the comments don't explain anything.
>
> Ah,.. you can count runnable multiple times (on each cpu), this also
> means that the number you're using (when below 1) can still be utter
> crap.
>
> Neither the comment nor the changelog mention this, it should, it should
> also mention why it doesn't matter (does it?).

Since we don't know when we were runnable in the window, we can take our
runnable fraction as a flat probability distribution over the entire
window.

The combined answer we're looking for is the fraction of time during
which any of our cpus was running.

Take p_i to be the runnable probability of cpu i. Treating the cpus as
independent, the probability that both cpu0 and cpu1 were runnable is
pc_0,1 = p_0 * p_1, so the probability that either was running is
p_01 = p_0 + p_1 - pc_0,1.

The 3 cpu case then asks when either cpu01 or cpu2 was running, yielding
the iteration: p_012 = p_01 + p_2 - pc_01,2. Expanded:

p_012 = p_0 + p_1 + p_2 - (p_0 * p_1 + (p_0 + p_1 - p_0 * p_1) * p_2)
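
The iteration above generalizes to any number of cpus by folding one cpu
at a time into the running result. A minimal sketch, assuming the per-cpu
fractions are independent; combined_runnable() is a hypothetical helper
name, and floating point is used only for readability (kernel code would
need fixed point, e.g. the 2^12 scale from the patch):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fold per-cpu runnable fractions into the probability that at least
 * one cpu was runnable, assuming the cpus are independent.
 * Hypothetical helper for illustration; not a kernel function.
 */
double combined_runnable(const double *p, size_t nr_cpus)
{
	double combined = 0.0;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		assert(p[i] >= 0.0 && p[i] <= 1.0);
		/* p_new = p_old + p_i - p_old * p_i */
		combined = combined + p[i] - combined * p[i];
	}
	return combined;
}
```

For two cpus at 0.5 each this gives 0.75, and adding a third at 0.5 gives
0.875, matching the expanded p_012 expression.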

Now for small values of p our combined/corrective term is small, since
it's a product of small values, which is smaller still; however, it
becomes more dominant the nearer we get to 1.

Since we're more likely to get near 1 the more CPUs we have, I'm not
entirely convinced we can ignore it.
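
To get a feel for how large the ignored term can be, the sketch below
compares a plain sum of per-cpu fractions (clamped at 1, as in
min(runnable_avg, 1)) with the independent-cpu estimate 1 - (1 - p)^n.
It assumes every cpu has the same fraction p and that the global figure
is a plain per-cpu sum; both function names are hypothetical and
floating point is used only for readability:

```c
#include <assert.h>

/* Plain sum of nr_cpus equal per-cpu fractions, clamped to 1. */
double summed_runnable(double p, unsigned int nr_cpus)
{
	double sum;

	assert(p >= 0.0 && p <= 1.0);
	sum = p * nr_cpus;
	return sum < 1.0 ? sum : 1.0;
}

/*
 * Probability that at least one of nr_cpus independent cpus was
 * runnable: 1 - (1 - p)^nr_cpus.
 */
double union_runnable(double p, unsigned int nr_cpus)
{
	double none = 1.0;
	unsigned int i;

	assert(p >= 0.0 && p <= 1.0);
	for (i = 0; i < nr_cpus; i++)
		none *= 1.0 - p;
	return 1.0 - none;
}
```

With 2 cpus at p = 0.05 the two estimates differ by only 0.0025 (0.1 vs
0.0975), but with 4 cpus at p = 0.25 the sum already reports a full cpu
while the independent estimate is about 0.684.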