linux-kernel - Re: [RFC PATCH 08/14] sched: normalize tg load contributions against runnable time

Message-ID: <1329482054.2293.273.camel@twins>
Date:	Fri, 17 Feb 2012 13:34:14 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Paul Turner <pjt@...gle.com>
Cc:	linux-kernel@...r.kernel.org, Venki Pallipadi <venki@...gle.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Mike Galbraith <efault@....de>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Ben Segall <bsegall@...gle.com>, Ingo Molnar <mingo@...e.hu>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 08/14] sched: normalize tg load contributions
 against runnable time

On Thu, 2012-02-16 at 00:36 +0100, Peter Zijlstra wrote:
> On Wed, 2012-02-01 at 17:38 -0800, Paul Turner wrote:
> > Entities of equal weight should receive equitable distribution of cpu time.
> > This is challenging in the case of a task_group's shares as execution may be
> > occurring on multiple cpus simultaneously.
> > 
> > To handle this we divide up the shares into weights proportionate with the load
> > on each cfs_rq.  This does not however, account for the fact that the sum of
> > the parts may be less than one cpu and so we need to normalize:
> >   load(tg) = min(runnable_avg(tg), 1) * tg->shares
> > Where runnable_avg is the aggregate time in which the task_group had runnable
> > children.
> 
> 
> >  static inline void __update_group_entity_contrib(struct sched_entity *se)
> >  {
> >         struct cfs_rq *cfs_rq = group_cfs_rq(se);
> >         struct task_group *tg = cfs_rq->tg;
> > +       int runnable_avg;
> >  
> >         se->avg.load_avg_contrib = (cfs_rq->tg_load_contrib * tg->shares);
> >         se->avg.load_avg_contrib /= atomic64_read(&tg->load_avg) + 1;
> > +
> > +       /*
> > +        * Unlike a task-entity, a group entity may be using >=1 cpu globally.
> > +        * However, in the case that it's using <1 cpu we need to form a
> > +        * correction term so that we contribute the same load as a task of
> > +        * equal weight. (Global runnable time is taken as a fraction over 2^12.)
> > +        */
> > +       runnable_avg = atomic_read(&tg->runnable_avg);
> > +       if (runnable_avg < (1<<12)) {
> > +               se->avg.load_avg_contrib *= runnable_avg;
> > +               se->avg.load_avg_contrib /= (1<<12);
> > +       }
> >  } 
> 
> This seems weird, and the comments don't explain anything.
> 
> Ah,.. you can count runnable multiple times (on each cpu), this also
> means that the number you're using (when below 1) can still be utter
> crap.
> 
> Neither the comment nor the changelog mention this, it should, it should
> also mention why it doesn't matter (does it?).

Since we don't know when we were runnable in the window, we can take our
runnable fraction as a flat probability distribution over the entire
window.

The combined answer we're looking for is what fraction of time was any
of our cpus running.

Take p_i to be the runnable probability of cpu i, then the probability
that both cpu0 and cpu1 were runnable is pc_0,1 = p_0 * p_1, so the
probability that either was running is p_01 = p_0 + p_1 - pc_0,1.

The 3 cpu case becomes: when was either cpu01 or cpu2 running? This
yields the iteration: p_012 = p_01 + p_2 - pc_01,2.

p_012 = p_0 + p_1 + p_2 - (p_0 * p_1 + (p_0 + p_1 - p_0 * p_1) * p_2)
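
If the per-cpu runnable periods are taken as independent (which is what
multiplying p_0 * p_1 assumes), the iteration collapses to a closed form,
because "some cpu was runnable" is the complement of "no cpu was
runnable". A sketch of that step; the n-cpu notation in the last line is
introduced here for illustration and is not part of the patch:

```latex
\begin{align*}
p_{01}  &= p_0 + p_1 - p_0 p_1 = 1 - (1 - p_0)(1 - p_1) \\
p_{012} &= p_{01} + p_2 - p_{01} p_2 = 1 - (1 - p_0)(1 - p_1)(1 - p_2) \\
p_{0 \ldots n-1} &= 1 - \prod_{i=0}^{n-1} (1 - p_i)
\end{align*}
```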

Now for small values of p our combined/corrective term is small, since
it's a product of small values, which is smaller still; however, it
becomes more dominant the nearer we get to 1.

Since it's more likely to get near to 1 the more CPUs we have, I'm not
entirely convinced we can ignore it.
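
To get a feel for how large the corrective term gets, here is a minimal
user-space sketch, not kernel code. It assumes the per-cpu runnable
periods are independent, and it uses plain doubles in [0, 1] rather than
the 2^12 fixed-point fraction from the patch. Both helper names are
hypothetical.

```c
/*
 * Hypothetical helper (not from the patch): fold per-cpu runnable
 * fractions into the probability that at least one cpu was runnable,
 * assuming the per-cpu runnable periods are independent.
 */
static double combined_runnable(const double *p, int nr_cpus)
{
	double acc = 0.0;
	int i;

	/* Same iteration as above: p_01 = p_0 + p_1 - p_0 * p_1, and so on. */
	for (i = 0; i < nr_cpus; i++)
		acc = acc + p[i] - acc * p[i];

	return acc;
}

/*
 * Hypothetical helper: plain sum of the per-cpu fractions, i.e. the
 * value you get when the corrective term is ignored.
 */
static double summed_runnable(const double *p, int nr_cpus)
{
	double sum = 0.0;
	int i;

	for (i = 0; i < nr_cpus; i++)
		sum += p[i];

	return sum;
}
```

With two cpus each runnable 10% of the time, the sum is 0.2 and the
combined value is 0.19, so the difference is negligible. With eight such
cpus the sum is 0.8 while the combined value is 1 - 0.9^8, roughly 0.57,
so the plain sum overstates the runnable fraction by about 0.23 even
though it is still below 1.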