Message-ID: <1322524316.21329.64.camel@sbsiddha-desk.sc.intel.com>
Date: Mon, 28 Nov 2011 15:51:56 -0800
From: Suresh Siddha <suresh.b.siddha@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...e.hu>, Venki Pallipadi <venki@...gle.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Mike Galbraith <efault@....de>,
linux-kernel <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.jf.intel.com>,
"Shi, Alex" <alex.shi@...el.com>
Subject: Re: [patch 3/6] sched, nohz: sched group, domain aware nohz idle
load balancing
On Thu, 2011-11-24 at 03:47 -0800, Peter Zijlstra wrote:
> On Fri, 2011-11-18 at 15:03 -0800, Suresh Siddha wrote:
> > + for_each_domain(cpu, sd) {
> > + struct sched_group *sg = sd->groups;
> > + struct sched_group_power *sgp = sg->sgp;
> > + int nr_busy = atomic_read(&sgp->nr_busy_cpus);
> > +
> > + if (nr_busy > 1 && (nr_busy * SCHED_LOAD_SCALE > sgp->power))
> > + goto need_kick;
>
> This looks wrong, its basically always true for a box with HT.
In the presence of two busy HT siblings, we need to do the idle load
balance to figure out if the load from the busy core can be migrated to
any other idle core/sibling in the platform. And at this point, we
already know there are idle cpus in the platform.
But you are right, using group power like the above is not right. For
example, in the case of two sockets where each socket has two cores and
no HT, if one socket is completely busy and the other completely idle,
we would like to identify this. But the group power of the busy socket
will be 2 * SCHED_POWER_SCALE, so nr_busy * SCHED_LOAD_SCALE is not
greater than it and the check above will not fire.
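To make that concrete, here is a toy user-space calculation (my
illustration, not code from the patch), assuming the usual
SCHED_LOAD_SCALE == SCHED_POWER_SCALE == 1024:

#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL
#define SCHED_POWER_SCALE	1024UL

int main(void)
{
	int nr_busy = 2;			/* both cores of the socket are busy */
	unsigned long group_power = 2 * SCHED_POWER_SCALE;	/* dual-core, no HT */

	/* 2048 > 2048 is false, so the old check does not kick the idle
	 * load balancer even though the other socket is completely idle */
	printf("kick = %d\n",
	       nr_busy > 1 && nr_busy * SCHED_LOAD_SCALE > group_power);
	return 0;
}

(For an HT pair the group power is well below 2 * SCHED_LOAD_SCALE,
which is why the old check is basically always true there, as Peter
points out.)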
In older kernels, for the domains which were sharing package resources,
we were setting the group power to SCHED_POWER_SCALE in the default
performance mode, and I had that old code in mind while writing the
above check.
I will modify the above check to:
if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
	goto need_kick;
This way, if there is an SMT/MC domain with more than one busy cpu in
the group, then we will request idle load balancing.
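For reference, here is a sketch of how the modified check would sit in
the loop quoted above (illustrative only; the actual respin of the
patch may look different):

for_each_domain(cpu, sd) {
	struct sched_group *sg = sd->groups;
	struct sched_group_power *sgp = sg->sgp;
	int nr_busy = atomic_read(&sgp->nr_busy_cpus);

	/*
	 * Only domains that share package resources (SMT siblings,
	 * cores sharing a cache) are considered; the group-power
	 * comparison is dropped.
	 */
	if ((sd->flags & SD_SHARE_PKG_RESOURCES) && nr_busy > 1)
		goto need_kick;
}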
Current mainline code kicks the idle load balancer whenever there are
two busy cpus in the system. The above modification makes this decision
somewhat better. For example, two busy cpus in two different sockets,
or two busy cpus in a dual-core single-socket system, will never kick
the idle load balancer (as there is no need).
In the future we can add more heuristics to kick the idle load balancer
only when it is really necessary (for example, when there is a real
imbalance between the highest and lowest loaded groups, etc.). The only
catch is to identify those scenarios without adding much penalty to the
busy cpu which is identifying the imbalance and kicking the idle load
balancer.
The approach proposed above is the simplest one that tries to do better
than the current logic in the kernel.
Any further thoughts on making the kick decisions (for idle load
balancing) more robust are welcome.
thanks,
suresh