Message-ID: <1322524316.21329.64.camel@sbsiddha-desk.sc.intel.com>
Date: Mon, 28 Nov 2011 15:51:56 -0800
From: Suresh Siddha <suresh.b.siddha@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...e.hu>, Venki Pallipadi <venki@...gle.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Mike Galbraith <efault@....de>,
linux-kernel <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.jf.intel.com>,
"Shi, Alex" <alex.shi@...el.com>
Subject: Re: [patch 3/6] sched, nohz: sched group, domain aware nohz idle
load balancing
On Thu, 2011-11-24 at 03:47 -0800, Peter Zijlstra wrote:
> On Fri, 2011-11-18 at 15:03 -0800, Suresh Siddha wrote:
> > + for_each_domain(cpu, sd) {
> > + struct sched_group *sg = sd->groups;
> > + struct sched_group_power *sgp = sg->sgp;
> > + int nr_busy = atomic_read(&sgp->nr_busy_cpus);
> > +
> > + if (nr_busy > 1 && (nr_busy * SCHED_LOAD_SCALE > sgp->power))
> > + goto need_kick;
>
> This looks wrong, its basically always true for a box with HT.
In the presence of two busy HT siblings, we need to do the idle load
balance to figure out if the load from the busy core can be migrated to
any other idle core/sibling in the platform. And at this point, we
already know there are idle cpus in the platform.
But you are right, using group power like the above is not right. For
example, in the case of two sockets where each socket has two cores and
no HT, if one socket is completely busy and the other completely idle,
we would like to identify this. But the group power of the busy socket
will be 2 * SCHED_POWER_SCALE, so nr_busy * SCHED_LOAD_SCALE is not
greater than it and the check above will not fire.
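To make that concrete, here is a toy user-space calculation (my
illustration, not code from the patch), assuming the usual
SCHED_LOAD_SCALE == SCHED_POWER_SCALE == 1024:

#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL
#define SCHED_POWER_SCALE	1024UL

int main(void)
{
	int nr_busy = 2;			/* both cores of the socket are busy */
	unsigned long group_power = 2 * SCHED_POWER_SCALE;	/* dual-core, no HT */

	/* 2048 > 2048 is false, so the old check does not kick the idle
	 * load balancer even though the other socket is completely idle */
	printf("kick = %d\n",
	       nr_busy > 1 && nr_busy * SCHED_LOAD_SCALE > group_power);
	return 0;
}

(For an HT pair the group power is well below 2 * SCHED_LOAD_SCALE,
which is why the old check is basically always true there, as Peter
points out.)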
In older kernels, for the domains which were sharing package resources,
we were setting the group power to SCHED_POWER_SCALE in the default
performance mode, and I had that old code in mind while writing the
above check.
I will modify the above check to:
if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
	goto need_kick;
This way, if there is an SMT/MC domain with more than one busy cpu in
the group, then we will request idle load balancing.
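For reference, here is a sketch of how the modified check would sit in
the loop quoted above (illustrative only; the actual respin of the
patch may look different):

for_each_domain(cpu, sd) {
	struct sched_group *sg = sd->groups;
	struct sched_group_power *sgp = sg->sgp;
	int nr_busy = atomic_read(&sgp->nr_busy_cpus);

	/*
	 * Only domains that share package resources (SMT siblings,
	 * cores sharing a cache) are considered; the group-power
	 * comparison is dropped.
	 */
	if ((sd->flags & SD_SHARE_PKG_RESOURCES) && nr_busy > 1)
		goto need_kick;
}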
Current mainline code kicks the idle load balancer whenever there are
two busy cpus in the system. The above modification makes this decision
somewhat better. For example, two busy cpus in two different sockets,
or two busy cpus in a dual-core single-socket system, will never kick
the idle load balancer (as there is no need).
In the future we can add more heuristics to kick the idle load balancer
only when it is really necessary (for example, when there is a real
imbalance between the highest and lowest loaded groups, etc.). The only
catch is to identify those scenarios without adding much penalty to the
busy cpu which is identifying the imbalance and kicking the idle load
balancer.
The approach proposed above is the simplest one that tries to do better
than the current logic in the kernel.
Any further thoughts on making the kick decisions (for idle load
balancing) more robust are welcome.
thanks,
suresh