Date:	Mon, 30 Mar 2015 16:27:00 +0100
From:	Morten Rasmussen <morten.rasmussen@....com>
To:	Vincent Guittot <vincent.guittot@...aro.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"riel@...hat.com" <riel@...hat.com>,
	"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
	"srikar@...ux.vnet.ibm.com" <srikar@...ux.vnet.ibm.com>,
	"pjt@...gle.com" <pjt@...gle.com>,
	"benh@...nel.crashing.org" <benh@...nel.crashing.org>,
	"efault@....de" <efault@....de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"iamjoonsoo.kim@....com" <iamjoonsoo.kim@....com>,
	"svaidy@...ux.vnet.ibm.com" <svaidy@...ux.vnet.ibm.com>,
	"tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
	"jason.low2@...com" <jason.low2@...com>
Subject: Re: [PATCH V2] sched: Improve load balancing in the presence of idle
 CPUs

On Mon, Mar 30, 2015 at 02:29:09PM +0100, Vincent Guittot wrote:
> On 30 March 2015 at 14:24, Peter Zijlstra <peterz@...radead.org> wrote:
> > On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote:
> >> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote:
> >> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote:
> >> >
> >> > > I agree that it is hard to predict how many additional cpus you need,
> >> > > but I don't think you necessarily need that information as long as you
> >> > > start by filling up the cpu that was kicked to do the
> >> > > nohz_idle_balance() first.
> >> >
> >> > > Reducing unnecessary wakeups is quite important for energy consumption
> >> > > and something a lot of effort is put into. You really don't want to wake
> >> > > up another cluster/package unnecessarily just because there was only one
> >> > > nohz-idle cpu left in the previous one which could have handled the
> >> > > additional load. It gets even worse if the other cluster is less
> >> > > energy-efficient (big.LITTLE).
> >> >
> >> > So the only way to get tasks to cross your cluster is by balancing that
> >> > domain. At this point we'll compute sg stats for either group
> >> > (=cluster).
> >> >
> >> > The only thing we need to ensure is that it doesn't view the small
> >> > cluster as overloaded (as long as it really isn't of course), as long as
> >> > it's not viewed as overloaded it will not pull tasks from it into the big
> >> > cluster, no matter how many ILBs we run before the ILB duty cpu's
> >> > rebalance_domains() call.
> >> >
> >> > I'm really not seeing the problem here.
> >>
> >> I see. The group_classify() should take care of it in all cases of
> >> balancing across clusters. You would be iterating over all cpus in the
> >> other cluster when running rebalance_domains(), though, if the balancer
> >> cpu happens to be the last nohz-idle one in the little cluster. However,
> >> within the cluster (in case you have two or more nohz-idle cpus) you
> >> still take a double hit. No?
> >
> > It can, yes, but typically not, I think. This all could use some 'help'
> > for sure.
> >
> > So the thing is, find_new_ilb() simply selects the first idle_cpus_mask
> > cpu, while at the same time, nohz_idle_balance() will iterate the
> > idle_cpus_mask with that first cpu being visited first (obviously).
> >
> > So it is very likely that if we migrate on the ILB it is in fact to the
> > current CPU.
> >
> > In case we cannot, we have no choice but to wake up a second idle,
> > nothing really to be done about that.
> >
> > To put it another way, for ILB purposes the rebalance_domains() call is
> > mostly superfluous. The only other case is if the selected ILB target
> > became non-idle between being selected and getting to run the softirq
> > handler. At which point we should wake another anyhow too.
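
For reference, the selection and iteration Peter describes look roughly
like this (a simplified sketch of the current code; locking, stats and
the kick/flag handling are elided):

	/* The ILB is simply the first cpu in the nohz-idle mask. */
	static inline int find_new_ilb(void)
	{
		int ilb = cpumask_first(nohz.idle_cpus_mask);

		if (ilb < nr_cpu_ids && idle_cpu(ilb))
			return ilb;

		return nr_cpu_ids;
	}

	static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
	{
		int this_cpu = this_rq->cpu;
		int balance_cpu;

		/*
		 * Walks the same mask in the same order, so the first
		 * cpu in it is normally this_cpu, which the current
		 * code skips (the proposal is to balance it first).
		 */
		for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
			if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
				continue;
			...
		}
	}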

Apart from the issues that Vincent has already pointed out, including
the balancing of this_cpu in nohz_idle_balance() also means that we no
longer ignore rq->next_balance for the cpu being kicked. So we risk
kicking a cpu to do nohz_idle_balance() which, because it has done a
load-balance recently, ignores the fact that we need an ILB now (the
kicker has already determined that we do).
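
The gate in question is the per-cpu time check in the iteration,
roughly (simplified; locking and load updates elided):

	rq = cpu_rq(balance_cpu);

	/*
	 * Each remote cpu is only balanced if its own next_balance
	 * has expired. Pushing this_cpu through the same path would
	 * apply this check to the kicked cpu as well, even though
	 * the kicker has already decided an ILB is needed right now.
	 */
	if (time_after_eq(jiffies, rq->next_balance))
		rebalance_domains(rq, CPU_IDLE);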

Also, if we include this_cpu in nohz_idle_balance() and pick the first
nohz-idle cpu (as we currently do), we are back to where we were before
Preeti's patch: the balancer cpu will bail out if it pulls anything for
itself, without kicking anybody to take over.
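
The bail-out is the need_resched() check in the same loop, roughly:

	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
			continue;

		/*
		 * If this_cpu now has work, e.g. because it just
		 * pulled a task for itself, stop balancing the
		 * remaining nohz-idle cpus and go run it. Without
		 * kicking somebody else, the rest stay unbalanced.
		 */
		if (need_resched())
			break;
		...
	}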
