[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140728143005.GU6758@twins.programming.kicks-ass.net>
Date: Mon, 28 Jul 2014 16:30:05 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Rik van Riel <riel@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Michael Neuling <mikey@...ling.org>,
Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
jhladky@...hat.com, ktkhai@...allels.com,
tim.c.chen@...ux.intel.com,
Nicolas Pitre <nicolas.pitre@...aro.org>
Subject: Re: [PATCH v2] sched: make update_sd_pick_busiest return true on a
busier sd
On Fri, Jul 25, 2014 at 03:32:10PM -0400, Rik van Riel wrote:
> Subject: sched: make update_sd_pick_busiest return true on a busier sd
>
> Currently update_sd_pick_busiest only identifies the busiest sd
> that is either overloaded, or has a group imbalance. When no
> sd is imbalanced or overloaded, the load balancer fails to find
> the busiest domain.
>
> This breaks load balancing between domains that are not overloaded,
> in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
> return true when the busiest sd yet is encountered.
>
> Behaviour for SD_ASYM_PACKING does not seem to match the comment,
> but I have no hardware to test that so I have left the behaviour
> of that code unchanged.
>
> It is unclear what to do with the group_imb condition.
> Should group_imb override a busier load? If so, should we fix
> calculate_imbalance to return a sensible number when the "busiest"
> node found has a below average load? We probably need to fix
> calculate_imbalance anyway, to deal with an overloaded group that
> happens to have a below average load...
I think we want overloaded > group_imb > other. So prefer overloaded
groups, imbalanced groups if no overloaded and anything else if no
overloaded and imbalanced thingies.
If you look at the comment near sg_imbalanced(), in that case we want to
move tasks from the first group to the second, even though the second
group would be the heaviest.
enum group_type {
group_other = 0,
group_imbalanced,
group_overloaded,
};
static enum group_type group_classify(struct sg_lb_stats *sgs)
{
if (sgs->sum_nr_running > sgs->group_capacity_factor)
return group_overloaded;
if (sgs->group_imb)
return group_imbalanced;
return group_other;
}
> /**
> * update_sd_pick_busiest - return 1 on busiest group
> * @env: The load balancing environment.
> @@ -5957,7 +5962,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> * @sgs: sched_group statistics
> *
> * Determine if @sg is a busier group than the previously selected
> - * busiest group.
> + * busiest group.
We really need that extra trailing whitespace, yes? ;-)
> * Return: %true if @sg is a busier group than the previously selected
> * busiest group. %false otherwise.
> @@ -5967,13 +5972,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> struct sched_group *sg,
> struct sg_lb_stats *sgs)
> {
if (group_classify(sgs) < group_classify(&sds->busiest_stats))
return false;
> if (sgs->avg_load <= sds->busiest_stat.avg_load)
> return false;
>
> + /* This is the busiest node. */
> + if (!(env->sd->flags & SD_ASYM_PACKING))
> return true;
>
> /*
We could replace sg_lb_stats::group_imb with the above and avoid the
endless recomputation I suppose.
Also, we still need a little change to calculate_imbalance() where we
assume sum_nr_running > group_capacity_factor.
Content of type "application/pgp-signature" skipped
Powered by blists - more mailing lists