[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200325134300.GA30416@lorien.usersys.redhat.com>
Date: Wed, 25 Mar 2020 09:43:00 -0400
From: Phil Auld <pauld@...hat.com>
To: Aubrey Li <aubrey.li@...el.com>
Cc: vincent.guittot@...aro.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
linux-kernel@...r.kernel.org, tim.c.chen@...ux.intel.com,
vpillai@...italocean.com, joel@...lfernandes.org,
Aubrey Li <aubrey.li@...ux.intel.com>
Subject: Re: [PATCH] sched/fair: Don't pull task if local group is more
loaded than busiest group
Hi Aubrey,
On Wed, Mar 25, 2020 at 08:46:28PM +0800 Aubrey Li wrote:
> A huge number of load imbalance was observed when the local group
> type is group_fully_busy, and the average load of local group is
> greater than the selected busiest group, so the imbalance calculation
> returns a negative value actually. Fix this problem by comparing the
> average load before local group type check.
>
> Signed-off-by: Aubrey Li <aubrey.li@...ux.intel.com>
> ---
> kernel/sched/fair.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c1217bf..c524369 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8862,17 +8862,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> goto out_balanced;
>
> /*
> + * If the local group is more loaded than the selected
> + * busiest group don't try to pull any tasks.
> + */
> + if (local->avg_load >= busiest->avg_load)
> + goto out_balanced;
> +
> + /*
> * When groups are overloaded, use the avg_load to ensure fairness
> * between tasks.
> */
> if (local->group_type == group_overloaded) {
> - /*
> - * If the local group is more loaded than the selected
> - * busiest group don't try to pull any tasks.
> - */
> - if (local->avg_load >= busiest->avg_load)
> - goto out_balanced;
> -
> /* XXX broken for overlapping NUMA groups */
> sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) /
> sds.total_capacity;
> --
> 2.7.4
>
I'm not sure about this. I think this patch will undo a good bit of the
benefit of the load balancer rework. Avg_load is really most useful
when things are overloaded. If we go back to looking at it here we may
fail to balance when needed.
There are cases where, due to group scheduler load scaling, local average
may be higher but have spare CPUs still whereas busiest may have extra
processes which be balanced.
Cheers,
Phil
--
Powered by blists - more mailing lists