[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230704134024.GV4253@hirez.programming.kicks-ass.net>
Date: Tue, 4 Jul 2023 15:40:24 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Tobias Huschle <huschle@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
sshegde@...ux.vnet.ibm.com, srikar@...ux.vnet.ibm.com,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [RFC 1/1] sched/fair: Consider asymmetric scheduler groups in
load balancer
On Mon, May 15, 2023 at 01:46:01PM +0200, Tobias Huschle wrote:
> The current load balancer implementation implies that scheduler groups,
> within the same domain, all host the same number of CPUs. This is
> reflected in the condition, that a scheduler group, which is load
> balancing and classified as having spare capacity, should pull work
> from the busiest group, if the local group runs less processes than
> the busiest one. This implies that these two groups should run the
> same number of processes, which is problematic if the groups are not
> of the same size.
>
> The assumption that scheduler groups within the same scheduler domain
> host the same number of CPUs appears to be true for non-s390
> architectures.
Mostly; there's historically the cpuset case where we can create
lopsided groups like that. And today we're growing all these hybrid
things that will also tickle this, except they're looking towards
different balancer extentions to deal with the IPC difference so might
not be immediately caring about this here issue.
> Signed-off-by: Tobias Huschle <huschle@...ux.ibm.com>
> ---
> kernel/sched/fair.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 48b6f0ca13ac..b1307d7e4065 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10426,7 +10426,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> * group's child domain.
> */
> if (sds.prefer_sibling && local->group_type == group_has_spare &&
> - busiest->sum_nr_running > local->sum_nr_running + 1)
> + busiest->sum_nr_running * local->group_weight >
> + local->sum_nr_running * busiest->group_weight + 1)
Should that not be: busiest->group_weight * (local->sum_nr_running + 1) ?
I'm not opposed to this -- it seems fairly straight forward.
> goto force_balance;
>
> if (busiest->group_type != group_overloaded) {
> --
> 2.34.1
>
Powered by blists - more mailing lists