Message-ID: <9303b525-448a-4ed2-8ad0-043a3a6f97ac@amd.com>
Date: Fri, 6 Feb 2026 15:11:57 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Pierre Gondois <pierre.gondois@....com>, <linux-kernel@...r.kernel.org>
CC: Christian Loehle <christian.loehle@....com>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Rik van Riel
<riel@...riel.com>
Subject: Re: [PATCH 2/2] sched/fair: Balance #Tasks/#CPUs if busiest group has
no idle CPU
Hello Pierre,
On 2/5/2026 8:38 PM, Pierre Gondois wrote:
> Halving the imbalance currently leads to the following scenario.
> On a Juno with 2 clusters: CLU0: 4 CPUs and CLU1: 2 CPUs, with
> 6 long running tasks:
> - 1 task on the 2-CPUs cluster
> - 5 Tasks run in the 4-CPUs cluster
> Running the load balancer from the idle CPU (in CLU1):
> - Local group: CLU1: idle_cpus=1; nr_running=1; type=group_has_spare
> - Busiest group: CLU0 idle_cpus=0; nr_running=5 type=group_overloaded
> Half of (local->idle_cpus - busiest->idle_cpus) is 0.
> No task is migrated and the task placement persists.
...
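For reference, the arithmetic in the quoted scenario can be reproduced with a
minimal user-space sketch. This is not kernel code: sub_positive() and
idle_cpus_imbalance() are hypothetical helpers that only mimic the
"difference of idle CPUs, clamped at zero, then halved" computation described
above.

```c
/* User-space model only; helper names are made up for illustration. */

/* Subtract b from a without going below zero, in the spirit of lsub_positive(). */
static long sub_positive(long a, long b)
{
	return a > b ? a - b : 0;
}

/* Imbalance when only the idle-CPU counts are evened out, then halved. */
static long idle_cpus_imbalance(long local_idle, long busiest_idle)
{
	return sub_positive(local_idle, busiest_idle) >> 1;
}
```

With the Juno numbers (local idle_cpus=1, busiest idle_cpus=0) the result is 0,
so nothing is pulled even though CLU0 runs 5 tasks on 4 CPUs.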
> ---
> kernel/sched/fair.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa14a9982b9f1..9dac3536d9c19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11235,20 +11235,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> return;
> }
>
> - if (busiest->group_weight == 1 || sds->prefer_sibling) {
> + env->migration_type = migrate_task;
> + if (busiest->group_weight == 1 || sds->prefer_sibling || !busiest->idle_cpus) {
I suppose you also have SD_ASYM_CPUCAPACITY set on your sd which is why
"sds->prefer_sibling" is false here.
Instead of checking for "busiest->idle_cpus", would it make sense to
enter this case for sibling_imbalance() when we have
capacity_greater(capacity_of(env->dst_cpu), sds->busiest->sgc->min_capacity)?
It could very well be the case that the smaller cluster is actually idle
because task_fits_cpu() returned false for the CPUs there.
I couldn't actually spot any case where we compare the capacities
of local and busiest group for <= fully_loaded but let me know if
I've missed something.
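To make the suggested condition concrete, here is a rough user-space sketch.
cap_greater() follows my recollection of the ~5% margin used by
capacity_greater() in fair.c, so please check it against the tree;
use_sibling_path() is a hypothetical helper, not something that exists in the
kernel, and the capacity values mentioned below are only illustrative.

```c
#include <stdbool.h>

/*
 * True if cap1 is noticeably (about 5%) greater than cap2.
 * Assumption: this mirrors the margin of capacity_greater().
 */
static bool cap_greater(unsigned long cap1, unsigned long cap2)
{
	return cap1 * 1024 > cap2 * 1078;
}

/*
 * Hypothetical condition for taking the sibling_imbalance() branch:
 * the existing checks, plus "destination CPU has more capacity than
 * the weakest CPU of the busiest group".
 */
static bool use_sibling_path(unsigned int busiest_weight, bool prefer_sibling,
			     unsigned long dst_cap, unsigned long busiest_min_cap)
{
	return busiest_weight == 1 || prefer_sibling ||
	       cap_greater(dst_cap, busiest_min_cap);
}
```

For example, with a destination CPU of capacity 1024 and a busiest group whose
min_capacity is 446, the branch is taken; in the opposite direction it is not,
which is the case where pulling tasks onto the smaller CPUs looks questionable.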
> /*
> - * When prefer sibling, evenly spread running tasks on
> - * groups.
> + * When prefer sibling, or when busiest has no idle CPU,
> + * evenly spread running tasks on groups.
> */
> - env->migration_type = migrate_task;
> env->imbalance = sibling_imbalance(env, sds, busiest, local);
I'm slightly skeptical of spreading the tasks evenly without considering
the capacity difference when we are on SD_ASYM_CPUCAPACITY. I suppose
we'll filter out the target in sched_balance_find_src_rq() and bail out
if we only see lower capacity CPUs on the busiest group.
> } else {
> -
> /*
> * If there is no overload, we just want to even the number of
> * idle CPUs.
> */
> - env->migration_type = migrate_task;
> env->imbalance = local->idle_cpus;
> lsub_positive(&env->imbalance, busiest->idle_cpus);
> }
--
Thanks and Regards,
Prateek