linux-kernel - Re: [PATCH 2/2] sched/fair: Balance #Tasks/#CPUs if busiest group has no idle CPU

Message-ID: <9303b525-448a-4ed2-8ad0-043a3a6f97ac@amd.com>
Date: Fri, 6 Feb 2026 15:11:57 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Pierre Gondois <pierre.gondois@....com>, <linux-kernel@...r.kernel.org>
CC: Christian Loehle <christian.loehle@....com>, Ingo Molnar
	<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
	<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
	<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Rik van Riel
	<riel@...riel.com>
Subject: Re: [PATCH 2/2] sched/fair: Balance #Tasks/#CPUs if busiest group has
 no idle CPU

Hello Pierre,

On 2/5/2026 8:38 PM, Pierre Gondois wrote:
> Halving the imbalance currently lead to the following scenario.
> On a Juno with 2 clusters: CLU0: 4 CPUs and CLU1: 2 CPUs, with
> 6 long running tasks:
> - 1 task on the 2-CPUs cluster
> - 5 Tasks run in the 4-CPUs cluster
> Running the load balancer from the idle CPU (in CLU1):
> - Local group: CLU1: idle_cpus=1; nr_running=1; type=group_has_spare
> - Busiest group: CLU0 idle_cpus=0; nr_running=5 type=group_overloaded
> Half of (local->idle_cpus - busiest->idle_cpus) is 0.
> No task is migrated and the task placement persists.
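
To make the arithmetic in the scenario above concrete, here is a minimal
userspace sketch (not kernel code) of the idle-CPU path followed by the
halving step. The function name is hypothetical, and the clamp-at-zero only
approximates what lsub_positive() does in the quoted diff.

```c
/*
 * Userspace model only: idle_cpu_imbalance() is a hypothetical name,
 * not a kernel function.
 */
static long idle_cpu_imbalance(long local_idle_cpus, long busiest_idle_cpus)
{
	long imbalance = local_idle_cpus - busiest_idle_cpus;

	/* Approximates lsub_positive(): never go below zero. */
	if (imbalance < 0)
		imbalance = 0;

	/* Halving step: number of tasks to move to even out idle CPUs. */
	return imbalance >> 1;
}
```

With the values from the scenario (local idle_cpus=1, busiest idle_cpus=0),
idle_cpu_imbalance(1, 0) evaluates to 0, so nothing is pulled.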

...

> ---
>  kernel/sched/fair.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aa14a9982b9f1..9dac3536d9c19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11235,20 +11235,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  			return;
>  		}
>  
> -		if (busiest->group_weight == 1 || sds->prefer_sibling) {
> +		env->migration_type = migrate_task;
> +		if (busiest->group_weight == 1 || sds->prefer_sibling || !busiest->idle_cpus) {

I suppose you also have SD_ASYM_CPUCAPACITY set on your sd which is why
"sds->prefer_sibling" is false here.

Instead of checking for "busiest->idle_cpus", would it make sense to
enter this case for sibling_imbalance() when we have:

    capacity_greater(capacity_of(env->dst_cpu), sds->busiest->sgc->min_capacity)

since the smaller cluster could very well be idle because
task_fits_cpu() returned false for the CPUs there?

I couldn't actually spot any case where we compare the capacities
of local and busiest group for <= fully_loaded but let me know if
I've missed something.
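
As a rough illustration of the check I have in mind, here is a userspace
sketch. The macro body mirrors what I recall of capacity_greater() in
fair.c (roughly a 5% margin), so treat the exact constants as an
assumption. The helper name and the capacity values in the note below are
made up for the example.

```c
/*
 * Userspace model only. The constants are an assumption based on my
 * recollection of capacity_greater() in kernel/sched/fair.c.
 */
#define model_capacity_greater(cap1, cap2) \
	((cap1) * 1024 > (cap2) * 1078)

/*
 * Hypothetical helper: returns 1 when the destination CPU is
 * meaningfully bigger than the smallest CPU of the busiest group.
 */
static int dst_has_greater_capacity(unsigned long dst_cpu_capacity,
				    unsigned long busiest_min_capacity)
{
	return model_capacity_greater(dst_cpu_capacity, busiest_min_capacity);
}
```

For example, with illustrative capacities of 1024 for the destination CPU
and 446 for the busiest group's minimum, the helper returns 1. With the
roles swapped, or with capacities within the margin (1024 vs. 1000), it
returns 0.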

>  			/*
> -			 * When prefer sibling, evenly spread running tasks on
> -			 * groups.
> +			 * When prefer sibling, or when busiest has no idle CPU,
> +			 * evenly spread running tasks on groups.
>  			 */
> -			env->migration_type = migrate_task;
>  			env->imbalance = sibling_imbalance(env, sds, busiest, local);

I'm slightly skeptical of spreading the tasks evenly without considering
the capacity difference when we are on SD_ASYM_CPUCAPACITY. I suppose
we'll filter out the target in sched_balance_find_src_rq() and bail out
if we only see lower-capacity CPUs in the busiest group.
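
To show why capacity never enters the picture, here is a simplified
userspace model of balancing the nr_running/ncores ratio, loosely following
my reading of sibling_imbalance(). The function name and parameters are
hypothetical, core counts are assumed to equal CPU counts (no SMT), and the
real kernel function has additional early exits that are left out here.

```c
/*
 * Userspace model only: balances nr_running per core between two
 * groups. Note that CPU capacity is not an input at all.
 */
static long model_sibling_imbalance(long ncores_local, long nr_local,
				    long ncores_busiest, long nr_busiest)
{
	long imbalance;

	if (!nr_busiest)
		return 0;

	if (ncores_busiest == ncores_local) {
		imbalance = nr_busiest - nr_local;
		return imbalance > 0 ? imbalance : 0;
	}

	/* Cross-multiply so both groups end up with the same ratio. */
	imbalance = ncores_local * nr_busiest - ncores_busiest * nr_local;
	if (imbalance < 0)
		imbalance = 0;

	/* Normalize by the total core count, with rounding. */
	imbalance = 2 * imbalance + ncores_local + ncores_busiest;
	imbalance /= ncores_local + ncores_busiest;

	/* Make use of a completely empty local group. */
	if (imbalance <= 1 && nr_local == 0 && nr_busiest > 1)
		imbalance = 2;

	return imbalance;
}
```

For the Juno scenario (local: 2 CPUs, 1 task; busiest: 4 CPUs, 5 tasks) the
model returns 3. Assuming the caller still halves the result, that becomes 1
task to migrate, regardless of which cluster has the bigger CPUs.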

>  		} else {
> -
>  			/*
>  			 * If there is no overload, we just want to even the number of
>  			 * idle CPUs.
>  			 */
> -			env->migration_type = migrate_task;
>  			env->imbalance = local->idle_cpus;
>  			lsub_positive(&env->imbalance, busiest->idle_cpus);
>  		}

-- 
Thanks and Regards,
Prateek