linux-kernel - Re: [PATCH v4] sched/fair: Skip sched_balance

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aRRuvLKvMdxw8bZV@linux.ibm.com>
Date: Wed, 12 Nov 2025 16:55:48 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...nel.org>,
        Chen Yu <yu.c.chen@...el.com>, Doug Nelson <doug.nelson@...el.com>,
        Mohini Narkhede <mohini.narkhede@...el.com>,
        linux-kernel@...r.kernel.org,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Shrikanth Hegde <sshegde@...ux.ibm.com>,
        K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when
 balance is not due

* Peter Zijlstra <peterz@...radead.org> [2025-11-12 11:45:55]:

> > 
> > Right, I had the same thought when grabbed the patch yesterday, but
> > ignored it.
> > 
> 
> Hmm, should we not redo while keeping the lock? Doesn't make much sense
> to drop and try to reacquire things here.
> 
> So perhaps this is the better option -- or did I overlook something with
> should_we_balance? It doesn't look like that will make a different
> decision on the retry.
> 
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11717,26 +11717,25 @@ static int sched_balance_rq(int this_cpu
>  		.fbq_type	= all,
>  		.tasks		= LIST_HEAD_INIT(env.tasks),
>  	};
> -	bool need_unlock;
> +	bool need_unlock = false;
>  
>  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>  
>  	schedstat_inc(sd->lb_count[idle]);
>  
> -redo:
> -	need_unlock = false;
>  	if (!should_we_balance(&env)) {
>  		*continue_balancing = 0;
>  		goto out_balanced;
>  	}
>  
>  	if (sd->flags & SD_SERIALIZE) {
> -		if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1)) {
> +		if (!atomic_try_cmpxchg_acquire(&sched_balance_running, 0, 1))
>  			goto out_balanced;
> -		}
> +
>  		need_unlock = true;
>  	}
>  
> +redo:
>  	group = sched_balance_find_src_group(&env);
>  	if (!group) {
>  		schedstat_inc(sd->lb_nobusyg[idle]);
> @@ -11861,9 +11860,6 @@ static int sched_balance_rq(int this_cpu
>  			if (!cpumask_subset(cpus, env.dst_grpmask)) {
>  				env.loop = 0;
>  				env.loop_break = SCHED_NR_MIGRATE_BREAK;
> -				if (need_unlock)
> -					atomic_set_release(&sched_balance_running, 0);
> -
>  				goto redo;
>  			}
>  			goto out_all_pinned;

If the CPU that was doing the balance was not the first CPU of the domain
span, but it was doing the balance since the first CPU was busy, and the
first CPU now happens to be idle at redo, the scheduler would have chosen the
first CPU to do the balance. However it will now choose the CPU that had the atomic..

I think this is better because 
- The first CPU may have tried just before this CPU dropped the atomic and
  hence we may miss the balance opportunity.
- The first CPU and the other CPU may not be sharing cache and hence there
  may be a cache-miss, which we are avoiding by doing this.

-- 
Thanks and Regards
Srikar Dronamraju