[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRRuvLKvMdxw8bZV@linux.ibm.com>
Date: Wed, 12 Nov 2025 16:55:48 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...nel.org>,
Chen Yu <yu.c.chen@...el.com>, Doug Nelson <doug.nelson@...el.com>,
Mohini Narkhede <mohini.narkhede@...el.com>,
linux-kernel@...r.kernel.org,
Vincent Guittot <vincent.guittot@...aro.org>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when
balance is not due
* Peter Zijlstra <peterz@...radead.org> [2025-11-12 11:45:55]:
> >
> > Right, I had the same thought when grabbed the patch yesterday, but
> > ignored it.
> >
>
> Hmm, should we not redo while keeping the lock? Doesn't make much sense
> to drop and try to reacquire things here.
>
> So perhaps this is the better option -- or did I overlook something with
> should_we_balance? It doesn't look like that will make a different
> decision on the retry.
>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11717,26 +11717,25 @@ static int sched_balance_rq(int this_cpu
> .fbq_type = all,
> .tasks = LIST_HEAD_INIT(env.tasks),
> };
> - bool need_unlock;
> + bool need_unlock = false;
>
> cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>
> schedstat_inc(sd->lb_count[idle]);
>
> -redo:
> - need_unlock = false;
> if (!should_we_balance(&env)) {
> *continue_balancing = 0;
> goto out_balanced;
> }
>
> if (sd->flags & SD_SERIALIZE) {
> - if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1)) {
> + if (!atomic_try_cmpxchg_acquire(&sched_balance_running, 0, 1))
> goto out_balanced;
> - }
> +
> need_unlock = true;
> }
>
> +redo:
> group = sched_balance_find_src_group(&env);
> if (!group) {
> schedstat_inc(sd->lb_nobusyg[idle]);
> @@ -11861,9 +11860,6 @@ static int sched_balance_rq(int this_cpu
> if (!cpumask_subset(cpus, env.dst_grpmask)) {
> env.loop = 0;
> env.loop_break = SCHED_NR_MIGRATE_BREAK;
> - if (need_unlock)
> - atomic_set_release(&sched_balance_running, 0);
> -
> goto redo;
> }
> goto out_all_pinned;
If the CPU that was doing the balance was not the first CPU of the domain
span, but it was doing the balance since the first CPU was busy, and the
first CPU now happens to be idle at redo, the scheduler would have chosen the
first CPU to do the balance. However it will now choose the CPU that had the atomic..
I think this is better because
- The first CPU may have tried just before this CPU dropped the atomic and
hence we may miss the balance opportunity.
- The first CPU and the other CPU may not be sharing cache and hence there
may be a cache-miss, which we are avoiding by doing this.
--
Thanks and Regards
Srikar Dronamraju
Powered by blists - more mailing lists