Message-ID: <4gvjwli7prddt5xkvqmbhrrcbpjph7b7dd6xcbbz7fudotgfib@5ytyxeo3b4ej>
Date: Mon, 27 Oct 2025 18:06:39 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Chen Yu <yu.c.chen@...el.com>,
Doug Nelson <doug.nelson@...el.com>, Mohini Narkhede <mohini.narkhede@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: Skip useless sched_balance_running acquisition if
load balance is not due
On Fri, Jun 06, 2025 at 03:51:34PM +0200, Vincent Guittot wrote:
> On Wed, 16 Apr 2025 at 05:51, Tim Chen <tim.c.chen@...ux.intel.com> wrote:
> >
> > At load balance time, balancing of the last-level-cache domains and
> > above needs to be serialized. The scheduler acquires the atomic var
> > sched_balance_running first and only then checks whether a load
> > balance is due. This is an expensive operation as multiple CPUs can
> > attempt sched_balance_running acquisition at the same time.
> >
> > On a 2-socket Granite Rapids system with sub-NUMA clustering enabled
> > and running OLTP workloads, 7.6% of CPU cycles are spent on cmpxchg of
> > sched_balance_running. Most of the time, a balance attempt is aborted
> > immediately after acquiring sched_balance_running because a load
> > balance is not due.
> >
> > Instead, check balance due time first before acquiring
> > sched_balance_running. This skips many useless acquisitions
> > of sched_balance_running and knocks the 7.6% CPU overhead on
> > sched_balance_domain() down to 0.05%. Throughput of the OLTP workload
> > improved by 11%.
> >
> > Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> > Reported-by: Mohini Narkhede <mohini.narkhede@...el.com>
> > Tested-by: Mohini Narkhede <mohini.narkhede@...el.com>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
>
Reviewed-by: Mel Gorman <mgorman@...hsingularity.net>
I've been away for a while, and even now I'm on a reduced workload, so
I'm only looking at this patch now. It was never merged, but why? It
looks like a no-brainer: it avoids an atomic operation with minimal
effort, even if it only applies to balancing across NUMA domains.
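For anyone skimming the thread, the reordering amounts to something like
the userspace sketch below. This is illustrative only: the names mirror
sched_balance_running and the kernel's time_after_eq() check, but the
functions, signatures, and timestamps are made up, not the actual
sched_balance_domains() code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's serialization flag. */
static atomic_int sched_balance_running;

/* Stand-in for the jiffies-based time_after_eq(now, next_balance) test. */
static bool balance_due(unsigned long now, unsigned long next_balance)
{
	return now >= next_balance;
}

/* Before the patch: every CPU issues the cmpxchg first, even when the
 * balance interval has not elapsed, so the cache line is contended. */
static bool try_balance_old(unsigned long now, unsigned long next_balance)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return false;	/* lost the race */
	if (!balance_due(now, next_balance)) {
		atomic_store(&sched_balance_running, 0);
		return false;	/* acquired the flag for nothing */
	}
	/* ... perform the load balance ... */
	atomic_store(&sched_balance_running, 0);
	return true;
}

/* After the patch: a cheap time check runs first, so the contended
 * cmpxchg only happens when a balance is actually due. */
static bool try_balance_new(unsigned long now, unsigned long next_balance)
{
	int expected = 0;

	if (!balance_due(now, next_balance))
		return false;	/* plain read, no atomic traffic */
	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return false;
	/* ... perform the load balance ... */
	atomic_store(&sched_balance_running, 0);
	return true;
}
```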
Performance looks better for a small number of workloads on multi-socket
machines, including some Zen variants. Most results were neutral, which
is not very surprising given the path affected. I made no effort to
determine how hot this particular path is for any of the tested
workloads, but nothing obviously superseded this patch or made it
irrelevant.
--
Mel Gorman
SUSE Labs