Message-ID: <4gvjwli7prddt5xkvqmbhrrcbpjph7b7dd6xcbbz7fudotgfib@5ytyxeo3b4ej>
Date: Mon, 27 Oct 2025 18:06:39 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Chen Yu <yu.c.chen@...el.com>,
Doug Nelson <doug.nelson@...el.com>, Mohini Narkhede <mohini.narkhede@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: Skip useless sched_balance_running acquisition if
load balance is not due
On Fri, Jun 06, 2025 at 03:51:34PM +0200, Vincent Guittot wrote:
> On Wed, 16 Apr 2025 at 05:51, Tim Chen <tim.c.chen@...ux.intel.com> wrote:
> >
> > At load balance time, balancing of the last-level-cache domains and
> > above needs to be serialized. The scheduler acquires the atomic var
> > sched_balance_running first and only then checks whether a load
> > balance is due. This is an expensive operation as multiple CPUs can
> > attempt sched_balance_running acquisition at the same time.
> >
> > On a 2-socket Granite Rapids system with sub-NUMA clustering enabled
> > and running OLTP workloads, 7.6% of CPU cycles are spent on cmpxchg of
> > sched_balance_running. Most of the time, a balance attempt is aborted
> > immediately after acquiring sched_balance_running because a load
> > balance is not due.
> >
> > Instead, check balance due time first before acquiring
> > sched_balance_running. This skips many useless acquisitions
> > of sched_balance_running and knocks the 7.6% CPU overhead on
> > sched_balance_domain() down to 0.05%. Throughput of the OLTP workload
> > improved by 11%.
> >
> > Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> > Reported-by: Mohini Narkhede <mohini.narkhede@...el.com>
> > Tested-by: Mohini Narkhede <mohini.narkhede@...el.com>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
>
Reviewed-by: Mel Gorman <mgorman@...hsingularity.net>
I've been away for a while, and even now I'm on a reduced workload, so
I'm only looking at this patch now. It was never merged, but why? It
looks like a no-brainer: it avoids an atomic operation with minimal
effort, even if it only applies to balancing across NUMA domains.
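For anyone skimming the thread, the reordering amounts to something like
the userspace sketch below. This is illustrative only: the names mirror
sched_balance_running and the kernel's time_after_eq() check, but the
functions, signatures, and timestamps are made up, not the actual
sched_balance_domains() code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's serialization flag. */
static atomic_int sched_balance_running;

/* Stand-in for the jiffies-based time_after_eq(now, next_balance) test. */
static bool balance_due(unsigned long now, unsigned long next_balance)
{
	return now >= next_balance;
}

/* Before the patch: every CPU issues the cmpxchg first, even when the
 * balance interval has not elapsed, so the cache line is contended. */
static bool try_balance_old(unsigned long now, unsigned long next_balance)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return false;	/* lost the race */
	if (!balance_due(now, next_balance)) {
		atomic_store(&sched_balance_running, 0);
		return false;	/* acquired the flag for nothing */
	}
	/* ... perform the load balance ... */
	atomic_store(&sched_balance_running, 0);
	return true;
}

/* After the patch: a cheap time check runs first, so the contended
 * cmpxchg only happens when a balance is actually due. */
static bool try_balance_new(unsigned long now, unsigned long next_balance)
{
	int expected = 0;

	if (!balance_due(now, next_balance))
		return false;	/* plain read, no atomic traffic */
	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return false;
	/* ... perform the load balance ... */
	atomic_store(&sched_balance_running, 0);
	return true;
}
```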
Performance looks better for a small number of workloads on multi-socket
machines, including some Zen variants. Most results were neutral, which
is not very surprising given the path affected. I made no effort to
determine how hot this particular path is for any of the tested
workloads, but nothing obviously superseded this patch or made it
irrelevant.
--
Mel Gorman
SUSE Labs