Message-ID: <16f4c4312978bc1093df4cdba2f352fee33f8927.camel@linux.intel.com>
Date: Fri, 03 Oct 2025 09:37:42 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, Peter Zijlstra
	 <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, Chen Yu <yu.c.chen@...el.com>, Doug
 Nelson	 <doug.nelson@...el.com>, Mohini Narkhede
 <mohini.narkhede@...el.com>, 	linux-kernel@...r.kernel.org, Vincent Guittot
 <vincent.guittot@...aro.org>, K Prateek Nayak <kprateek.nayak@....com>
Subject: Re: [RESEND PATCH] sched/fair: Skip sched_balance_running cmpxchg
 when balance is not due
On Fri, 2025-10-03 at 10:53 +0530, Shrikanth Hegde wrote:
> 
> On 10/3/25 4:30 AM, Tim Chen wrote:
> > Repost comments:
> > 
> > There have been past discussions about avoiding serialization in load
> > balancing, but no objections were raised to this patch itself during
> > its last posting:
> > https://lore.kernel.org/lkml/20250416035823.1846307-1-tim.c.chen@linux.intel.com/
> > 
> > Vincent and Chen Yu have already provided their Reviewed-by tags.
> > 
> > We recently encountered this issue again on a 2-socket, 240-core
> > Clearwater Forest server running SPECjbb. In this case, 14% of CPU
> > cycles were wasted on unnecessary acquisitions of
> > sched_balance_running. This reinforces the need for the change, and we
> > hope it can be merged.
> > 
> > Tim
> > 
> > ---
> > 
> > During load balancing, balancing at the LLC level and above must be
> > serialized. The scheduler currently checks the atomic
> > `sched_balance_running` flag before verifying whether a balance is
> > actually due. This causes high contention, as multiple CPUs may attempt
> > to acquire the flag concurrently.
> > 
> > On a 2-socket Granite Rapids system with sub-NUMA clustering enabled
> > and running OLTP workloads, 7.6% of CPU cycles were spent on cmpxchg
> > operations for `sched_balance_running`. In most cases, the attempt
> > aborts immediately after acquisition because the load balance time is
> > not yet due.
> > 
> > Fix this by checking whether a balance is due *before* trying to
> > acquire `sched_balance_running`. This avoids many wasted acquisitions
> > and reduces the cmpxchg overhead in `sched_balance_domains()` from 7.6%
> > to 0.05%. As a result, OLTP throughput improves by 11%.
> > 
> > Reviewed-by: Chen Yu <yu.c.chen@...el.com>
> > Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
> > Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> > ---
> 
> Hi Tim.
> 
> Fine by me. Unnecessary atomic operations do hurt on large systems.
> The further optimization that I pointed out can come in later, I guess.
> That would only help further; this should be good to begin with.
Thanks for your review and your past comments. We'll look into further
optimization if we find that this becomes a hot path again.
For now, this change seems to be good enough.
Tim
> 
> With that.
> Reviewed-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> 
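
For reference, the reordering described in the patch description above can
be sketched in user-space C. This is only an illustration under stated
assumptions, not the kernel change itself: `balance_running`,
`struct domain`, `try_acquire()`, and the two `balance_*()` functions are
hypothetical names, time is a plain counter rather than jiffies (so
wraparound is ignored), and the attempt counter is not atomic, so the
sketch is meant to be exercised from a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sched_balance_running flag. */
static atomic_int balance_running;

/* Counts cmpxchg attempts so the two orderings can be compared.
 * Not atomic: this sketch assumes a single-threaded caller. */
static unsigned long cmpxchg_attempts;

/* Hypothetical per-domain state: when the next balance becomes due. */
struct domain {
    unsigned long next_balance;
    unsigned long interval;
};

/* Try to take the serialization flag with one compare-and-exchange. */
static bool try_acquire(void)
{
    int expected = 0;

    cmpxchg_attempts++;
    return atomic_compare_exchange_strong_explicit(
        &balance_running, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void release(void)
{
    atomic_store_explicit(&balance_running, 0, memory_order_release);
}

/* Old ordering: take the flag first, then find out whether a balance
 * is due. Every call pays for the atomic operation. */
static bool balance_acquire_first(struct domain *d, unsigned long now)
{
    bool balanced = false;

    if (!try_acquire())
        return false;
    if (now >= d->next_balance) {
        d->next_balance = now + d->interval;
        balanced = true;
    }
    release();
    return balanced;
}

/* New ordering: check whether a balance is due before touching the
 * flag, so calls that are not due never issue the cmpxchg. */
static bool balance_check_first(struct domain *d, unsigned long now)
{
    if (now < d->next_balance)
        return false;
    if (!try_acquire())
        return false;
    d->next_balance = now + d->interval;
    release();
    return true;
}
```

Calling both functions once per tick over the same range of ticks should
yield the same number of balances, while the check-first variant issues a
cmpxchg only on the ticks where a balance is actually due.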