Message-ID: <fe64e1cc-f80a-49f6-a2b4-bc936bbd5916@linux.ibm.com>
Date: Tue, 11 Nov 2025 11:54:05 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...nel.org>, Chen Yu <yu.c.chen@...el.com>,
Doug Nelson <doug.nelson@...el.com>,
Mohini Narkhede <mohini.narkhede@...el.com>,
linux-kernel@...r.kernel.org,
Vincent Guittot <vincent.guittot@...aro.org>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when
balance is not due

Hi Tim,

On 11/11/25 12:17 AM, Tim Chen wrote:
> The NUMA sched domain sets the SD_SERIALIZE flag by default, allowing
> only one NUMA load balancing operation to run system-wide at a time.
>
> Currently, each sched group leader directly under NUMA domain attempts
> to acquire the global sched_balance_running flag via cmpxchg() before
> checking whether load balancing is due or whether it is the designated
> load balancer for that NUMA domain. On systems with a large number
> of cores, this causes significant cache contention on the shared
> sched_balance_running flag.
>
> This patch avoids unnecessary cmpxchg() operations by first checking,
> via should_we_balance(), that the CPU is the designated balancer for
> the NUMA domain, and that the balance interval has expired, before
> attempting to acquire sched_balance_running to load balance that
> NUMA domain.
>
> On a 2-socket Granite Rapids system with sub-NUMA clustering enabled,
> running an OLTP workload, 7.8% of total CPU cycles were previously spent
> in sched_balance_domain() contending on sched_balance_running before
> this change.
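
If I read the change correctly, the gist is the reordering sketched
below (a minimal userspace analogue using C11 atomics, not the actual
diff; is_designated_balancer(), balance_interval_expired() and
do_numa_balance() are hypothetical stand-ins for the real
should_we_balance() and interval checks):

#include <stdatomic.h>
#include <stdbool.h>

/* Shared, system-wide serialization flag, cf. sched_balance_running. */
static atomic_int sched_balance_running;

/* Hypothetical stand-ins; the kernel logic differs. */
static bool is_designated_balancer(int cpu) { return cpu == 0; }
static bool balance_interval_expired(int cpu) { (void)cpu; return true; }
static void do_numa_balance(int cpu) { (void)cpu; }

static void numa_balance(int cpu)
{
	int expected = 0;

	/* Cheap, mostly-local checks first: bail out without ever
	 * touching the shared cache line. */
	if (!is_designated_balancer(cpu) || !balance_interval_expired(cpu))
		return;

	/* Only the due, designated balancer contends on the flag. */
	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return;

	do_numa_balance(cpu);
	atomic_store(&sched_balance_running, 0);
}

Doing the cmpxchg last keeps the flag's cache line quiet on CPUs that
were never going to balance anyway, which matches the cycle reduction
you report.
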
Looks good to me. Thanks for getting this into its current shape.
I see hackbench improving slightly across its variations. So,
Tested-by: Shrikanth Hegde <sshegde@...ux.ibm.com>