Message-ID: <fe64e1cc-f80a-49f6-a2b4-bc936bbd5916@linux.ibm.com>
Date: Tue, 11 Nov 2025 11:54:05 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...nel.org>, Chen Yu <yu.c.chen@...el.com>,
Doug Nelson <doug.nelson@...el.com>,
Mohini Narkhede <mohini.narkhede@...el.com>,
linux-kernel@...r.kernel.org,
Vincent Guittot <vincent.guittot@...aro.org>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v4] sched/fair: Skip sched_balance_running cmpxchg when
balance is not due

Hi Tim,

On 11/11/25 12:17 AM, Tim Chen wrote:
> The NUMA sched domain sets the SD_SERIALIZE flag by default, allowing
> only one NUMA load balancing operation to run system-wide at a time.
>
> Currently, each sched group leader directly under NUMA domain attempts
> to acquire the global sched_balance_running flag via cmpxchg() before
> checking whether load balancing is due or whether it is the designated
> load balancer for that NUMA domain. On systems with a large number
> of cores, this causes significant cache contention on the shared
> sched_balance_running flag.
>
> This patch avoids unnecessary cmpxchg() operations by first checking,
> via should_we_balance(), that the CPU is the designated balancer for
> the NUMA domain, and that the balance interval has expired, before
> attempting to acquire sched_balance_running to load balance that
> NUMA domain.
>
> On a 2-socket Granite Rapids system with sub-NUMA clustering enabled,
> running an OLTP workload, 7.8% of total CPU cycles were previously spent
> in sched_balance_domain() contending on sched_balance_running before
> this change.
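
If I read the change correctly, the gist is the reordering sketched
below (a minimal userspace analogue using C11 atomics, not the actual
diff; is_designated_balancer(), balance_interval_expired() and
do_numa_balance() are hypothetical stand-ins for the real
should_we_balance() and interval checks):

#include <stdatomic.h>
#include <stdbool.h>

/* Shared, system-wide serialization flag, cf. sched_balance_running. */
static atomic_int sched_balance_running;

/* Hypothetical stand-ins; the kernel logic differs. */
static bool is_designated_balancer(int cpu) { return cpu == 0; }
static bool balance_interval_expired(int cpu) { (void)cpu; return true; }
static void do_numa_balance(int cpu) { (void)cpu; }

static void numa_balance(int cpu)
{
	int expected = 0;

	/* Cheap, mostly-local checks first: bail out without ever
	 * touching the shared cache line. */
	if (!is_designated_balancer(cpu) || !balance_interval_expired(cpu))
		return;

	/* Only the due, designated balancer contends on the flag. */
	if (!atomic_compare_exchange_strong(&sched_balance_running,
					    &expected, 1))
		return;

	do_numa_balance(cpu);
	atomic_store(&sched_balance_running, 0);
}

Doing the cmpxchg last keeps the flag's cache line quiet on CPUs that
were never going to balance anyway, which matches the cycle reduction
you report.
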
Looks good to me. Thanks for getting this into its current shape.
I see hackbench improving slightly across its variations. So,
Tested-by: Shrikanth Hegde <sshegde@...ux.ibm.com>