Message-Id: <20250416035823.1846307-1-tim.c.chen@linux.intel.com>
Date: Tue, 15 Apr 2025 20:58:23 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Chen Yu <yu.c.chen@...el.com>,
	Doug Nelson <doug.nelson@...el.com>,
	Mohini Narkhede <mohini.narkhede@...el.com>,
	linux-kernel@...r.kernel.org
Subject: [PATCH] sched: Skip useless sched_balance_running acquisition if load balance is not due

At load balance time, balancing of the last-level cache domains and
above needs to be serialized. The scheduler checks the atomic var
sched_balance_running first and then sees whether a load balance is
due. This is an expensive operation, as multiple CPUs can attempt
the sched_balance_running acquisition at the same time.

On a 2-socket Granite Rapids system with sub-NUMA clustering enabled
and running an OLTP workload, 7.6% of CPU cycles are spent on the
cmpxchg of sched_balance_running.  Most of the time, a balance attempt
is aborted immediately after acquiring sched_balance_running because
the load balance is not yet due.

Instead, check whether the balance is due before acquiring
sched_balance_running. This skips many useless acquisitions
of sched_balance_running and knocks the 7.6% CPU overhead in
sched_balance_domains() down to 0.05%.  Throughput of the OLTP
workload improved by 11%.
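
For illustration, below is a minimal user-space sketch of the
reordering using C11 atomics. The names (balance_running,
balance_due, try_balance) and the interval arithmetic are made up
for the example and are not the kernel's API; they stand in for
sched_balance_running, time_after_eq() and sched_balance_rq():

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int balance_running;	/* stands in for sched_balance_running */

/* Cheap, lock-free check standing in for time_after_eq(jiffies, ...). */
static bool balance_due(unsigned long now, unsigned long last,
			unsigned long interval)
{
	return now - last >= interval;
}

static void try_balance(unsigned long now, unsigned long *last,
			unsigned long interval)
{
	/*
	 * New order: test the cheap due-time condition first, so CPUs
	 * whose balance is not due never touch the shared cache line.
	 * (The old order did the cmpxchg here, before this check.)
	 */
	if (!balance_due(now, *last, interval))
		return;

	/* Only CPUs whose balance is actually due contend on the cmpxchg. */
	int expected = 0;
	if (!atomic_compare_exchange_strong_explicit(&balance_running,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed))
		return;			/* another CPU is balancing */

	*last = now;			/* stub for sched_balance_rq() */
	atomic_store_explicit(&balance_running, 0, memory_order_release);
}

int main(void)
{
	unsigned long last = 0;

	try_balance(100, &last, 50);	/* due: acquires, balances */
	try_balance(120, &last, 50);	/* not due: returns before cmpxchg */
	printf("last balance at %lu\n", last);
	return 0;
}

The due-time test is racy, but a stale read at worst costs one extra
cmpxchg or delays a balance by one interval, while the common
not-yet-due path no longer dirties the contended cache line at all.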

Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
Reported-by: Mohini Narkhede <mohini.narkhede@...el.com>
Tested-by: Mohini Narkhede <mohini.narkhede@...el.com>
---
 kernel/sched/fair.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e43993a4e580..5e5f7a770b2f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12220,13 +12220,13 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
 
 		interval = get_sd_balance_interval(sd, busy);
 
-		need_serialize = sd->flags & SD_SERIALIZE;
-		if (need_serialize) {
-			if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
-				goto out;
-		}
-
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
+			need_serialize = sd->flags & SD_SERIALIZE;
+			if (need_serialize) {
+				if (atomic_cmpxchg_acquire(&sched_balance_running, 0, 1))
+					goto out;
+			}
+
 			if (sched_balance_rq(cpu, rq, sd, idle, &continue_balancing)) {
 				/*
 				 * The LBF_DST_PINNED logic could have changed
@@ -12238,9 +12238,9 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
 			}
 			sd->last_balance = jiffies;
 			interval = get_sd_balance_interval(sd, busy);
+			if (need_serialize)
+				atomic_set_release(&sched_balance_running, 0);
 		}
-		if (need_serialize)
-			atomic_set_release(&sched_balance_running, 0);
 out:
 		if (time_after(next_balance, sd->last_balance + interval)) {
 			next_balance = sd->last_balance + interval;
-- 
2.32.0

