Message-ID: <20251028104255.1892485-1-srikar@linux.ibm.com>
Date: Tue, 28 Oct 2025 16:12:54 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: linux-kernel@...r.kernel.org
Cc: Michael Ellerman <mpe@...erman.id.au>,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        linuxppc-dev@...ts.ozlabs.org, Ben Segall <bsegall@...gle.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>, Nicholas Piggin <npiggin@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Srikar Dronamraju <srikar@...ux.ibm.com>
Subject: [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity

At present, the scheduler scales CPU capacity for fair tasks based on
time spent in irq and on steal time. If a CPU sees irq or steal time, its
capacity for fair tasks decreases, causing tasks to migrate to other CPUs
that are not affected by irq and steal time. All of this is gated by
NONTASK_CAPACITY.
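
As a rough illustration of the scaling described above, here is a minimal
user-space sketch. It is not kernel code: fair_capacity() is a hypothetical
helper, RT/deadline utilization is ignored, and the 0..1024 scale is assumed
to mirror SCHED_CAPACITY_SCALE. The exact kernel formula is not part of this
patch.

```c
#include <stdio.h>

/* Assumed to mirror the kernel's capacity scale (1024 == one full CPU). */
#define CAPACITY_SCALE 1024UL

/*
 * Hypothetical model: capacity left for fair tasks once the share of
 * time accounted as irq + steal (nontask_util, same 0..1024 scale) is
 * taken out. RT and deadline utilization are ignored for brevity.
 */
unsigned long fair_capacity(unsigned long max, unsigned long nontask_util)
{
	if (nontask_util >= max)
		return 1;	/* keep a minimal, non-zero capacity */

	return max * (max - nontask_util) / max;
}

/* Print the modelled capacity of two CPUs, one seeing irq/steal time. */
void show_example(void)
{
	/* CPU A spends about 25% of its time in irq + steal, CPU B none. */
	printf("CPU A: %lu\n", fair_capacity(CAPACITY_SCALE, 256));
	printf("CPU B: %lu\n", fair_capacity(CAPACITY_SCALE, 0));
}
```

In this model the CPU that sees irq or steal time advertises less capacity
than its idle neighbour, which is what makes it a less attractive placement
target for fair tasks.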

In virtualized setups, a CPU that reports steal time (time taken by the
hypervisor) can cause tasks to migrate unnecessarily to sibling CPUs that
appear to be less busy, only for the situation to reverse shortly.

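To mitigate this ping-pong behaviour, this change introduces a new
scheduler feature flag, ACCT_STEAL, which controls whether steal time
contributes to non-task capacity adjustments (used for fair scheduling).
The flag defaults to true, so current behaviour is unchanged, and a new
helper, steal_updates_cpu_capacity(), allows it to be toggled from
kernel code.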

Signed-off-by: Srikar Dronamraju <srikar@...ux.ibm.com>
---
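Note: the gating in update_rq_clock_task() after this patch can be modelled
in user space roughly as below. This is only a sketch for reasoning about
the four flag combinations; struct feat_model and nontask_delta() are
hypothetical names, standing in for sched_feat(NONTASK_CAPACITY),
sched_feat(ACCT_STEAL) and the value handed to update_irq_load_avg().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two scheduler feature bits. */
struct feat_model {
	bool nontask_capacity;	/* models sched_feat(NONTASK_CAPACITY) */
	bool acct_steal;	/* models sched_feat(ACCT_STEAL) */
};

/*
 * Returns the delta that would be passed to update_irq_load_avg(),
 * or -1 when the call would be skipped entirely.
 */
int64_t nontask_delta(const struct feat_model *f, int64_t irq_delta,
		      int64_t steal)
{
	if (!(irq_delta + steal) || !f->nontask_capacity)
		return -1;

	/* Steal time is folded in only when ACCT_STEAL is set. */
	if (steal && f->acct_steal)
		irq_delta += steal;

	return irq_delta;
}
```

With ACCT_STEAL cleared and only steal time pending, the model returns 0:
the update is still issued, but steal contributes nothing to it.
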
 include/linux/sched.h   | 1 +
 kernel/sched/core.c     | 7 +++++--
 kernel/sched/debug.c    | 8 ++++++++
 kernel/sched/features.h | 1 +
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a632..451931cce5bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2272,5 +2272,6 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
 #define alloc_tag_save(_tag)			NULL
 #define alloc_tag_restore(_tag, _old)		do {} while (0)
 #endif
+extern void steal_updates_cpu_capacity(bool enable);
 
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..3a7c4e307371 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -792,8 +792,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	rq->clock_task += delta;
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
-	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
-		update_irq_load_avg(rq, irq_delta + steal);
+	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
+		if (steal && sched_feat(ACCT_STEAL))
+			irq_delta += steal;
+		update_irq_load_avg(rq, irq_delta);
+	}
 #endif
 	update_rq_clock_pelt(rq, delta);
 }
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 557246880a7e..a0393dd43bb2 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1307,3 +1307,11 @@ void resched_latency_warn(int cpu, u64 latency)
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();
 }
+
+void steal_updates_cpu_capacity(bool enable)
+{
+	if (enable)
+		sched_feat_set("ACCT_STEAL");
+	else
+		sched_feat_set("NO_ACCT_STEAL");
+}
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..82d7806ea515 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -121,3 +121,4 @@ SCHED_FEAT(WA_BIAS, true)
 SCHED_FEAT(UTIL_EST, true)
 
 SCHED_FEAT(LATENCY_WARN, false)
+SCHED_FEAT(ACCT_STEAL, true)
-- 
2.47.3

