Message-Id: <20260121-v8-patch-series-v8-1-b7f1cbee5055@os.amperecomputing.com>
Date: Wed, 21 Jan 2026 01:31:53 -0800
From: Shubhang Kaushik <shubhang@...amperecomputing.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
 Juri Lelli <juri.lelli@...hat.com>, 
 Vincent Guittot <vincent.guittot@...aro.org>, 
 Dietmar Eggemann <dietmar.eggemann@....com>, 
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
 Mel Gorman <mgorman@...e.de>, Shubhang Kaushik <sh@...two.org>, 
 Valentin Schneider <vschneid@...hat.com>, 
 K Prateek Nayak <kprateek.nayak@....com>
Cc: Huang Shijie <shijie8@...il.com>, linux-kernel@...r.kernel.org, 
 Shubhang Kaushik <shubhang@...amperecomputing.com>
Subject: [PATCH v8] sched: update rq->avg_idle when a task is moved to an
 idle CPU

Currently, rq->idle_stamp is only used to calculate avg_idle during
wakeups. Other paths that move a task to an idle CPU, such as
fork/clone, execve, or load-balancing migrations, do not end the CPU's
idle period from the scheduler's perspective, leaving avg_idle
inaccurate.

This patch introduces update_rq_avg_idle() to provide a more accurate
measurement of CPU idle duration. By invoking this helper in
put_prev_task_idle(), we ensure avg_idle is updated whenever a CPU
stops being idle, regardless of how the new task arrived.

Changes in v8:
- Removed the 'if (rq->idle_stamp)' check: per reviewer feedback,
  tracking any idle duration (not only fair-class wakeups) provides a
  more universal view of core availability.

Testing on an 80-core Ampere Altra (ARMv8) against a 6.19-rc5 baseline:
- Hackbench: +7.2% performance gain at 16 threads.
- Schbench: reduced p99.9 tail latencies at high concurrency.

Tested-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
Signed-off-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
---
This series improves the accuracy of rq->avg_idle by ensuring the CPU's idle
duration is updated whenever a task moves to an idle CPU.

rq->idle_stamp is currently cleared only during wakeups. This leaves
other paths that move a task to an idle CPU, such as fork, exec, or
load-balancing migrations, unable to end the CPU's idle status in the
scheduler's view. This architectural gap produces stale avg_idle
values, misleading the newidle balancer into incorrectly skipping task
migrations and degrading overall throughput on high-core-count systems.

v7 --> v8:
  - Remove the 'if (rq->idle_stamp)' condition check in
    update_rq_avg_idle().
  - v7: https://lkml.org/lkml/2025/12/26/90

v6 --> v7:
  - Call update_rq_avg_idle() in put_prev_task_idle().
  - Remove patch 1 from the original patch set.
  - v6: https://lkml.org/lkml/2025/12/9/377

v5 --> v6:
  - Remove "this_rq->idle_stamp = 0;" in patch 1.
  - Update the test results with SPECjbb.
  - v5: https://lkml.org/lkml/2025/12/3/179

v4 --> v5:
  - Modify the changelog.
  - v4: https://lkml.org/lkml/2025/11/28/300

v3 --> v4:
  - Remove the code for delayed tasks.
  - v3: https://lkml.org/lkml/2025/11/27/456

v2 --> v3:
  - Merge patch 3 into patch 2: move update_rq_avg_idle() to
    enqueue_task().
  - v2: https://lkml.org/lkml/2025/11/27/214

v1 --> v2:
  - Put update_rq_avg_idle() in activate_task().
  - Add a delayed-dequeue task check.
  - v1: https://lkml.org/lkml/2025/11/24/97

---
 kernel/sched/core.c  | 24 ++++++++++++------------
 kernel/sched/idle.c  |  1 +
 kernel/sched/sched.h |  1 +
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e25283d290fd064ad47cd2399dc79..81a841e22c961ff04ad291eeeed81147fd464324 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3607,6 +3607,18 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
 	trace_sched_wakeup(p);
 }
 
+void update_rq_avg_idle(struct rq *rq)
+{
+	u64 delta = rq_clock(rq) - rq->idle_stamp;
+	u64 max = 2*rq->max_idle_balance_cost;
+
+	update_avg(&rq->avg_idle, delta);
+
+	if (rq->avg_idle > max)
+		rq->avg_idle = max;
+	rq->idle_stamp = 0;
+}
+
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		 struct rq_flags *rf)
@@ -3642,18 +3654,6 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		p->sched_class->task_woken(rq, p);
 		rq_repin_lock(rq, rf);
 	}
-
-	if (rq->idle_stamp) {
-		u64 delta = rq_clock(rq) - rq->idle_stamp;
-		u64 max = 2*rq->max_idle_balance_cost;
-
-		update_avg(&rq->avg_idle, delta);
-
-		if (rq->avg_idle > max)
-			rq->avg_idle = max;
-
-		rq->idle_stamp = 0;
-	}
 }
 
 /*
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c174afe1dd177a22535417be0de1fc1b690c0368..36ddc5bcfa0383bd4d07d3c8b732ee5b8567d194 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -460,6 +460,7 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
 {
 	update_curr_idle(rq);
 	scx_update_idle(rq, false, true);
+	update_rq_avg_idle(rq);
 }
 
 static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 93fce4bbff5eac1d4719394e89dfae886b74d865..7edf8600f2c3f45afa32bc73db2155ea6e0067f0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1676,6 +1676,7 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
 
 #endif /* !CONFIG_FAIR_GROUP_SCHED */
 
+extern void update_rq_avg_idle(struct rq *rq);
 extern void update_rq_clock(struct rq *rq);
 
 /*

---
base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7
change-id: 20260116-v8-patch-series-5ff91b821cd4

Best regards,
-- 
Shubhang Kaushik <shubhang@...amperecomputing.com>

