Message-Id: <20260203-fix-nohz-idle-v1-1-ad05a5872080@os.amperecomputing.com>
Date: Tue, 03 Feb 2026 16:49:03 -0800
From: Shubhang Kaushik <shubhang@...amperecomputing.com>
To: Anna-Maria Behnsen <anna-maria@...utronix.de>, 
 Frederic Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...nel.org>, 
 Thomas Gleixner <tglx@...nel.org>, 
 Vincent Guittot <vincent.guittot@...aro.org>, 
 Valentin Schneider <vschneid@...hat.com>
Cc: dietmar.eggemann@....com, bsegall@...gle.com, mgorman@...e.de, 
 rostedt@...dmis.org, Shubhang Kaushik <sh@...two.org>, 
 Christoph Lameter <cl@...two.org>, linux-kernel@...r.kernel.org, 
 Shubhang Kaushik <shubhang@...amperecomputing.com>, 
 Adam Li <adamli@...amperecomputing.com>
Subject: [RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state

Under CONFIG_NO_HZ_FULL, the scheduler tick can be stopped via
tick_nohz_full_stop_tick() before the CPU enters the idle path. In this
case, tick_nohz_idle_stop_tick() observes TS_FLAG_STOPPED already set
and skips nohz_balance_enter_idle(), because the !was_stopped condition
assumes that tick-stop and idle-entry are coupled. This leaves a
tickless idle CPU absent from nohz.idle_cpus_mask, making it invisible
to NOHZ idle load balancing while periodic balancing is also
suppressed.

Fix this by decoupling tick-stop transition accounting from scheduler
bookkeeping. idle_jiffies is still updated only on the tick-stop
transition, while nohz_balance_enter_idle() is invoked whenever a CPU
enters idle with the tick already stopped, relying on its existing
idempotent guard to avoid duplicate registration.

Tested on Ampere Altra with 6.19.0-rc8 and CONFIG_NO_HZ_FULL enabled:
- This change improves load distribution by ensuring that tickless idle
  CPUs are visible to NOHZ idle load balancing. In llama-batched-bench,
  throughput improves by up to ~14% across multiple thread counts.
- Hackbench single-process results improve by 5% and multi-process
  results improve by up to ~26%, consistent with reduced scheduler
  jitter and earlier utilization of fully idle cores.

No regressions observed.

Signed-off-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
Signed-off-by: Adam Li <adamli@...amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@...two.org>
Reviewed-by: Shubhang Kaushik <shubhang@...amperecomputing.com>
---
This is a resend of the original patch to ensure visibility.
Previous resend: https://lkml.org/lkml/2025/8/21/170
Original thread: https://lkml.org/lkml/2025/8/21/171

The patch addresses a performance regression in NOHZ idle load balancing 
observed under CONFIG_NO_HZ_FULL, where idle CPUs were becoming 
invisible to the balancer.
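
For illustration only, the behavioural difference described in the commit
message can be sketched as a small userspace model. This is not kernel
code: struct model_cpu and every model_* function are hypothetical names
invented for this sketch, the boolean in_idle_mask merely stands in for
membership in nohz.idle_cpus_mask, and the model assumes the idle path
always succeeds in stopping the tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names here are hypothetical, not kernel API. */
struct model_cpu {
	bool tick_stopped;          /* stands in for TS_FLAG_STOPPED */
	bool in_idle_mask;          /* stands in for nohz.idle_cpus_mask membership */
	unsigned long idle_jiffies;
	unsigned long last_jiffies;
};

/* Stands in for nohz_balance_enter_idle(): a second registration is a no-op. */
static void model_balance_enter_idle(struct model_cpu *c)
{
	assert(c);
	if (c->in_idle_mask)
		return;
	c->in_idle_mask = true;
}

/* Idle entry with the old condition: registration is tied to the transition. */
static void model_idle_entry_old(struct model_cpu *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true; /* assumption: the idle path stops the tick */
	if (!was_stopped && c->tick_stopped) {
		c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}

/* Idle entry with the new condition: only idle_jiffies is tied to the transition. */
static void model_idle_entry_new(struct model_cpu *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true; /* assumption: the idle path stops the tick */
	if (c->tick_stopped) {
		if (!was_stopped)
			c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}
```

In this model, a CPU whose tick_stopped is already true on idle entry
(the NO_HZ_FULL case) is never registered by model_idle_entry_old() but is
registered by model_idle_entry_new(), and in both variants idle_jiffies is
left untouched for that CPU.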
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279409ffe950f770ff2eac868f6ece..eee6fcebe78c2f8d93464a55fe332e12fe9c164e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1250,8 +1250,9 @@ void tick_nohz_idle_stop_tick(void)
 		ts->idle_sleeps++;
 		ts->idle_expires = expires;
 
-		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-			ts->idle_jiffies = ts->last_jiffies;
+		if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+			if (!was_stopped)
+				ts->idle_jiffies = ts->last_jiffies;
 			nohz_balance_enter_idle(cpu);
 		}
 	} else {

---
base-commit: 18f7fcd5e69a04df57b563360b88be72471d6b62
change-id: 20260203-fix-nohz-idle-b2838276cb91

Best regards,
-- 
Shubhang Kaushik <shubhang@...amperecomputing.com>

