Message-ID: <20260106034209.2703289-2-atomlin@atomlin.com>
Date: Mon,  5 Jan 2026 22:42:08 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	vschneid@...hat.com
Cc: neelx@...e.com,
	sean@...e.io,
	mproche@...il.com,
	linux-kernel@...r.kernel.org
Subject: [RFC PATCH 1/1] sched/fair: Introduce RT_SUPPRESS_FAIR_SERVER to optimise NOHZ_FULL isolation

In strictly partitioned, latency-critical environments, such as High
Frequency Trading (HFT) platforms, CPUs are frequently configured in
fully adaptive-tick mode to execute specific SCHED_FIFO workloads. The
paramount design objective in these scenarios is the elimination of all
sources of jitter, with particular emphasis on suppressing the clock-tick.

However, recent architectural amendments regarding bandwidth control
introduced the "Fair Server" (or DL Server) - a proxy SCHED_DEADLINE entity
designed to preserve bandwidth for SCHED_OTHER (CFS) tasks. Currently, when
a CFS task enqueues on a CPU, enqueue_task_fair() invokes
dl_server_start(). Crucially, this action increments rq->dl.dl_nr_running,
forcing sched_can_stop_tick() to return false and immediately restarting
the periodic tick. This behaviour prioritises fairness over the absolute
isolation necessitated by real-time workloads.
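
The interaction described above can be illustrated with a small userspace
model. This is only a sketch: struct rq_model, its field names and
can_stop_tick_model() are hypothetical stand-ins rather than kernel code,
and the ordering of the checks after the deadline test is an assumption
about how sched_can_stop_tick() is commonly structured.

```c
#include <stdbool.h>

/*
 * Simplified userspace model of the per-runqueue counters consulted when
 * deciding whether the tick may stop. The struct and its field names are
 * illustrative stand-ins, not the kernel's struct rq.
 */
struct rq_model {
	unsigned int dl_nr_running;   /* models rq->dl.dl_nr_running */
	unsigned int rr_nr_running;   /* SCHED_RR tasks */
	unsigned int rt_nr_running;   /* SCHED_FIFO plus SCHED_RR tasks */
	unsigned int cfs_h_nr_queued; /* queued CFS tasks */
};

static bool can_stop_tick_model(const struct rq_model *rq)
{
	/* Any deadline entity, including a started Fair Server, needs the tick. */
	if (rq->dl_nr_running)
		return false;

	/* With SCHED_RR present, the tick may stop only if there is one such task. */
	if (rq->rr_nr_running)
		return rq->rr_nr_running == 1;

	/* Only SCHED_FIFO tasks: no forced preemption, so the tick can stop. */
	if (rq->rt_nr_running - rq->rr_nr_running)
		return true;

	/* Only CFS tasks remain: more than one needs the tick for preemption. */
	return rq->cfs_h_nr_queued <= 1;
}
```

In this model, a runqueue with one SCHED_FIFO task and one queued CFS task
can stop the tick while dl_nr_running is zero, and cannot once the counter
is incremented.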

To address this, I propose the introduction of a new scheduling feature,
RT_SUPPRESS_FAIR_SERVER, guarded by CONFIG_NO_HZ_FULL.

When this feature is engaged - provided that RT bandwidth enforcement is
inactive and a real-time task is in execution - the scheduler forgoes the
invocation of dl_server_start() within enqueue_task_fair(). Consequently:

    1. The Fair Server remains inactive (rq->dl.dl_nr_running is not
       incremented)

    2. The tick accounting logic in sched_can_stop_tick() defers to the
       standard SCHED_FIFO checks

    3. The tick remains suppressed, preserving the "Run until Block"
       isolation guarantee for the active real-time task
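
The gating condition behind these three points can be sketched as a
standalone predicate. The struct and function names below are hypothetical
and exist only for illustration; the three inputs model the results of
sched_feat(RT_SUPPRESS_FAIR_SERVER), rt_bandwidth_enabled() and
rt_task(rq->curr) as they are used in the patch.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the three inputs evaluated at enqueue time. */
struct enqueue_ctx {
	bool feat_suppress;        /* models sched_feat(RT_SUPPRESS_FAIR_SERVER) */
	bool rt_bandwidth_enabled; /* models rt_bandwidth_enabled() */
	bool curr_is_rt;           /* models rt_task(rq->curr) */
};

/*
 * Returns true when the Fair Server should be started for the first CFS
 * task enqueued on the runqueue. The server is skipped only when the
 * feature is on, RT throttling is off, and an RT task is currently running.
 */
static bool should_start_fair_server(const struct enqueue_ctx *c)
{
	return !c->feat_suppress ||
	       c->rt_bandwidth_enabled ||
	       !c->curr_is_rt;
}
```

Of the eight input combinations, only one (feature enabled, RT bandwidth
enforcement disabled, RT task running) skips the server; every other
combination keeps the existing behaviour.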

It must be noted that enabling this feature explicitly compromises
general purpose system fairness in favour of determinism.

     - Starvation: Any queued CFS tasks shall endure total starvation until
       such time as the RT task voluntarily yields (blocks, sleeps, or
       terminates); they will not be preempted via the server mechanism

     - Accounting: Load tracking metrics (PELT) for queued CFS entities may
       effectively freeze or suffer inaccuracies, as the tick is ordinarily
       required to update these statistics during contention

To maintain standard safety and fairness guarantees on general purpose
systems, this feature is disabled by default.


Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
---
 kernel/sched/fair.c     | 19 ++++++++++++++++++-
 kernel/sched/features.h |  9 +++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..68a8011146c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6962,8 +6962,25 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 			h_nr_idle = 1;
 	}
 
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
+	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
+#ifdef CONFIG_NO_HZ_FULL
+		/*
+		 * Normally, we start the Fair Server to ensure CFS
+		 * bandwidth enforcement. However, if the
+		 * RT_SUPPRESS_FAIR_SERVER feature is enabled and RT
+		 * bandwidth throttling is disabled, we skip starting the
+		 * server when an RT task is running. This prevents the
+		 * server (a Deadline entity) from forcing the tick active,
+		 * thereby preserving NOHZ_FULL isolation.
+		 */
+		if (likely(!sched_feat(RT_SUPPRESS_FAIR_SERVER) ||
+					rt_bandwidth_enabled() ||
+					!rt_task(rq->curr)))
+			dl_server_start(&rq->fair_server);
+#else
 		dl_server_start(&rq->fair_server);
+#endif
+	}
 
 	/* At this point se is NULL and we are at root level*/
 	add_nr_running(rq, 1);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 980d92bab8ab..feb7cae9ce75 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -108,6 +108,15 @@ SCHED_FEAT(RT_PUSH_IPI, true)
 #endif
 
 SCHED_FEAT(RT_RUNTIME_SHARE, false)
+#ifdef CONFIG_NO_HZ_FULL
+/*
+ * Suppress Fair Server activation for SCHED_FIFO/RR tasks on
+ * NOHZ_FULL CPUs. This prevents the tick from being restarted for
+ * background CFS maintenance, prioritising deterministic RT
+ * execution over CFS fairness.
+ */
+SCHED_FEAT(RT_SUPPRESS_FAIR_SERVER, false)
+#endif
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
-- 
2.51.0

