Message-ID: <20260106034209.2703289-2-atomlin@atomlin.com>
Date: Mon, 5 Jan 2026 22:42:08 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: mingo@...hat.com,
peterz@...radead.org,
juri.lelli@...hat.com,
vincent.guittot@...aro.org,
dietmar.eggemann@....com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com
Cc: neelx@...e.com,
sean@...e.io,
mproche@...il.com,
linux-kernel@...r.kernel.org
Subject: [RFC PATCH 1/1] sched/fair: Introduce RT_SUPPRESS_FAIR_SERVER to optimise NOHZ_FULL isolation

In strictly partitioned, latency-critical environments, such as High
Frequency Trading (HFT) platforms, CPUs are frequently configured in
full adaptive-tick (nohz_full) mode to execute specific SCHED_FIFO
workloads. The paramount design objective in these scenarios is the
elimination of all sources of jitter, with particular emphasis on
suppressing the scheduler tick.

However, recent changes to bandwidth control introduced the "Fair
Server" (or DL Server), a proxy SCHED_DEADLINE entity designed to
reserve bandwidth for SCHED_OTHER (CFS) tasks. Currently, when a CFS
task is enqueued on a CPU whose CFS run-queue was previously empty,
enqueue_task_fair() invokes dl_server_start(). Crucially, this action
increments rq->dl.dl_nr_running, forcing sched_can_stop_tick() to return
false and immediately restarting the periodic tick. This behaviour
prioritises fairness over the absolute isolation required by real-time
workloads.
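
The dependency described above can be illustrated with a small
userspace model. This is a sketch, not kernel code: the struct
rq_model, the enum tick_verdict and both functions are hypothetical.
They mirror only the deadline and real-time checks at the top of
sched_can_stop_tick(), assuming the check order of recent kernels, and
leave the CFS, sched_ext and CFS-bandwidth checks unmodelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the run-queue counters sched_can_stop_tick() reads. */
struct rq_model {
	unsigned int dl_nr_running; /* DL entities, including an active Fair Server */
	unsigned int rt_nr_running; /* SCHED_FIFO plus SCHED_RR tasks */
	unsigned int rr_nr_running; /* SCHED_RR tasks only */
};

enum tick_verdict {
	TICK_NEEDED,     /* the periodic tick must keep running */
	TICK_STOPPABLE,  /* the CPU may stay tickless */
	TICK_CFS_CHECKS, /* no DL/RT work: later CFS checks decide (not modelled) */
};

/* Sketch of the DL and RT portion of the tick-stop decision. */
static enum tick_verdict model_can_stop_tick(const struct rq_model *rq)
{
	/* Every RR task is also counted in rt_nr_running. */
	assert(rq->rr_nr_running <= rq->rt_nr_running);

	/* Any queued DL entity, even the Fair Server alone, keeps the tick. */
	if (rq->dl_nr_running)
		return TICK_NEEDED;

	/* A lone RR task can run tickless; several need round-robin slicing. */
	if (rq->rr_nr_running)
		return rq->rr_nr_running == 1 ? TICK_STOPPABLE : TICK_NEEDED;

	/* FIFO tasks are never preempted by the tick, so it can stop. */
	if (rq->rt_nr_running)
		return TICK_STOPPABLE;

	return TICK_CFS_CHECKS;
}

/* Models the baseline effect of dl_server_start() on the counters. */
static void model_dl_server_start(struct rq_model *rq)
{
	rq->dl_nr_running++;
}
```

For a CPU running a single SCHED_FIFO task, the model reports
TICK_STOPPABLE until model_dl_server_start() runs, after which it
reports TICK_NEEDED: the transition the paragraph above describes.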

To address this, I propose a new scheduling feature,
RT_SUPPRESS_FAIR_SERVER, guarded by CONFIG_NO_HZ_FULL.

When this feature is enabled, RT bandwidth enforcement is inactive
(rt_bandwidth_enabled() returns false), and rq->curr is a real-time
task, the scheduler forgoes the invocation of dl_server_start() within
enqueue_task_fair(). Consequently:

1. The Fair Server remains inactive (rq->dl.dl_nr_running is not
   incremented).
2. The tick-stop decision in sched_can_stop_tick() falls through to
   the standard SCHED_FIFO checks.
3. The tick remains suppressed, preserving the "Run until Block"
   isolation guarantee for the active real-time task.
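
The condition under which enqueue_task_fair() still starts the Fair
Server can be modelled as a pure predicate. This is a hedged userspace
sketch: the booleans stand in for sched_feat(RT_SUPPRESS_FAIR_SERVER),
rt_bandwidth_enabled() and rt_task(rq->curr), and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mirror of the patched condition in enqueue_task_fair():
 * the server is started unless the feature is on, RT throttling is off
 * and the current task is real-time.
 */
static bool model_should_start_fair_server(bool feat_suppress,
					   bool rt_bw_enabled,
					   bool curr_is_rt)
{
	return !feat_suppress || rt_bw_enabled || !curr_is_rt;
}

/*
 * DL entity count after the first CFS enqueue on an empty CFS run-queue,
 * assuming no other DL entities are queued on the CPU.
 */
static unsigned int model_dl_nr_running_after_enqueue(bool feat_suppress,
						      bool rt_bw_enabled,
						      bool curr_is_rt)
{
	unsigned int dl_nr_running = 0;

	if (model_should_start_fair_server(feat_suppress, rt_bw_enabled,
					   curr_is_rt))
		dl_nr_running++; /* dl_server_start() enqueues the server */

	/* Only the Fair Server is modelled, so the count is 0 or 1. */
	assert(dl_nr_running <= 1);
	return dl_nr_running;
}
```

Of the eight input combinations, only "feature on, RT throttling off,
RT task current" yields a count of zero, which is what leaves
sched_can_stop_tick() on its SCHED_FIFO path.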

It must be noted that enabling this feature explicitly compromises
general-purpose system fairness in favour of determinism:

- Starvation: queued CFS tasks will be starved entirely until the RT
  task voluntarily relinquishes the CPU (blocks, sleeps, or
  terminates); the Fair Server will not preempt the RT task on their
  behalf.
- Accounting: load-tracking metrics (PELT) for queued CFS entities may
  stall or become inaccurate, as the tick is ordinarily required to
  update these statistics during contention.

To maintain standard safety and fairness guarantees on general-purpose
systems, this feature is disabled by default.
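
Because the feature is off by default, it would have to be enabled at
runtime through the scheduler's debugfs features file. The helper below
is a hedged sketch: it assumes the file accepts a bare feature name to
enable it and a NO_-prefixed name to disable it, and that it lives at
/sys/kernel/debug/sched/features on recent kernels (root and a mounted
debugfs required). RT_SUPPRESS_FAIR_SERVER would only be listed there
on a kernel built with CONFIG_NO_HZ_FULL and this patch applied. The
function name set_sched_feature() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "NAME" (enable) or "NO_NAME" (disable) to a
 * scheduler features control file such as /sys/kernel/debug/sched/features.
 * The path is a parameter so the helper can be exercised on a plain file.
 * Returns 0 on success and -1 on failure.
 */
static int set_sched_feature(const char *path, const char *name, bool enable)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL && name != NULL);

	/* Expect the bare feature name; the NO_ prefix is added here. */
	if (name[0] == '\0' || strncmp(name, "NO_", 3) == 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fprintf(f, "%s%s", enable ? "" : "NO_", name) < 0)
		ret = -1;

	/* stdio buffers the write, so a rejected name typically surfaces here. */
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

For example, set_sched_feature("/sys/kernel/debug/sched/features",
"RT_SUPPRESS_FAIR_SERVER", true) would enable the feature, and passing
false would restore the default.
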
Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
---
 kernel/sched/fair.c     | 19 ++++++++++++++++++-
 kernel/sched/features.h |  9 +++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..68a8011146c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6962,8 +6962,25 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 			h_nr_idle = 1;
 	}
 
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
+	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
+#ifdef CONFIG_NO_HZ_FULL
+		/*
+		 * Normally, we start the Fair Server to ensure CFS
+		 * bandwidth enforcement. However, if the
+		 * RT_SUPPRESS_FAIR_SERVER feature is enabled and RT
+		 * bandwidth throttling is disabled, we skip starting the
+		 * server when an RT task is running. This prevents the
+		 * server (a Deadline entity) from forcing the tick active,
+		 * thereby preserving NOHZ_FULL isolation.
+		 */
+		if (likely(!sched_feat(RT_SUPPRESS_FAIR_SERVER) ||
+			   rt_bandwidth_enabled() ||
+			   !rt_task(rq->curr)))
+			dl_server_start(&rq->fair_server);
+#else
 		dl_server_start(&rq->fair_server);
+#endif
+	}
 
 	/* At this point se is NULL and we are at root level*/
 	add_nr_running(rq, 1);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 980d92bab8ab..feb7cae9ce75 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -108,6 +108,15 @@ SCHED_FEAT(RT_PUSH_IPI, true)
 #endif
 
 SCHED_FEAT(RT_RUNTIME_SHARE, false)
+#ifdef CONFIG_NO_HZ_FULL
+/*
+ * Suppress Fair Server activation for SCHED_FIFO/RR tasks on
+ * NOHZ_FULL CPUs. This prevents the tick from being restarted for
+ * background CFS maintenance, prioritising deterministic RT
+ * execution over CFS fairness.
+ */
+SCHED_FEAT(RT_SUPPRESS_FAIR_SERVER, false)
+#endif
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
--
2.51.0