 
Message-ID: <zmjr43kk2m52huk2vvetvwefil7waletzuijiu5y34v3n4slgi@3wdtd3xckx7m>
Date: Mon, 12 Jan 2026 20:43:49 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, 
	Shrikanth Hegde <sshegde@...ux.ibm.com>, neelx@...e.com, sean@...e.io, mproche@...il.com, 
	linux-kernel@...r.kernel.org, mingo@...hat.com, vincent.guittot@...aro.org, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, 
	vschneid@...hat.com
Subject: Re: [RFC PATCH 0/1] sched/fair: Feature to suppress Fair Server for
 NOHZ_FULL isolation

On Wed, Jan 07, 2026 at 11:26:59AM +0100, Peter Zijlstra wrote:
> We must not starve fair tasks -- this can severely affect the system
> health.
> 
> Specifically per-cpu kthreads getting starved can cause complete system
> lockup when other CPUs go wait for completion and such.
> 
> We must not disable the fair server, ever. Doing so means you get to
> keep the pieces.
> 
> The only sane way is to ensure these tasks do not get queued in the
> first place.

Hi Peter,

To your point, in an effort to steer CFS (SCHED_NORMAL) tasks away from
isolated, RT-busy CPUs, I would be interested in your thoughts on the
following approach. By redirecting these "leaked" CFS tasks to housekeeping
CPUs before they are enqueued, we ensure that rq->cfs.h_nr_queued remains
zero on the isolated core. This prevents the Fair Server from being
activated and keeps the tick stopped in adaptive-tick (nohz_full) mode.

A race condition does exist: an RT task could wake up on the target CPU
after our check returns false. I think this is likely acceptable. Should an
RT task wake up later, it will preempt the CFS task regardless; the next
time the CFS task sleeps and wakes, the check will intercept it and redirect
it to a housekeeping CPU.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da46c3164537..3db7a590a24d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8526,6 +8526,32 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	/* SD_flags and WF_flags share the first nibble */
 	int sd_flag = wake_flags & 0xF;
 
+	/*
+	 * When RT_SUPPRESS_FAIR_SERVER is enabled, we proactively steer CFS tasks
+	 * away from isolated CPUs that are currently executing Real-Time tasks.
+	 *
+	 * Enqueuing a CFS task on such a CPU would trigger dl_server_start(),
+	 * which in turn restarts the tick to enforce bandwidth control. By
+	 * redirecting the task to a housekeeping CPU during the selection
+	 * phase, we preserve strict isolation and silence on the target CPU.
+	 */
+#if defined(CONFIG_NO_HZ_FULL)
+	if (sched_feat(RT_SUPPRESS_FAIR_SERVER) && !rt_bandwidth_enabled() &&
+	    housekeeping_enabled(HK_TYPE_KERNEL_NOISE)) {
+		struct rq *target_rq = cpu_rq(prev_cpu);
+		/*
+		 * Use READ_ONCE() to safely load the remote CPU's current task
+		 * pointer without holding the rq lock.
+		 */
+		struct task_struct *curr = READ_ONCE(target_rq->curr);
+
+		/* If the target CPU is isolated and busy with RT, redirect */
+		if (rt_task(curr) &&
+			!housekeeping_test_cpu(prev_cpu, HK_TYPE_KERNEL_NOISE)) {
+			return housekeeping_any_cpu(HK_TYPE_KERNEL_NOISE);
+		}
+	}
+#endif
 	/*
 	 * required for stable ->cpus_allowed
 	 */


-- 
Aaron Tomlin

