Message-ID: <20260107102659.GE2393663@noisy.programming.kicks-ass.net>
Date: Wed, 7 Jan 2026 11:26:59 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Aaron Tomlin <atomlin@...mlin.com>,
	Shrikanth Hegde <sshegde@...ux.ibm.com>, neelx@...e.com,
	sean@...e.io, mproche@...il.com, linux-kernel@...r.kernel.org,
	mingo@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com
Subject: Re: [RFC PATCH 0/1] sched/fair: Feature to suppress Fair Server for
 NOHZ_FULL isolation

On Wed, Jan 07, 2026 at 10:48:12AM +0100, Juri Lelli wrote:
> Hello!
> 
> On 06/01/26 09:49, Aaron Tomlin wrote:
> > On Tue, Jan 06, 2026 at 02:37:49PM +0530, Shrikanth Hegde wrote:
> > > If all your SCHED_FIFO tasks are pinned and their scheduling
> > > decisions are managed in userspace, using isolcpus would offer you
> > > better isolation compared to nohz_full.
> > 
> > Hi Shrikanth,
> > 
> > You are entirely correct; isolcpus=domain (or isolcpus= without flags as
> > per housekeeping_isolcpus_setup()) indeed offers superior isolation by
> > removing the CPU from the scheduler load-balancing domains.
> > 
> > I must apologise for the omission in my previous correspondence. I
> > neglected to mention that our specific configuration utilises isolcpus= in
> > conjunction with nohz_full=.
> > 
> > > > However, the extant "Fair Server" (Deadline Server) architecture
> > > > compromises this isolation guarantee. At present, should a background
> > > > SCHED_OTHER task be enqueued, the scheduler initiates the Fair Server
> > > > (dl_server_start). As the Fair Server functions as a SCHED_DEADLINE entity,
> > > > its activation increments rq->dl.dl_nr_running.
> > > > 
> > > 
> > > There is runtime allocated to the fair server. If you set it to 0
> > > on the CPUs of interest, wouldn't that work?
> > > 
> > > /sys/kernel/debug/sched/fair_server/<cpu>/runtime
> > 
> > Yes, you are quite right; setting the fair server runtime to 0 (via
> > /sys/kernel/debug/sched/fair_server/[cpu]/runtime) does indeed achieve the
> > desired effect. In my testing, the SCHED_FIFO task on the fully
> > adaptive-tick CPU remains uninterrupted by the restored clock-tick when
> > this configuration is applied. Thank you.
> > 
> > However, I believe it would be beneficial if this scheduling feature were
> > available as an automatic kernel detection mechanism. While the manual
> > runtime adjustment works, having the kernel automatically detect the
> > condition - where an RT task is running and bandwidth enforcement is
> > disabled - would provide a more seamless and robust solution for
> > partitioned systems without requiring external intervention.
> > I may consider an improved version of the patch that includes a
> > "Fair server disabled" warning, much like the one in
> > sched_fair_server_write().
> 
> I am not sure we either need or want the automatic mechanism, as we
> already have the fair_server interface. I tend to think that if any
> CFS task (kthreads included) is enqueued on an "isolated" CPU, the
> problem might reside in sub-optimal isolation (usually a config issue,
> or a kernel issue that needs solving - e.g. a for_each_cpu loop that
> needs changing). Starving such tasks might anyway end in a system
> crash of sorts.

We must not starve fair tasks -- this can severely affect the system
health.

Specifically, per-cpu kthreads getting starved can cause a complete
system lockup when other CPUs go wait for completions and such.

We must not disable the fair server, ever. Doing so means you get to
keep the pieces.

The only sane way is to ensure these tasks do not get queued in the
first place.
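
In practice, for a partitioned setup like the one described above, that
means something along these lines (purely an illustrative sketch -- the
CPU numbers, task names and the use of taskset/chrt are my assumptions,
the isolcpus flags are the ones Aaron already mentioned):

  # kernel command line: pull CPUs 2-3 out of the scheduler domains and
  # run them adaptive-tick (illustrative CPU list)
  isolcpus=domain,2-3 nohz_full=2-3

  # keep ordinary CFS work affined to the housekeeping CPUs, and put
  # only the pinned SCHED_FIFO work on the isolated ones
  # (daemon/workload names are hypothetical)
  taskset -c 0-1 some_housekeeping_daemon
  taskset -c 2 chrt -f 50 ./rt_workload

If no CFS task ever gets enqueued on the isolated CPUs, the fair server
simply never gets started there, and there is nothing to disable.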

