Message-ID: <wfxeuwsqwoz5zb4uebsjeew2hjq5xqnnwygjgbwka3f5ftzcc5@fwtxyuvt5sak>
Date: Fri, 9 Jan 2026 09:30:46 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, sshegde@...ux.ibm.com, 
	neelx@...e.com, sean@...e.io, mproche@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] sched/deadline: Log Fair Server re-enablement for
 symmetry with debugfs

On Fri, Jan 09, 2026 at 11:47:10AM +0530, K Prateek Nayak wrote:
> Well, if you are disabling the fair_server, you're opening the doors to
> bigger problems and that printk mainly serves as an indicator to dismiss
> user induced starvation issues during debugs.
> 
> Why do you care about the symmetry of this log when you shouldn't be
> setting the runtime to 0 in the first place?

Hi Prateek,

Whilst I fully appreciate that indefinitely disabling the Fair Server
invites systemic peril, I would respectfully submit that there are
legitimate, transient scenarios where such intervention is warranted.

Consider a strictly partitioned environment utilising isolcpus=domain,5-8
alongside nohz_full=5-8. A latency-critical SCHED_FIFO task executing on
CPU 5 that never enters the kernel requires absolute isolation. If a
SCHED_NORMAL (CFS) task is enqueued - perhaps a CPU-specific kthread or
some other user-specific task - the current architecture wakes the Deadline
Server, which in turn restarts the clock-tick - see sched_can_stop_tick().
By temporarily disabling the Fair Server via the debug interface, an
administrator can preclude this interruption during a specific, sensitive
window of execution, before restoring standard operation once the critical
phase has concluded.
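
To make the debug-interface step concrete, here is a minimal userspace sketch
of the write an administrator's tooling might perform. It rests on some
assumptions: that the per-CPU knob lives at
/sys/kernel/debug/sched/fair_server/cpuN/runtime, that it takes a value in
nanoseconds, and that debugfs is mounted. The helper names
fair_server_runtime_path() and write_runtime_ns() are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed debugfs layout; verify on the target kernel before relying on it. */
#define FAIR_SERVER_RUNTIME_FMT \
	"/sys/kernel/debug/sched/fair_server/cpu%d/runtime"

/* Build the per-CPU runtime path. Returns 0 on success, -1 if buf is too small. */
static int fair_server_runtime_path(char *buf, size_t len, int cpu)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, FAIR_SERVER_RUNTIME_FMT, cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a runtime in nanoseconds to path. Returns 0 on success, -1 on error. */
static int write_runtime_ns(const char *path, uint64_t runtime_ns)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%" PRIu64 "\n", runtime_ns) > 0;
	/* fclose() flushes the buffer, so a rejected write may only surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Under those assumptions, a caller would read and save the current runtime,
write 0 for the duration of the sensitive window, and write the saved value
back afterwards.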

> > This patch amends dl_server_apply_params() to introduce the requisite
> > logging. By detecting the transition from zero to non-zero
> > bandwidth - strictly for the Fair Server entity and excluding
> > initialisation - we ensure that a "Fair server re-enabled" message is
> > emitted. This restores logging symmetry and provides administrators with
> > a clear audit trail of manual runtime adjustments.
> > 
> > Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
> > ---
> >  kernel/sched/deadline.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 319439fe1870..e64fb988e957 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1867,6 +1867,7 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
> >  	u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> >  	u64 new_bw = to_ratio(period, runtime);
> >  	struct rq *rq = dl_se->rq;
> > +	bool fair_server = dl_se == &rq->fair_server;
> >  	int cpu = cpu_of(rq);
> >  	struct dl_bw *dl_b;
> >  	unsigned long cap;
> > @@ -1876,6 +1877,11 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
> >  	dl_b = dl_bw_of(cpu);
> >  	guard(raw_spinlock)(&dl_b->lock);
> >  
> > +	/* Symmetric to disable message in sched_fair_server_write() */
> > +	if (!init && fair_server && !old_bw && new_bw)
> > +		printk_deferred("Fair server re-enabled on CPU %d.\n",
> > +				cpu);
> 
> That is an absolutely terrible place to put it. Why can't we have it in
> sched_fair_server_write() for DL_RUNTIME when the
> "rq->fair_server.dl_runtime" is 0 initially and is modified to a
> non-zero value similar to the "Fair server disabled" message?
> 
> I still think once the fair server is disabled, the pieces are for the
> user to keep. I wouldn't want us debugging:
> 
>     Fair server disabled in CPU X ...
>     Fair server re-enabled in CPU X ...
>     INFO: rcu_tasks detected stalls ...
> 
> only to realise the stalls were a result of starving the fair threads
> and the fair server didn't run in time / didn't have enough B/W to
> prevent that stall.

Regarding the implementation, I concede that placing the logging logic
within dl_server_apply_params() was suboptimal. You are quite right;
sched_fair_server_write() is the appropriate location for this mechanism,
as it aligns the logging directly with the user-space interaction.

Regarding your concern about debugging RCU stalls and the "keep the pieces"
philosophy: I would argue that this is precisely why the symmetry in
logging is essential.

Without the "re-enabled" marker, the audit trail is incomplete. If a system
stalls, seeing only a "Fair server disabled" message leaves the duration of
the starvation event ambiguous. By explicitly logging the re-enablement, we
establish a definitive timeline. If an RCU stall occurs shortly after the
server is re-enabled, the timestamp provides the necessary evidence to
correlate the stall directly with the preceding starvation
period, confirming that the user's intervention was indeed the root cause.
Transparency, in this case, expedites the diagnosis of "user-induced"
failure.

I shall prepare a revised patch that moves the logic to
sched_fair_server_write() and ensures the message is emitted only upon the
transition from zero to a non-zero runtime.
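
By way of illustration only, the intended condition can be modelled in
userspace as below. This is a sketch, not the revised patch: struct
fair_server_model, fs_reenabled() and fs_write_runtime() are hypothetical
stand-ins, and fprintf() takes the place of printk_deferred().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU fair server state. */
struct fair_server_model {
	int cpu;
	uint64_t dl_runtime;	/* nanoseconds; 0 means disabled */
};

/* True only for the zero to non-zero transition. */
static bool fs_reenabled(uint64_t old_runtime, uint64_t new_runtime)
{
	return !old_runtime && new_runtime;
}

/*
 * Model of a runtime write: store the new value and log once if the
 * server has just been re-enabled. Returns true when the message fired.
 */
static bool fs_write_runtime(struct fair_server_model *fs, uint64_t runtime)
{
	bool reenabled;

	assert(fs != NULL);
	reenabled = fs_reenabled(fs->dl_runtime, runtime);
	fs->dl_runtime = runtime;

	if (reenabled)
		fprintf(stderr, "Fair server re-enabled on CPU %d.\n", fs->cpu);

	return reenabled;
}
```

In this model, adjusting one non-zero runtime to another stays silent, as
does a write of zero; only the 0 to non-zero change produces the message.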


Kind regards,
-- 
Aaron Tomlin
