Message-ID: <e2jsgf6zir3j3aadheuj5xlrclb3sq2dj4zni642mbnzxdw3ux@axkou56rx7ds>
Date: Mon, 12 Jan 2026 09:32:04 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com, 
	vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, sshegde@...ux.ibm.com, 
	neelx@...e.com, sean@...e.io, mproche@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] sched/deadline: Log Fair Server re-enablement for
 symmetry with debugfs

On Mon, Jan 12, 2026 at 10:44:03AM +0530, K Prateek Nayak wrote:
> I believe the suggested solution to that was to trace the reason for the
> kthread/fair task waking up on isolated CPUs and prevent the wakeup if
> it is for some unnecessary operation as opposed to disabling the fair
> server.

Hi Prateek,

> We have tools like https://docs.kernel.org/trace/osnoise-tracer.html to
> capture this noise. Trace the noise, bring up the case where isolation
> is broken on the current *upstream* kernel to the mailing list, and we
> can solve it for everyone instead of disabling fair server as a duct
> tape.

Thank you for your insights.

I fully concur that, in an ideal world, the "correct" solution is
invariably to identify and eliminate the root cause of any spurious
SCHED_NORMAL wakeups on isolated CPUs. Tools such as the osnoise tracer are
indeed invaluable for this pursuit.

However, I would respectfully submit that there remains a distinction
between the theoretical purity of the kernel and the pragmatic reality of
managing highly specialised, latency-critical partitions.

It is pertinent to note that the kernel currently affords users the
capability to manually modify the Fair Server's parameters via
/sys/kernel/debug/sched/fair_server/. As this resides within debugfs, it
is, by definition, a debug-only interface and not strictly considered
"production safe" or guaranteed to be free from side effects. The capacity
for a user to destabilise their system via this interface - effectively
"shooting themselves in the foot" - already exists. This existing interface
is useful for educated users who are willing to accept full accountability
for system stability in exchange for absolute determinism for a defined
period of time.
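
For illustration, the values this interface exposes can be inspected before
anything is changed. The following is a minimal Python sketch, not part of
the patch: the helper name read_fair_server() is hypothetical, the cpuN/runtime
and cpuN/period layout is taken from the example commands quoted below, and it
assumes each file holds a single decimal nanosecond value.

```python
from pathlib import Path

# Directory named in this message; reading it normally requires root
# and a mounted debugfs.
FAIR_SERVER_DIR = Path("/sys/kernel/debug/sched/fair_server")


def read_fair_server(cpu, base=FAIR_SERVER_DIR):
    """Return (runtime_ns, period_ns) for one CPU as integers.

    Assumes each file contains one decimal value in nanoseconds.
    """
    cpu_dir = Path(base) / f"cpu{cpu}"
    runtime_ns = int((cpu_dir / "runtime").read_text().strip())
    period_ns = int((cpu_dir / "period").read_text().strip())
    return runtime_ns, period_ns


if __name__ == "__main__":
    try:
        print(read_fair_server(0))
    except OSError as err:
        # Typical on systems without debugfs access.
        print(f"could not read fair server parameters: {err}")
```

The base directory is a parameter only so that the helper can be pointed at a
copy of the files when debugfs is not accessible.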

> Juri, Peter, is changing the fair server's bandwidth frequently a very
> common scenario in the field?
> 
> If not, can we add a pr_warn() for when the fair server's parameters
> are changed by the userspace just to catch any absurd values that
> reduce the bandwidth to a minimum without disabling the server?
> 
> I can do something absolutely stupid like this without dmesg logging
> anything that would indicate I'm being stupid:
> 
>     # echo 4000000000 > /sys/kernel/debug/sched/fair_server/cpu0/period
>     # echo 1 > /sys/kernel/debug/sched/fair_server/cpu0/runtime
>     # sudo taskset -c 0 chrt -r 99 ~/scripts/loop&
>     # taskset -c 0 bash -c 'mkdir /sys/fs/cgroup/cg0; echo $$ > /sys/fs/cgroup/cg0/cgroup.procs;'
> 
>     ... wait for a while
> 
>      INFO: task bash:4272 blocked for more than 120 seconds.
>            Not tainted 6.19.0-rc1-tip+ #162
>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>      task:bash            state:D stack:0     pid:4272  tgid:4272  ppid:4271   task_flags:0x400100 flags:0x00080000
> 
> 
> A taint might be too far but a log should be acceptable?

Regarding your valid concern about visibility and safety: I am agreeable to
hardening the observability of such changes. In the next iteration, I
propose to introduce a pr_warn() that triggers whenever the Fair Server's
runtime or period is modified from its default value (50 * NSEC_PER_MSEC
and 1000 * NSEC_PER_MSEC, respectively). This will ensure that any
deviation - whether it be a complete disablement or a reduction to unsafe
levels - is clearly logged, alerting administrators to the non-standard
configuration without removing the latitude required by those who
explicitly need to make that trade-off.
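
To make the intended condition concrete, here is a rough userspace model in
Python. It is a sketch only, not the kernel change: the function name
fair_server_warning() and the message wording are hypothetical, and treating
a runtime of 0 as "disabled" is an assumption of this sketch. The default
values are the ones stated above.

```python
NSEC_PER_MSEC = 1_000_000

# Defaults as stated in this message.
DEFAULT_RUNTIME_NS = 50 * NSEC_PER_MSEC
DEFAULT_PERIOD_NS = 1000 * NSEC_PER_MSEC


def fair_server_warning(cpu, runtime_ns, period_ns):
    """Return a warning string if either value differs from its default.

    Returns None when both values match the defaults.
    """
    if runtime_ns == DEFAULT_RUNTIME_NS and period_ns == DEFAULT_PERIOD_NS:
        return None
    if runtime_ns == 0:
        # Assumption of this sketch: zero runtime means the server is off.
        state = "disabled"
    elif period_ns > 0:
        # Fraction of CPU time reserved for fair tasks.
        state = f"bandwidth {runtime_ns / period_ns:.3g} of CPU time"
    else:
        state = "invalid period"
    return (
        f"cpu{cpu}: fair server runtime={runtime_ns} ns "
        f"period={period_ns} ns differs from default ({state})"
    )


# Defaults: nothing to report.
print(fair_server_warning(0, DEFAULT_RUNTIME_NS, DEFAULT_PERIOD_NS))
# Values from the quoted example: 1 ns of runtime per 4 s period.
print(fair_server_warning(0, 1, 4_000_000_000))
# Runtime set to zero.
print(fair_server_warning(0, 0, DEFAULT_PERIOD_NS))
```

With the defaults the reserved bandwidth is 50 ms per 1000 ms, i.e. 5% of the
CPU; the quoted example reduces that to 1 ns per 4 s without technically
disabling the server, which is the case a log line should make visible.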


Kind regards,
-- 
Aaron Tomlin
