Message-ID: <d2897ca4-b53a-4047-860f-c19d668505c4@linux.ibm.com>
Date: Tue, 13 Jan 2026 00:18:57 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Aaron Tomlin <atomlin@...mlin.com>,
        K Prateek Nayak <kprateek.nayak@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        vschneid@...hat.com, neelx@...e.com, sean@...e.io, mproche@...il.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] sched/deadline: Log Fair Server re-enablement for
 symmetry with debugfs



On 1/12/26 8:02 PM, Aaron Tomlin wrote:
> On Mon, Jan 12, 2026 at 10:44:03AM +0530, K Prateek Nayak wrote:
>> I believe the suggested solution to that was to trace the reason for the
>> kthread/fair task waking up on isolated CPUs and prevent the wakeup if
>> it is for some unnecessary operation as opposed to disabling the fair
>> server.
> 
> Hi Prateek,
> 
>> We have tools like https://docs.kernel.org/trace/osnoise-tracer.html to
>> capture this noise. Trace the noise, bring up the case where isolation
>> is broken on the current *upstream* kernel to the mailing list, and we
>> can solve it for everyone instead of disabling fair server as a duct
>> tape.
> 
> Thank you for your insights.
> 
> I fully concur that, in an ideal world, the "correct" solution is
> invariably to identify and eliminate the root cause of any spurious
> SCHED_NORMAL wakeups on isolated CPUs. Tools such as the osnoise tracer are
> indeed invaluable for this pursuit.
> 
> However, I would respectfully submit that there remains a distinction
> between the theoretical purity of the kernel and the pragmatic reality of
> managing highly specialised, latency-critical partitions.
> 
> It is pertinent to note that the kernel currently affords users the
> capability to manually modify the Fair Server's parameters via
> /sys/kernel/debug/sched/fair_server/. As this resides within debugfs, it
> is, by definition, a debug-only interface and not strictly considered
> "production safe" or guaranteed to be free from side effects. The capacity
> for a user to destabilise their system via this interface - effectively
> "shooting themselves in the foot" - already exists. This existing interface
> is useful for educated users who are willing to accept full accountability
> for system stability in exchange for absolute determinism for a defined
> period of time.
> 
>> Juri, Peter, is changing the fair server's bandwidth frequently a very
>> common scenario in the field?
>>
>> If not, can we add a pr_warn() for when the fair server's parameters
>> are changed by the userspace just to catch any absurd values that
>> reduce the bandwidth to a minimum without disabling the server?
>>
>> I can do something absolutely stupid like this without dmesg logging
>> anything that would indicate I'm being stupid:
>>
>>      # echo 4000000000 > /sys/kernel/debug/sched/fair_server/cpu0/period
>>      # echo 1 > /sys/kernel/debug/sched/fair_server/cpu0/runtime
>>      # sudo taskset -c 0 chrt -r 99 ~/scripts/loop&
>>      # taskset -c 0 bash -c 'mkdir /sys/fs/cgroup/cg0; echo $$ > /sys/fs/cgroup/cg0/cgroup.procs;'
>>
>>      ... wait for a while
>>
>>       INFO: task bash:4272 blocked for more than 120 seconds.
>>             Not tainted 6.19.0-rc1-tip+ #162
>>       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>       task:bash            state:D stack:0     pid:4272  tgid:4272  ppid:4271   task_flags:0x400100 flags:0x00080000
>>
>>
>> A taint might be too far but a log should be acceptable?
> 
> Regarding your valid concern about visibility and safety: I am agreeable to
> hardening the observability of such changes. In the next iteration, I
> propose to introduce a pr_warn() that triggers whenever the Fair Server's
> runtime or period is modified from its default value (50 * NSEC_PER_MSEC
> and 1000 * NSEC_PER_MSEC). This will ensure that any deviation - whether it
> be a complete disablement or a reduction to unsafe levels - is clearly
> logged, rightfully alerting administrators to the non-standard
> configuration without removing the latitude required by those who
> explicitly need to make that trade-off.
> 

Currently it is 5% (the default 50 ms of runtime per 1000 ms period). It is
going to be tricky to define unsafe levels.
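
For reference, the bandwidth is just runtime divided by period. The sketch
below is plain Python, not kernel code; bandwidth_percent is a hypothetical
helper name, and the inputs are the defaults quoted above plus the extreme
values from Prateek's example.

```python
NSEC_PER_MSEC = 1_000_000


def bandwidth_percent(runtime_ns: int, period_ns: int) -> float:
    """Share of each period reserved for the fair server, in percent."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 100.0 * runtime_ns / period_ns


# Defaults quoted in the thread: 50 ms runtime, 1000 ms period.
default_pct = bandwidth_percent(50 * NSEC_PER_MSEC, 1000 * NSEC_PER_MSEC)

# Prateek's example: runtime of 1 ns over a period of 4000000000 ns.
extreme_pct = bandwidth_percent(1, 4_000_000_000)

print(f"default: {default_pct:.1f}%")
print(f"extreme: {extreme_pct:.2e}%")
```

Anything between those two points is a judgment call, which is why a single
"unsafe" threshold is hard to pick.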

It looks like one either wants it or does not want any interference from it.
Are there any users changing the default value?
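
One way to check this on a given machine is to compare the debugfs values
against the defaults. This is a rough userspace sketch under assumptions: it
uses the /sys/kernel/debug/sched/fair_server/cpuN/{runtime,period} layout
shown earlier in the thread, assumes each file holds a single integer in
nanoseconds, and non_default_cpus is a hypothetical helper name. Reading
debugfs normally requires root.

```python
from pathlib import Path

# Defaults quoted in the thread: 50 ms runtime, 1000 ms period.
DEFAULT_RUNTIME_NS = 50 * 1_000_000
DEFAULT_PERIOD_NS = 1000 * 1_000_000


def non_default_cpus(base="/sys/kernel/debug/sched/fair_server"):
    """Map cpu directory name -> (runtime_ns, period_ns) where values differ from defaults."""
    changed = {}
    for cpu_dir in sorted(Path(base).glob("cpu*")):
        try:
            runtime = int((cpu_dir / "runtime").read_text())
            period = int((cpu_dir / "period").read_text())
        except (OSError, ValueError):
            # Unreadable file or unexpected content: skip rather than guess.
            continue
        if (runtime, period) != (DEFAULT_RUNTIME_NS, DEFAULT_PERIOD_NS):
            changed[cpu_dir.name] = (runtime, period)
    return changed


if __name__ == "__main__":
    for cpu, (runtime, period) in non_default_cpus().items():
        print(f"{cpu}: runtime={runtime} period={period}")
```

An empty result means every readable CPU is still at the defaults (or that
the directory is missing or not readable).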
