linux-kernel - Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier for hotplug

Message-ID: <cbb364c8-5008-4fa4-b604-2d04e0095c9c@arm.com>
Date: Tue, 25 Feb 2025 10:09:30 +0000
From: Christian Loehle <christian.loehle@....com>
To: Juri Lelli <juri.lelli@...hat.com>, Qais Yousef <qyousef@...alina.io>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
 Jon Hunter <jonathanh@...dia.com>, Thierry Reding <treding@...dia.com>,
 Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Phil Auld <pauld@...hat.com>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 "Joel Fernandes (Google)" <joel@...lfernandes.org>,
 Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
 Shin Kawamura <kawasin@...gle.com>,
 Vineeth Remanan Pillai <vineeth@...byteword.org>,
 linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
 "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
 for hotplug

On 2/25/25 09:46, Juri Lelli wrote:
> On 25/02/25 00:02, Qais Yousef wrote:
>> On 02/24/25 10:27, Juri Lelli wrote:
>>
>>>> Okay I see. The issue though is that a DL system with power management
>>>> features on that require waking up a sugov thread to update the frequency is
>>>> sort of half broken by design. I don't see the benefit over using RT in this
>>>> case. But I appreciate I could be misguided. So take it easy on me if it is
>>>> obviously wrong understanding :) I know in Android usage of DL has been
>>>> difficult, but many systems ship with slow switch hardware.
>>>>
>>>> How does DL handle the long softirqs from block and network layers by the way?
>>>> This has been a problem in practice for RT tasks, so it should be for DL too.
>>>> sugov done in stopper should be handled similarly IMHO. I *think* it would be
>>>> simpler to masquerade sugov thread as irq pressure.
>>>
>>> Kind of a trick question :), as DL doesn't handle this kind of
>>
>> :-)
>>
>>> load/pressure explicitly. It is essentially agnostic about it. From a
>>> system design point of view though, I would say that one should take
>>> that into account and maybe convert sensible kthreads to DL, so that the
>>> overall bandwidth can be explicitly evaluated. If one doesn't do that
>>> probably a less sound approach is to treat anything not explicitly
>>> scheduled by DL, but still required from a system perspective, as
>>> overload and be more conservative when assigning bandwidth to DL tasks
>>> (i.e. reduce the maximum amount of available bandwidth, so that the
>>> system doesn't get saturated).
>>
>> Maybe I didn't understand your initial answer properly. But what I got is that
>> we set it as DL to do what you just suggested, converting the kthread to DL to
>> take its bandwidth into account. But we have been lying about bandwidth so far
>> and it was ignored? (I saw early bailouts if SCHED_FLAG_SUGOV was set in
>> bandwidth related operations)
> 
> Ignored as to have something 'that works'. :)
> 
> But, it's definitely far from being good.
> 
>>>> You can use the rate_limit_us as a potential guide for how much bandwidth sugov
>>>> needs if moving it to another class really doesn't make sense instead?
>>>
>>> Or maybe try to estimate/measure how much utilization sugov threads are
>>> effectively using while running some kind of workload of interest and
>>> use that as an indication for DL runtime/period.
>>
>> I don't want to side track this thread. So maybe I should start a new thread to
>> discuss this. You might have seen my other series on consolidating cpufreq
>> updates. I'm not sure sugov can have a predictable period. Maybe runtime, but
>> it could run repeatedly, or it could be quiet for a long time.
> 
> Doesn't need to have a predictable period. Sporadic (activations are not
> periodic) tasks work well with DEADLINE if one is able to come up with a
> sensible bandwidth allocation for them. So for sugov (and other
> kthreads) the system designer should be thinking about the amount of CPU
> to give to each kthread (runtime/period) and the granularity of such
> allocation (period).

The only really sensible choice I see is
rate_limit * some_constant_approximated_runtime
and on many systems that may yield >100% of the capacity.
Qais' proposed changes would even remove the theoretical rate_limit cap here.
A lot of complexity for something that is essentially a non-issue in practice
AFAICS...
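
To illustrate the estimate above, here is a minimal sketch of the arithmetic. It is
not taken from any patch in this thread. It assumes sugov runs at most once per
rate_limit_us window, so the window acts as the DL period and the approximated
runtime as the DL runtime. The function names and the example numbers are
hypothetical; the 95% cap is the default sched_rt_runtime_us / sched_rt_period_us
ratio (950000 / 1000000).

```python
# Sketch only: helper names and sample values are illustrative assumptions.

DEFAULT_DL_CAP = 950_000 / 1_000_000  # default sched_rt_runtime_us / sched_rt_period_us


def sugov_bandwidth(runtime_us: float, rate_limit_us: float) -> float:
    """CPU fraction needed if sugov runs for runtime_us once per rate_limit_us window."""
    if rate_limit_us <= 0:
        raise ValueError("rate_limit_us must be positive")
    return runtime_us / rate_limit_us


def fits_under_cap(runtime_us: float, rate_limit_us: float,
                   cap: float = DEFAULT_DL_CAP) -> bool:
    """True if the estimated bandwidth stays within the given DL bandwidth cap."""
    return sugov_bandwidth(runtime_us, rate_limit_us) <= cap


if __name__ == "__main__":
    # Hypothetical slow-switch platform: a frequency change takes longer than
    # the rate limit window, so the estimate exceeds 100% of one CPU.
    print(sugov_bandwidth(300, 200), fits_under_cap(300, 200))
    # Hypothetical platform with a long rate limit and a short runtime.
    print(sugov_bandwidth(50, 2000), fits_under_cap(50, 2000))
```

With made-up numbers like these, the first case cannot be admitted as an honest DL
reservation at all, which is the ">100% of the capacity" situation mentioned above.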