Message-ID: <cbb364c8-5008-4fa4-b604-2d04e0095c9c@arm.com>
Date: Tue, 25 Feb 2025 10:09:30 +0000
From: Christian Loehle <christian.loehle@....com>
To: Juri Lelli <juri.lelli@...hat.com>, Qais Yousef <qyousef@...alina.io>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Jon Hunter <jonathanh@...dia.com>, Thierry Reding <treding@...dia.com>,
Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier for hotplug

On 2/25/25 09:46, Juri Lelli wrote:
> On 25/02/25 00:02, Qais Yousef wrote:
>> On 02/24/25 10:27, Juri Lelli wrote:
>>
>>>> Okay I see. The issue though is that for a DL system with power management
>>>> features on that warrant to wake up a sugov thread to update the frequency is
>>>> sort of half broken by design. I don't see the benefit over using RT in this
>>>> case. But I appreciate I could be misguided. So take it easy on me if it is
>>>> obviously wrong understanding :) I know in Android usage of DL has been
>>>> difficult, but many systems ship with slow switch hardware.
>>>>
>>>> How does DL handle the long softirqs from block and network layers by the way?
>>>> This has been a problem in practice for RT tasks, so it should be one for DL too.
>>>> sugov done in stopper should be handled similarly IMHO. I *think* it would be
>>>> simpler to masquerade sugov thread as irq pressure.
>>>
>>> Kind of a trick question :), as DL doesn't handle this kind of
>>
>> :-)
>>
>>> load/pressure explicitly. It is essentially agnostic about it. From a
>>> system design point of view though, I would say that one should take
>>> that into account and maybe convert sensible kthreads to DL, so that the
>>> overall bandwidth can be explicitly evaluated. If one doesn't do that
>>> probably a less sound approach is to treat anything not explicitly
>>> scheduled by DL, but still required from a system perspective, as
>>> overload and be more conservative when assigning bandwidth to DL tasks
>>> (i.e. reduce the maximum amount of available bandwidth, so that the
>>> system doesn't get saturated).
>>
>> Maybe I didn't understand your initial answer properly. But what I got is that
>> we set it as DL to do what you just suggested, i.e. converting the kthread to
>> DL to take its bandwidth into account. But we have been lying about bandwidth
>> so far and it was ignored? (I saw early bailouts if SCHED_FLAG_SUGOV was set in
>> bandwidth related operations)
>
> Ignored as to have something 'that works'. :)
>
> But, it's definitely far from being good.
>
>>>> You can use the rate_limit_us as a potential guide for how much bandwidth sugov
>>>> needs if moving it to another class really doesn't make sense instead?
>>>
>>> Or maybe try to estimate/measure how much utilization sugov threads are
>>> effectively using while running some kind of workload of interest and
>>> use that as an indication for DL runtime/period.
>>
>> I don't want to side track this thread. So maybe I should start a new thread to
>> discuss this. You might have seen my other series on consolidating cpufreq
>> updates. I'm not sure sugov can have a predictable period. Maybe runtime, but
>> it could run repeatedly, or it could be quiet for a long time.
>
> Doesn't need to have a predictable period. Sporadic (activations are not
> periodic) tasks work well with DEADLINE if one is able to come up with a
> sensible bandwidth allocation for them. So for sugov (and other
> kthreads) the system designer should be thinking about the amount of CPU
> to give to each kthread (runtime/period) and the granularity of such
> allocation (period).

The only really sensible choice I see is
  rate_limit * some_constant_approximated_runtime
and on many systems that may yield >100% of the capacity.
Qais' proposed changes would even remove the theoretical rate_limit cap here.
A lot of complexity for something that is essentially a non-issue in practice
AFAICS...
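
To make the arithmetic concrete, here is a minimal sketch. It assumes that "rate_limit" above is read as the maximum update rate, i.e. 1 / rate_limit_us, so the worst-case bandwidth is runtime / rate_limit_us. The function names and all numbers are hypothetical illustrations, not measurements, and the 0.95 cap is only the default sched_rt_runtime_us / sched_rt_period_us ratio that DL admission control uses.

```python
def sugov_bandwidth(runtime_us: float, rate_limit_us: float) -> float:
    """Worst-case CPU fraction if sugov runs once per rate_limit_us window.

    Assumption: one activation per rate-limit window, each costing a
    roughly constant runtime_us (e.g. a slow-switch frequency change).
    """
    if runtime_us < 0 or rate_limit_us <= 0:
        raise ValueError("runtime must be >= 0 and rate limit must be > 0")
    return runtime_us / rate_limit_us


def fits_admission(bandwidth: float, cap: float = 0.95) -> bool:
    """True if the bandwidth fits under the cap (0.95 is the kernel default)."""
    return bandwidth <= cap


# Hypothetical fast-switch platform: 50us per update, 2000us rate limit.
fast = sugov_bandwidth(runtime_us=50, rate_limit_us=2000)

# Hypothetical slow-switch platform: 1500us per update, 1000us rate limit.
slow = sugov_bandwidth(runtime_us=1500, rate_limit_us=1000)

print(f"fast: {fast:.1%}, admitted: {fits_admission(fast)}")
print(f"slow: {slow:.1%}, admitted: {fits_admission(slow)}")
```

With these made-up values the slow-switch case asks for more than a whole CPU, which is the >100% situation mentioned above, and without a rate limit there would be no period to divide by at all.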