Message-ID: <e41ad66f-b8eb-4a17-aab0-6dc0f8fa55f8@arm.com>
Date: Tue, 17 Sep 2024 11:22:11 +0100
From: Christian Loehle <christian.loehle@....com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate
response time
On 9/16/24 23:22, Dietmar Eggemann wrote:
> On 20/08/2024 18:35, Qais Yousef wrote:
>> The new tunable, response_time_ms, allows us to speed up or slow down
>> the response time of the policy to meet the perf, power and thermal
>> characteristics desired by the user/sysadmin. There's no single universal
>> trade-off that we can apply for all systems even if they use the same
>> SoC. The form factor of the system, the dominant use case, and in case
>> of battery powered systems, the size of the battery and presence or
>> absence of active cooling can play a big role in what would be best to
>> use.
>>
>> The new tunable provides sensible defaults, yet gives the power to
>> control the response time to the user/sysadmin, if they wish to.
>>
>> This tunable is applied before we apply the DVFS headroom.
>>
>> The default behavior of applying 1.25 headroom can be re-instated easily
>> now. But we continue to keep the min required headroom to overcome
>> hardware limitation in its speed to change DVFS. And any additional
>> headroom to speed things up must be applied by userspace to match their
>> expectation for best perf/watt as it dictates a type of policy that will
>> be better for some systems, but worse for others.
>>
>> There's a whitespace clean up included in sugov_start().
>>
>> Signed-off-by: Qais Yousef <qyousef@...alina.io>
>> ---
>> Documentation/admin-guide/pm/cpufreq.rst | 17 +++-
>> drivers/cpufreq/cpufreq.c | 4 +-
>> include/linux/cpufreq.h | 3 +
>> kernel/sched/cpufreq_schedutil.c | 115 ++++++++++++++++++++++-
>> 4 files changed, 132 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
>> index 6adb7988e0eb..fa0d602a920e 100644
>> --- a/Documentation/admin-guide/pm/cpufreq.rst
>> +++ b/Documentation/admin-guide/pm/cpufreq.rst
>> @@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
>> to go up to the allowed maximum immediately and then draw back to the value
>> returned by the above formula over time.
>>
>> -This governor exposes only one tunable:
>> +This governor exposes two tunables:
>>
>> ``rate_limit_us``
>> Minimum time (in microseconds) that has to pass between two consecutive
>> @@ -427,6 +427,21 @@ This governor exposes only one tunable:
>> The purpose of this tunable is to reduce the scheduler context overhead
>> of the governor which might be excessive without it.
>>
>> +``respone_time_ms``
s/respone/response
>> + Amount of time (in milliseconds) required to ramp the policy from
>> + lowest to highest frequency. Can be decreased to speed up the
> ^^^^^^^^^^^^^^^^^
>
> This has changed, IMHO. It should be the time from the lowest (or, better,
> 0) to the second-highest frequency.
>
> https://lkml.kernel.org/r/20230827233203.1315953-6-qyousef@layalina.io
>
> [...]
>
Isn't it even more complicated than that?
We have the headroom applied on top of response_time_ms, so
response_time_ms will be longer than the time it takes to reach the highest-cap OPP.
Furthermore, for a big CPU whose OPP0 has a capacity of, e.g., 200, starting
from 0 is (usually?) irrelevant, as we likely wouldn't be here if we were at 0.
I get the intent, but conveying this in an understandable interface is hard.
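To put rough numbers on this, here is a hypothetical back-of-the-envelope
model, not the patch's code: PELT util of a continuously running task,
assuming ~1 ms periods (really 1024 us) and a decay factor y with y^32 = 0.5
(32 ms half-life), saturating at 1024. The helper name pelt_ramp_ms() is made
up for illustration, and the model ignores the patch's actual multiplier
mechanics.

```c
/*
 * Hypothetical sketch (not from the patch): count how many ~1 ms PELT
 * periods util needs to climb from a starting value to a target while a
 * task runs continuously. Simplified model: util halves its distance to
 * 1024 every 32 periods.
 */
#define PELT_MAX_UTIL 1024.0
/* y = 0.5^(1/32): per-period PELT decay factor, so y^32 == 0.5. */
#define PELT_Y 0.97857206208770013

/*
 * Returns the number of periods until util >= target, 0 if util already
 * meets the target, or -1 if the target is unreachable (util only
 * approaches 1024 asymptotically in this model).
 */
static int pelt_ramp_ms(double util, double target)
{
	int periods = 0;

	if (target >= PELT_MAX_UTIL)
		return -1;

	while (util < target) {
		/* Running period: decay the old sum, add the new contribution. */
		util = util * PELT_Y + PELT_MAX_UTIL * (1.0 - PELT_Y);
		periods++;
	}
	return periods;
}
```

In this model, with the old 1.25 headroom the top OPP is requested once util
reaches 1024 / 1.25 (about 819), which takes roughly 75 periods from 0 versus
roughly 174 to reach util 1000; starting from util 200 (the OPP0 example
above) brings it down to roughly 65. So the time to the top OPP ends up well
below the nominal ramp time, which is what makes documenting
response_time_ms precisely tricky.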