linux-kernel - Re: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate response time

Message-ID: <e41ad66f-b8eb-4a17-aab0-6dc0f8fa55f8@arm.com>
Date: Tue, 17 Sep 2024 11:22:11 +0100
From: Christian Loehle <christian.loehle@....com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
 Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>,
 Viresh Kumar <viresh.kumar@...aro.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
 John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate
 response time

On 9/16/24 23:22, Dietmar Eggemann wrote:
> On 20/08/2024 18:35, Qais Yousef wrote:
>> The new tunable, response_time_ms, allows us to speed up or slow down
>> the response time of the policy to meet the perf, power and thermal
>> characteristics desired by the user/sysadmin. There's no single universal
>> trade-off that we can apply for all systems even if they use the same
>> SoC. The form factor of the system, the dominant use case, and in case
>> of battery powered systems, the size of the battery and presence or
>> absence of active cooling can play a big role in what would be best to
>> use.
>>
>> The new tunable provides sensible defaults, yet gives the power to
>> control the response time to the user/sysadmin, if they wish to.
>>
>> This tunable is applied before we apply the DVFS headroom.
>>
>> The default behavior of applying 1.25 headroom can be re-instated easily
>> now. But we continue to keep the min required headroom to overcome
>> hardware limitation in its speed to change DVFS. And any additional
>> headroom to speed things up must be applied by userspace to match their
>> expectation for best perf/watt as it dictates a type of policy that will
>> be better for some systems, but worse for others.
>>
>> There's a whitespace clean up included in sugov_start().
>>
>> Signed-off-by: Qais Yousef <qyousef@...alina.io>
>> ---
>>  Documentation/admin-guide/pm/cpufreq.rst |  17 +++-
>>  drivers/cpufreq/cpufreq.c                |   4 +-
>>  include/linux/cpufreq.h                  |   3 +
>>  kernel/sched/cpufreq_schedutil.c         | 115 ++++++++++++++++++++++-
>>  4 files changed, 132 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
>> index 6adb7988e0eb..fa0d602a920e 100644
>> --- a/Documentation/admin-guide/pm/cpufreq.rst
>> +++ b/Documentation/admin-guide/pm/cpufreq.rst
>> @@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
>>  to go up to the allowed maximum immediately and then draw back to the value
>>  returned by the above formula over time.
>>  
>> -This governor exposes only one tunable:
>> +This governor exposes two tunables:
>>  
>>  ``rate_limit_us``
>>  	Minimum time (in microseconds) that has to pass between two consecutive
>> @@ -427,6 +427,21 @@ This governor exposes only one tunable:
>>  	The purpose of this tunable is to reduce the scheduler context overhead
>>  	of the governor which might be excessive without it.
>>  
>> +``respone_time_ms``
s/respone/response
>> +	Amount of time (in milliseconds) required to ramp the policy from
>> +	lowest to highest frequency. Can be decreased to speed up the
>                   ^^^^^^^^^^^^^^^^^
> 
> This has changed IMHO. Should be the time from lowest (or better 0) to
> second highest frequency.
> 
> https://lkml.kernel.org/r/20230827233203.1315953-6-qyousef@layalina.io
> 
> [...]
> 

Isn't it even more complicated than that?
We have the headroom applied on top of response_time_ms, so
response_time_ms will be longer than the time it takes to reach the
highest-capacity OPP.
Furthermore, when applying this to a big CPU, e.g. one with an OPP0 capacity
of 200, starting from 0 is (usually?) irrelevant, as we likely wouldn't be
here if we were at 0.
I get the intent, but conveying this in an understandable interface is hard.
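
To put rough numbers on the two points above, here is a minimal sketch. It
is not the code from the patch: it assumes a continuous approximation of
PELT with the default 32 ms half-life, a capacity scale of 1024, and the
classic 1.25 headroom mentioned in the commit message. The helper name
time_to_util_ms() is hypothetical.

```python
import math

PELT_HALFLIFE_MS = 32.0        # assumption: default PELT half-life
SCHED_CAPACITY_SCALE = 1024.0  # assumption: standard capacity scale


def time_to_util_ms(target, start=0.0):
    """Approximate ms of continuous running for util to grow from
    `start` to `target`.

    Simplified model: the gap between util and 1024 halves every
    PELT_HALFLIFE_MS. This is an illustration, not the kernel's math.
    """
    if not 0.0 <= start <= target < SCHED_CAPACITY_SCALE:
        raise ValueError("need 0 <= start <= target < 1024")
    gap_start = SCHED_CAPACITY_SCALE - start
    gap_target = SCHED_CAPACITY_SCALE - target
    return PELT_HALFLIFE_MS * math.log2(gap_start / gap_target)


# With 1.25 headroom the highest OPP is requested once util * 1.25 >= 1024,
# i.e. well before util itself gets anywhere near 1024.
util_at_max_opp = SCHED_CAPACITY_SCALE / 1.25

from_zero = time_to_util_ms(util_at_max_opp)            # ramp starting at 0
from_opp0 = time_to_util_ms(util_at_max_opp, start=200)  # ramp starting at 200

print(f"0   -> max OPP: {from_zero:.1f} ms")
print(f"200 -> max OPP: {from_opp0:.1f} ms")
```

Under these assumptions the ramp that starts at 200 is noticeably shorter
than the one that starts at 0, and both end well before util reaches the
top of the scale, which is why a single "lowest to highest" number is hard
to document precisely.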