linux-kernel - Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

Message-ID: <60fe6b16-0fc6-6ac4-f8fe-87ae9b6592c0@arm.com>
Date:   Thu, 23 Mar 2023 17:29:52 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Qais Yousef <qyousef@...alina.io>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Kajetan Puchalski <kajetan.puchalski@....com>,
        Jian-Min Liu <jian-min.liu@...iatek.com>,
        Ingo Molnar <mingo@...nel.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        Patrick Bellasi <patrick.bellasi@...bug.net>,
        Abhijeet Dharmapurikar <adharmap@...cinc.com>,
        Qais Yousef <qais.yousef@....com>,
        linux-kernel@...r.kernel.org,
        Jonathan JMChen <jonathan.jmchen@...iatek.com>
Subject: Re: [RFC PATCH 0/1] sched/pelt: Change PELT halflife at runtime

On 01/03/2023 18:24, Qais Yousef wrote:
> On 03/01/23 11:39, Vincent Guittot wrote:
>> On Thu, 23 Feb 2023 at 16:37, Qais Yousef <qyousef@...alina.io> wrote:
>>>
>>> On 02/09/23 17:16, Vincent Guittot wrote:
>>>
>>>> I don't see how util_est_faster can help this 1ms task here ? It's
>>>> most probably never be preempted during this 1ms. For such an Android
>>>> Graphics Pipeline short task, hasn't uclamp_min been designed for and
>>>> a better solution ?
>>>
>>> uclamp_min is being used in UI and helping there. But your mileage might vary
>>> with adoption still.
>>>
>>> The major motivation behind this is to help things like gaming as the original
>>> thread started. It can help UI and other use cases too. Android framework has
>>> a lot of context on the type of workload that can help it make a decision when
>>> this helps. And OEMs can have the chance to tune and apply based on the
>>> characteristics of their device.
>>>
>>>> IIUC how util_est_faster works, it removes the waiting time when
>>>> sharing cpu time with other tasks. So as long as there is no (runnable
>>>> but not running time), the result is the same as current util_est.
>>>> util_est_faster makes a difference only when the task alternates
>>>> between runnable and running slices.
>>>> Have you considered using runnable_avg metrics in the increase of cpu
>>>> freq ? This takes into the runnable slice and not only the running
>>>> time and increase faster than util_avg when tasks compete for the same
>>>> CPU
>>>
>>> Just to understand why we're heading into this direction now.
>>>
>>> AFAIU the desired outcome to have faster rampup time (and on HMP faster up
>>> migration) which both are tied to utilization signal.
>>>
>>> Wouldn't make the util response time faster help not just for rampup, but
>>> rampdown too?
>>>
>>> If we improve util response time, couldn't this mean we can remove util_est or
>>> am I missing something?
>>
>> not sure because you still have a ramping step whereas util_est
>> directly gives you the final tager

util_est gives us an instantaneous signal at enqueue for periodic tasks,
something PELT will never be able to do.
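
To put a rough number on that ramping step, here is a minimal sketch. It assumes the continuous approximation of PELT's geometric series, the default 32ms half-life, and a task that runs without interruption. `pelt_util` and `ms_to_reach` are illustrative helpers, not kernel functions.

```python
import math

SCHED_CAPACITY_SCALE = 1024  # full-scale utilization


def pelt_util(running_ms, halflife_ms=32.0, start=0.0):
    """Utilization after running continuously for running_ms.

    Continuous approximation (an assumption): the gap to full scale
    halves every halflife_ms.
    """
    decay = 0.5 ** (running_ms / halflife_ms)
    return SCHED_CAPACITY_SCALE - (SCHED_CAPACITY_SCALE - start) * decay


def ms_to_reach(target, halflife_ms=32.0, start=0.0):
    """Continuous running time needed to climb from start to target."""
    remaining = (SCHED_CAPACITY_SCALE - target) / (SCHED_CAPACITY_SCALE - start)
    return halflife_ms * math.log2(1.0 / remaining)


if __name__ == "__main__":
    for halflife in (32.0, 16.0, 8.0):
        print(halflife, round(ms_to_reach(768, halflife_ms=halflife), 1))
```

Under these assumptions a task starting from 0 needs about 64ms of continuous running to reach 768 with a 32ms half-life, and about 16ms with an 8ms half-life. util_est, in contrast, is available as soon as the task is enqueued.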
 
> I didn't get you. tager?
> 
>>
>>>
>>> Currently we have util response which is tweaked by util_est and then that is
>>> tweaked further by schedutil with that 25% margin when maping util to
>>> frequency.
>>
>> the 25% is not related to the ramping time but to the fact that you
>> always need some margin to cover unexpected events and estimation
>> error
> 
> At the moment we have
> 
> 	util_avg -> util_est -> (util_est_faster) -> util_map_freq -> schedutil filter ==> current frequency selection
> 
> I think we have too many transformations before deciding the current
> frequencies. Which makes it hard to tweak the system response.

To me it looks more like this:

max(max(util_avg, util_est), runnable_avg) -> schedutil's rate limit* -> freq. selection
                             ^^^^^^^^^^^^ 
                             new proposal to factor in root cfs_rq contention


Like Vincent mentioned, util_map_freq() (now: map_util_perf()) is only
there to create the safety margin used by schedutil & EAS.
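
A minimal sketch of that chain, leaving out the rate limit. The `runnable_avg` term is the proposal under discussion, not mainline behaviour. `effective_cfs_util` and `next_freq_khz` are illustrative names, and clamping the result to the maximum frequency is a simplification. `map_util_perf` mirrors the 25% margin as util + util/4.

```python
def map_util_perf(util):
    """Add the 25% safety margin: util + util/4, in integer arithmetic."""
    return util + (util >> 2)


def effective_cfs_util(util_avg, util_est, runnable_avg=None):
    """max(max(util_avg, util_est), runnable_avg).

    runnable_avg is optional because factoring it in is only the new
    proposal; without it this reduces to max(util_avg, util_est).
    """
    util = max(util_avg, util_est)
    if runnable_avg is not None:
        util = max(util, runnable_avg)
    return util


def next_freq_khz(util, max_freq_khz, max_cap=1024):
    """Map utilization (with margin) linearly onto the frequency range.

    Clamping to max_freq_khz is a simplification made for this sketch.
    """
    freq = max_freq_khz * map_util_perf(util) // max_cap
    return min(freq, max_freq_khz)


if __name__ == "__main__":
    util = effective_cfs_util(util_avg=300, util_est=400, runnable_avg=600)
    print(util, next_freq_khz(util, max_freq_khz=2_000_000))
```

The margin is applied once, after the signals have been combined, so it is independent of how fast any of them ramps.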

* The schedutil up/down filter thing was already NAKed in Nov 2016.
IMHO, this is where util_est was initially discussed as an alternative.
We have a rate limit in mainline as well, but with one value (default 10ms)
for both directions. There was discussion to map it to the driver's
transition_latency instead.

In Pixel7 you use 0.5ms up and `5/20/20ms` down for `little/medium/big`.

So on `up` your rate limit is as small as possible (only respecting the
driver's transition_latency) but on `down` you use much more than that.
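
For illustration, a hypothetical model of such a split up/down filter. This is not the mainline schedutil code, and the class and parameter names are made up for this sketch. A frequency change is accepted only if enough time has passed since the last accepted change, with a separate delay per direction.

```python
class RateLimitedFreq:
    """Toy up/down rate limit filter (hypothetical, not kernel code)."""

    def __init__(self, up_delay_us, down_delay_us, freq=0):
        self.up_delay_us = up_delay_us
        self.down_delay_us = down_delay_us
        self.freq = freq
        self.last_change_us = None  # no change accepted yet

    def request(self, now_us, freq):
        """Return the frequency in effect after this request."""
        if freq == self.freq:
            return self.freq
        # Pick the delay for the direction of the requested change.
        delay = self.up_delay_us if freq > self.freq else self.down_delay_us
        if self.last_change_us is None or now_us - self.last_change_us >= delay:
            self.freq = freq
            self.last_change_us = now_us
        return self.freq


if __name__ == "__main__":
    # 0.5ms up, 20ms down, as in the big-cluster values quoted above.
    f = RateLimitedFreq(up_delay_us=500, down_delay_us=20_000, freq=1000)
    for now_us, want in [(0, 2000), (300, 1500), (600, 2500), (20_600, 1000)]:
        print(now_us, want, f.request(now_us, want))
```

With these values an increase goes through almost immediately, while a decrease requested shortly after a change is held off until the 20ms window has elapsed.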

Why exactly do you have this higher value on `down`? My hunch is that it
targets scenarios in which the CPU (all CPUs in the freq. domain) goes idle,
so util_est is 0 and the blocked utilization is decaying (too fast,
4ms (250Hz) versus 20ms?). So you don't want to have to ramp up the
frequency again when the CPU wakes up within those 20ms?
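
As a rough feel for how much blocked utilization is left after such an idle window, a sketch assuming a pure geometric decay with the PELT half-life. `blocked_util` is an illustrative helper, not a kernel function.

```python
def blocked_util(util, idle_ms, halflife_ms=32.0):
    """Blocked utilization left after idle_ms without any running time.

    Assumes pure geometric decay: the signal halves every halflife_ms.
    """
    return util * 0.5 ** (idle_ms / halflife_ms)


if __name__ == "__main__":
    for halflife in (32.0, 16.0, 8.0):
        left = blocked_util(1024, idle_ms=20, halflife_ms=halflife)
        print(halflife, round(left))
```

Under this assumption a 20ms idle period leaves roughly 65% of the signal with a 32ms half-life, and roughly 18% with an 8ms half-life.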

>>> I think if we can allow improving general util response time by tweaking PELT
>>> HALFLIFE we can potentially remove util_est and potentially that magic 25%
>>> margin too.
>>>
>>> Why the approach of further tweaking util_est is better?
>>
>> note that in this case it doesn't really tweak util_est but Dietmar
>> has taken into account runnable_avg to increase the freq in case of
>> contention
>>
>> Also IIUC Dietmar's results, the problem seems more linked to the
>> selection of a higher freq than increasing the utilization;
>> runnable_avg tests give similar perf results than shorter half life
>> and better power consumption.
> 
> Does it ramp down faster too?

Not sure why you are interested in this? Can't be related to the
`driving DVFS` functionality discussed above.