Message-ID: <bb7d4b51-d609-b407-4f92-d2bec8273031@arm.com>
Date:   Mon, 23 Nov 2020 11:34:44 +0000
From:   Lukasz Luba <lukasz.luba@....com>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Amit Daniel Kachhap <amit.kachhap@...il.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Javi Merino <javi.merino@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Amit Kucheria <amitk@...nel.org>, linux-kernel@...r.kernel.org,
        Quentin Perret <qperret@...gle.com>, linux-pm@...r.kernel.org
Subject: Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util()
 for SMP platforms



On 11/23/20 10:41 AM, Viresh Kumar wrote:
> On 20-11-20, 14:51, Lukasz Luba wrote:
>> On 11/19/20 7:38 AM, Viresh Kumar wrote:
>>> Scenario 1: The CPUs were mostly idle in the previous polling window of
>>> the IPA governor as the tasks were sleeping and here are the details
>>> from traces (load is in %):
>>>
>>>    Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
>>>    New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
>>>
>>> Here, the "Old" line gives the load and requested_power (dynamic_power
>>> here) numbers calculated using the idle time based implementation, while
>>> "New" is based on the CPU utilization from scheduler.
>>>
>>> As can be clearly seen, the load and requested_power numbers are simply
>>> incorrect in the idle time based approach and the numbers collected from
>>> CPU's utilization are much closer to the reality.
>>
>> This contradicts what you wrote in the 'Scenario 1' description,
>> doesn't it?
> 
> At least I didn't think so when I wrote this and am still not sure :)
> 
>> Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if CPUs were
>> mostly idle then 1.3W would better reflect that state.
> 
> The CPUs were idle because the tasks were sleeping, but once the tasks
> resume work, we need a frequency that matches the real load of the
> tasks. This is exactly what schedutil would ask for as well, as it uses
> the same metric, and so we should be looking to ask for the same power
> budget.

Yes, agree.
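
As a sanity check on the Scenario 1 numbers quoted above, here is a minimal sketch that parses the two trace lines, sums the per-CPU hex loads, and converts total_load into an average percentage across the 8 CPUs. The helper names (parse_trace, load_percent) are hypothetical and exist only for this illustration; the trace strings are copied from the quoted text.

```python
import re

# Trace payloads copied from Scenario 1 of the quoted patch description.
OLD = ("cpus=00000000,000000ff freq=1200000 total_load=203 "
       "load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339")
NEW = ("cpus=00000000,000000ff freq=1200000 total_load=600 "
       "load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960")


def parse_trace(line):
    """Return (reported total_load, list of per-CPU loads in %)."""
    total = int(re.search(r"total_load=(\d+)", line).group(1))
    loads_field = re.search(r"load=\{\{([^}]*)\}\}", line).group(1)
    loads = [int(value, 16) for value in loads_field.split(",")]
    return total, loads


def load_percent(line):
    """Average load per CPU in %, checking the reported total first."""
    total, loads = parse_trace(line)
    if sum(loads) != total:
        raise ValueError("total_load does not match the per-CPU loads")
    return total / len(loads)


print("old:", load_percent(OLD))  # idle-time based estimate
print("new:", load_percent(NEW))  # scheduler-utilization based estimate
```

With these inputs the idle-time based line works out to roughly 25% average load and the utilization based line to 75%, which is where the "75% total_load" figure in the discussion comes from.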

> 
>> What was the IPA period in your setup?
> 
> It is 100 ms by default, though I remember that I tried with 10 ms as
> well.
> 
>> It depends on your platform IPA period (e.g. 100ms) and your current
>> runqueues state (at that sampling point in time). The PELT decay/rise
>> period is different. I am not sure if you observe the system avg load
>> for the last e.g. 100ms when looking at these signals. Maybe the IPA
>> period is too short/long and couldn't catch up with the PELT signals?
>> But we don't want too short an averaging period, since 16ms is a
>> display tick.
>>
>> IMHO based on this result it looks like the util could have lost older
>> information from the past or hasn't converged to this low load yet.
>>
>>>
>>> Scenario 2: The CPUs were busy in the previous polling window of the IPA
>>> governor:
>>>
>>>    Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
>>>    New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
>>>
>>> As can be seen, the idle time based load is 100% for all the CPUs as it
>>> took only the last window into account, but in reality the CPUs aren't
>>> that loaded as shown by the utilization numbers.
>>
>> This is also odd. The ~88% total_load looks like it started decaying,
>> or hasn't converged to 100% yet, or some task vanished?
> 
> They must have decayed a bit because of the idle period, so looks okay
> that way.
> 
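
The four dynamic_power values in the quoted traces are consistent with a simple linear model: a fixed per-CPU power at 1.2GHz multiplied by total_load and divided by 100 with integer truncation. The sketch below checks that. The 660 mW constant is inferred from the trace numbers themselves and is an assumption, not a value stated anywhere in this thread; dynamic_power() here is a hypothetical helper, not the kernel function.

```python
# (total_load, dynamic_power) pairs taken from the quoted trace lines.
TRACES = {
    "scenario 1 old": (203, 1339),
    "scenario 1 new": (600, 3960),
    "scenario 2 old": (800, 5280),
    "scenario 2 new": (708, 4672),
}

# Assumption: per-CPU power at freq=1200000, inferred from 5280 / 8 CPUs.
RAW_CPU_POWER_MW = 660


def dynamic_power(raw_cpu_power_mw, total_load):
    """Scale per-CPU power by the summed load (in %), truncating like integer math."""
    return (raw_cpu_power_mw * total_load) // 100


for name, (total_load, reported) in TRACES.items():
    estimate = dynamic_power(RAW_CPU_POWER_MW, total_load)
    print(f"{name}: load={total_load / 8:.1f}% "
          f"estimate={estimate} reported={reported}")
```

Under that assumption all four reported values are reproduced exactly, and the Scenario 2 utilization figure corresponds to 708 / 8 = 88.5% average load, matching the "~88%" mentioned in the question.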

I have experimented with this new estimation and compared it with a real
power meter and other models. It looks good, better than current
mainline. I will continue experiments, but this patch LGTM and
I will add my reviewed-by today (after finishing it).

It would make more sense to adjust the IPA period to the util signal than
the opposite. I have to play with this a bit...
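
To get a feel for how the IPA period relates to the util signal, here is a rough sketch of PELT-style geometric decay. It assumes the commonly cited 32 ms half-life and treats the signal as continuously decaying while a CPU is fully idle, which is a simplification of what the scheduler actually does. The function names are hypothetical.

```python
import math

# Assumption: PELT halves a signal's contribution roughly every 32 ms.
PELT_HALF_LIFE_MS = 32.0


def decayed_util(util, idle_ms):
    """Utilization left after idle_ms of complete idleness."""
    return util * 0.5 ** (idle_ms / PELT_HALF_LIFE_MS)


def idle_ms_for_ratio(ratio):
    """Idle time needed for the signal to fall to ratio (0 < ratio <= 1) of its value."""
    return PELT_HALF_LIFE_MS * math.log2(1.0 / ratio)


# How much of a 100% signal survives one IPA period of pure idle time.
for period_ms in (10, 100):
    print(f"{period_ms} ms idle: {decayed_util(100.0, period_ms):.1f}% left")

# Idle time that would take a fully loaded CPU down to 88.5%.
print(f"100% -> 88.5%: {idle_ms_for_ratio(0.885):.1f} ms idle")
```

In this simplified model a 10 ms IPA period sees most of the past load still present in the signal, while a full 100 ms of idleness leaves only about a tenth of it, so the choice of period strongly affects how much history the power estimate reflects.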

Regards,
Lukasz