[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d4383b7a-a5e7-18ca-14ed-c533b4d43f62@arm.com>
Date: Thu, 10 Jun 2021 13:30:00 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
linux-kernel <linux-kernel@...r.kernel.org>,
"open list:THERMAL" <linux-pm@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Viresh Kumar <viresh.kumar@...aro.org>,
Quentin Perret <qperret@...gle.com>,
Vincent Donnefort <vincent.donnefort@....com>,
Beata Michalska <Beata.Michalska@....com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>, segall@...gle.com,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v2 1/2] sched/fair: Take thermal pressure into account
while estimating energy
On 6/10/21 1:19 PM, Vincent Guittot wrote:
> On Thu, 10 Jun 2021 at 12:37, Lukasz Luba <lukasz.luba@....com> wrote:
>>
>>
>>
>> On 6/10/21 11:07 AM, Dietmar Eggemann wrote:
>>> On 10/06/2021 11:04, Lukasz Luba wrote:
>>>>
>>
>> [snip]
>>
>>>> Not always, it depends on thermal governor decision, workload and
>>>> 'power actors' (in IPA naming convention). Then it depends when and how
>>>> hard you clamp the CPUs. They (CPUs) don't have to be always
>>>> overutilized, they might be even 50-70% utilized but the GPU reduced
>>>> power budget by 2 Watts, so CPUs left with only 1W. Which is still OK
>>>> for the CPUs, since they are only 'feeding' the GPU with new 'jobs'.
>>>
>>> All this pretty much confines the usefulness of you proposed change. A
>>> precise description of it with the patches is necessary to allow people
>>> to start from there while exploring your patches.
>>
>> OK, I see your point.
>>
>> [snip]
>>
>>>> True, I hope this description above would help to understand the
>>>> scenario.
>>>
>>> This description belongs in the patch header. The scenario in which your
>>> functionality would improve things has to be clear.
>>> I'm sure that not everybody looking at this patches is immediately aware
>>> on how IPA setups work and which specific setup you have in mind here.
>>
>> Agree. I will add this description into the patch header for v3.
>>
>> [snip]
>>
>>>>
>>>> Yes, this code implementation tries to address those issues.
>>>
>>> The point I was making here is: why using the PELT signal
>>> thermal_load_avg() and not per_cpu(thermal_pressure, cpu) directly,
>>> given the fact that the latter perfectly represents the frequency clamping?
>>>
>>
>> Good question. I wanted to be aligned with other parts in the fair.c
>> like cpu_capacity() and all it's users. The CPU capacity is reduced by
>> RT, DL, IRQ and thermal load avg, not the 'raw' value from the
>> per-cpu variable.
>>
>> TBH I cannot recall what was the argument back then
>> when thermal pressure geometric series was introduced.
>> Maybe to have a better control how fast it raises and decays
>> so other mechanisms in the scheduler will see the change in thermal
>> as not so sharp... (?)
>>
>>
>> Vincent do you remember the motivation to have geometric series
>> in thermal pressure and not use just the 'raw' value from per-cpu?
>
> In order to have thermal pressure synced with other metrics used by
> the scheduler like util/rt/dl_avg. As an example, when thermal
> pressure will decrease because thermal capping is removed, the
> utilization will increase at the same pace as thermal will decrease
> and it will not create some fake spare cycle. util_avg is the average
> expected utilization of the cpu, thermal pressure reflects the average
> stolen capacity for the same averaging time scale but this can be the
> result of a toggle between several OPP
>
> Using current capping is quite volatile to make a decision as it might
> have changed by the time you apply your decision.
>
So for this scenario, where we want to just align EAS with SchedUtil
frequency decision, which is instantaneous and has 'raw' value
of capping from policy->max, shouldn't we use:
thermal_pressure = arch_scale_thermal_pressure(cpu_id)
instead of geometric series thermal_pressure signal?
This would avoid the hassling with idle CPUs and not updated
thermal signal.
Powered by blists - more mailing lists