linux-kernel - Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <64f78eec-61fb-447f-bbba-706bd5a54cd7@arm.com>
Date:   Thu, 16 Nov 2023 13:14:49 +0000
From:   Lukasz Luba <lukasz.luba@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     wyes.karny@....com, peterz@...radead.org, linux-pm@...r.kernel.org,
        rafael@...nel.org, vschneid@...hat.com, bristot@...hat.com,
        bsegall@...gle.com, rostedt@...dmis.org, dietmar.eggemann@....com,
        juri.lelli@...hat.com, beata.michalska@....com,
        linux-kernel@...r.kernel.org, qyousef@...alina.io,
        viresh.kumar@...aro.org, mingo@...hat.com, mgorman@...e.de
Subject: Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation

Hi Vincent,

I know that there is v3, but just to respond to this below.

On 10/31/23 09:48, Vincent Guittot wrote:
> Hi Lukasz,
> 
> On Mon, 30 Oct 2023 at 18:45, Lukasz Luba <lukasz.luba@....com> wrote:
>>
>> Hi Vincent,
>>
>> On 10/26/23 18:09, Vincent Guittot wrote:
>>> The current method to take into account uclamp hints when estimating the
>>> target frequency can end into situation where the selected target
>>> frequency is finally higher than uclamp hints whereas there are no real
>>> needs. Such cases mainly happen because we are currently mixing the
>>> traditional scheduler utilization signal with the uclamp performance
>>> hints. By adding these 2 metrics, we loose an important information when
>>> it comes to select the target frequency and we have to make some
>>> assumptions which can't fit all cases.
>>>
>>> Rework the interface between the scheduler and schedutil governor in order
>>> to propagate all information down to the cpufreq governor.
>>>
>>> effective_cpu_util() interface changes and now returns the actual
>>> utilization of the CPU with 2 optional inputs:
>>> - The minimum performance for this CPU; typically the capacity to handle
>>>     the deadline task and the interrupt pressure. But also uclamp_min
>>>     request when available.
>>> - The maximum targeting performance for this CPU which reflects the
>>>     maximum level that we would like to not exceed. By default it will be
>>>     the CPU capacity but can be reduced because of some performance hints
>>>     set with uclamp. The value can be lower than actual utilization and/or
>>>     min performance level.
>>
>> You have probably missed my question in the last v1 patch set.
> 
> Yes, sorry
> 
>>
>> The description above needs a bit of clarification, since looking at the
>> patches some dark corners are introduced IMO:
>>
>> Currently, we have a less aggressive power saving policy than this
>> proposal.
>>
>> The questions:
>> What if the PD has 4 CPUs, the max util found is 500 and is from a CPU
>> w/ uclamp_max, but there is another CPU with normal utilization 499?
>> What should be the final frequency for that PD?
> 
> We now follow the same sequence everywhere which can be summarized by:
> 
> for each cpu sharing the same frequency domain:
>      util = cpu_util(cpu)
>      eff_util = effective_cpu_util(util, &min, &max)
>      eff_util = sugov_effective_cpu_perf(eff_util, min, max) which
> applies the dvfs headroom if needed
>      max_util = max(max_util, eff_util);
> 
> EAS anticipates the impact of the waking task on utilization and max
> but the end result is the same as above once the task is enqueued so I
> didn't show it for simplicity
> 
> Coming back to your example
>    CPU0 has uclamp_max = 500 and an actual utilization above 500. Its
> eff_util will be 500
>    CPU1 doesn't have uclamp_max constraint and an actual utilization of
> 499 which will be increase with dvfs headroom to 623 in
> sugov_effective_cpu_perf()
> 
> The final max util will be 623
> 
> With the current implementation we apply the dvfs headroom to the
> final max_util (which is the CPU0 with uclamp_max == 500) whereas we
> now apply the dvfs headroom on each CPU inside
> sugov_effective_cpu_perf()
> 
> The main difference is that if CPU1 has an actual utilization of 400,
> the max_util of the frequency domain will be 500 whereas it is 625
> after applying dvfs headroom with current implementation
> 
>>
>> In current design, where we care more about 'delivered performance
>> to the tasks' than power saving, the +20% would be applied for the
>> frequency. Therefore if that CPU with 499 util doesn't have uclamp_max,
>> it would get a decent amount of idle time for its tasks (to compensate
>> some workload variation).
> 
> CPU1 with 499 still gets its 25% margin or I missed something in your example ?

You understood this correctly. I don't have more questions. It should
than work OK.

Thanks,
Lukasz