linux-kernel - Re: [PATCH 1/2] sched/schedutil: rework performance estimation

Message-ID: <6709d44b-39c3-414d-b0f9-fe217bb32876@arm.com>
Date:   Thu, 26 Oct 2023 11:07:56 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, rafael@...nel.org,
        viresh.kumar@...aro.org, qyousef@...alina.io,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        lukasz.luba@....com
Subject: Re: [PATCH 1/2] sched/schedutil: rework performance estimation

On 20/10/2023 15:58, Vincent Guittot wrote:
> On Fri, 20 Oct 2023 at 11:48, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>>
>> On 13/10/2023 17:14, Vincent Guittot wrote:

[...]

>>> A new sugov_effective_cpu_perf() interface is also available to compute
>>> the final performance level that is targeted for the CPU after applying
>>> some cpufreq headroom and taking into account all inputs.
>>>
>>> With these 2 functions, schedutil is now able to decide when it must go
>>> above uclamp hints. It now also has a generic way to get the min
>>> performance level.
>>>
>>> The dependency between energy model and cpufreq governor and its headroom
>>> policy doesn't exist anymore.
>>
>> But the dependency that both are doing the same thing still exists, right?
> 
> For the energy model itself, it is now fully removed; only EAS still
> has to estimate which perf level will be selected by schedutil, but it
> now uses a schedutil function without having to care about headroom
> and cpufreq governor policy.

I see now. (1) replaces (2), so only the dependency between schedutil
and EAS remains; the EM dependency is gone.

compute_energy()

  max_util = eenv_pd_max_util()

                 sugov_effective_cpu_perf()

                     actual = map_util_perf(actual)   (1)


  energy = em_cpu_energy(..., max_util, ...);

               max_util = map_util_perf(max_util)     (2)
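
To make the (1) vs. (2) point concrete, here is a minimal sketch, not the
patch itself. It assumes the headroom is the usual util + util/4 that
map_util_perf() applies; headroom_sketch() and double_headroom_sketch()
are hypothetical names made up for this illustration. It shows why only
one of the two call sites can stay: keeping both would inflate the
estimate twice.

```c
/* Hypothetical stand-in for map_util_perf(): adds 25% headroom (util + util/4). */
static unsigned long headroom_sketch(unsigned long util)
{
	return util + (util >> 2);
}

/* What EAS would see if both (1) and (2) stayed: headroom applied twice. */
static unsigned long double_headroom_sketch(unsigned long util)
{
	return headroom_sketch(headroom_sketch(util));
}
```

With util = 400, headroom_sketch() gives 500 while double_headroom_sketch()
gives 625, so dropping (2) once (1) is in place keeps the EAS estimate in
line with what schedutil requests.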

[...]

>>>  unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>> -                              enum cpu_util_type type,
>>> -                              struct task_struct *p)
>>> +                              unsigned long *min,
>>> +                              unsigned long *max)
>>
>> FREQUENCY_UTIL relates to min != NULL and max != NULL
>>
>> ENERGY_UTIL relates to min == NULL and max == NULL
>>
>> so both must be either NULL or !NULL.
>>
>> Calling it with one equal to NULL and the other !NULL should be
>> undefined, right?
> 
> For now there is no such user, but one could consider asking only for
> min or max. So I would not say undefined but unused.

OK.
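
For illustration, the optional-pointer convention discussed above could
look like the sketch below. This is not the code from the patch:
util_with_bounds_sketch() is a hypothetical name, and the values written
to *min and *max are placeholders (0 and scale) rather than the real
bounds. It only shows that each output is written solely when the caller
passes a non-NULL pointer, so asking for just one of them is harmless.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: returns util capped at scale and fills in the
 * optional bounds. A NULL min or max simply means "caller does not care".
 */
static unsigned long util_with_bounds_sketch(unsigned long util,
					     unsigned long scale,
					     unsigned long *min,
					     unsigned long *max)
{
	if (min)
		*min = 0;	/* placeholder for the real min perf level */
	if (max)
		*max = scale;	/* placeholder for the real max perf level */

	return util < scale ? util : scale;
}
```

An ENERGY_UTIL-style caller would pass NULL for both pointers, while a
FREQUENCY_UTIL-style caller would pass both.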

[...]

>>> -      * OTOH, for energy computation we need the estimated running time, so
>>> -      * include util_dl and ignore dl_bw.
>>> -      */
>>> -     if (type == ENERGY_UTIL)
>>> -             util += dl_util;
>>> +     if (util >= scale) {
>>> +             if (max)
>>> +                     *max = scale;
>>
>> But that means that uclamp_max cannot constrain a system in which
>> 'util > uclamp_max'. I guess that's related to you saying uclamp_min is a
>> hard req and uclamp_max is a soft req. I don't think that's in sync with
>> the rest of the uclamp_max implementation.
> 
> That's a mistake; I took a shortcut here. I wanted to skip the
> scale_irq_capacity() step but forgot to update max first.
> 
> Will fix it

I see.

[...]

>> effective_cpu_util for FREQUENCY_UTIL (i.e. (min != NULL && max !=
>> NULL)) is slightly different.
>>
>>   missing:
>>
>>   if (!uclamp_is_used() && rt_rq_is_runnable(&rq->rt))
>>     return max
>>
>>   probably moved into sugov_effective_cpu_perf() (which is only called
>>   for `FREQUENCY_UTIL`) ?
> 
> yes, it's in sugov_effective_cpu_perf()

OK.

[...]

>>> @@ -306,7 +329,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
>>>   */
>>>  static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>>>  {
>>> -     if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
>>> +     if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
>>
>> bw_min covers more than DL, right?
> 
> yes
> 
> Interrupts preempt DL, so we should include them.
> And now that we can take uclamp_min into account, use it when
> computing the min perf parameter of cpufreq_driver_adjust_perf().

OK.