linux-kernel - Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation

Message-ID: <CAKfTPtCLc3z6k9MwW6XKHjbh78AFrAg1T1MONYtf8N8GGR6fGQ@mail.gmail.com>
Date:   Tue, 31 Oct 2023 10:48:46 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     wyes.karny@....com, peterz@...radead.org, linux-pm@...r.kernel.org,
        rafael@...nel.org, vschneid@...hat.com, bristot@...hat.com,
        bsegall@...gle.com, rostedt@...dmis.org, dietmar.eggemann@....com,
        juri.lelli@...hat.com, beata.michalska@....com,
        linux-kernel@...r.kernel.org, qyousef@...alina.io,
        viresh.kumar@...aro.org, mingo@...hat.com, mgorman@...e.de
Subject: Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation

Hi Lukasz,

On Mon, 30 Oct 2023 at 18:45, Lukasz Luba <lukasz.luba@....com> wrote:
>
> Hi Vincent,
>
> On 10/26/23 18:09, Vincent Guittot wrote:
> > The current method of taking uclamp hints into account when estimating the
> > target frequency can end up in a situation where the selected target
> > frequency is higher than the uclamp hints even though there is no real
> > need. Such cases mainly happen because we are currently mixing the
> > traditional scheduler utilization signal with the uclamp performance
> > hints. By adding these 2 metrics, we lose important information when
> > it comes to selecting the target frequency and we have to make some
> > assumptions which can't fit all cases.
> >
> > Rework the interface between the scheduler and schedutil governor in order
> > to propagate all information down to the cpufreq governor.
> >
> > effective_cpu_util() interface changes and now returns the actual
> > utilization of the CPU with 2 optional inputs:
> > - The minimum performance for this CPU; typically the capacity to handle
> >    the deadline task and the interrupt pressure. But also uclamp_min
> >    request when available.
> > - The maximum targeting performance for this CPU which reflects the
> >    maximum level that we would like to not exceed. By default it will be
> >    the CPU capacity but can be reduced because of some performance hints
> >    set with uclamp. The value can be lower than actual utilization and/or
> >    min performance level.
>
> You have probably missed my question in the last v1 patch set.

Yes, sorry

>
> The description above needs a bit of clarification, since looking at the
> patches some dark corners are introduced IMO:
>
> Currently, we have a less aggressive power saving policy than this
> proposal.
>
> The questions:
> What if the PD has 4 CPUs, the max util found is 500 and is from a CPU
> w/ uclamp_max, but there is another CPU with normal utilization 499?
> What should be the final frequency for that PD?

We now follow the same sequence everywhere, which can be summarized as:

for each cpu sharing the same frequency domain:
    util = cpu_util(cpu)
    eff_util = effective_cpu_util(util, &min, &max)
    /* applies the dvfs headroom if needed */
    eff_util = sugov_effective_cpu_perf(eff_util, min, max)
    max_util = max(max_util, eff_util)

EAS anticipates the impact of the waking task on utilization and max,
but the end result is the same as above once the task is enqueued, so I
didn't show it for simplicity.

Coming back to your example:
  CPU0 has uclamp_max = 500 and an actual utilization above 500. Its
eff_util will be 500.
  CPU1 has no uclamp_max constraint and an actual utilization of
499, which will be increased by the dvfs headroom to 623 in
sugov_effective_cpu_perf().

The final max_util will be 623.
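
To make the arithmetic concrete, here is a minimal userspace sketch of the
per-CPU sequence above. It is not the kernel code: struct cpu_sample,
dvfs_headroom(), effective_cpu_perf() and domain_max_util() are hypothetical
names, the headroom is assumed to be util + util/4 (which matches the
499 -> 623 figure), and the util/min/max fields stand in for what
effective_cpu_util() would return.

```c
#include <stddef.h>

/* Hypothetical per-CPU inputs, standing in for effective_cpu_util() results. */
struct cpu_sample {
	unsigned long util; /* actual utilization */
	unsigned long min;  /* minimum performance level */
	unsigned long max;  /* maximum targeted performance (e.g. uclamp_max) */
};

/* Assumption: 25% dvfs headroom computed as util + util/4. */
static unsigned long dvfs_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/* Hypothetical stand-in for sugov_effective_cpu_perf(). */
static unsigned long effective_cpu_perf(unsigned long actual,
					unsigned long min,
					unsigned long max)
{
	/* Add the headroom to the actual utilization of this CPU. */
	actual = dvfs_headroom(actual);

	/* No need to target max if utilization plus headroom is lower. */
	if (actual < max)
		max = actual;

	/* Still guarantee the minimum performance level. */
	return min > max ? min : max;
}

/* Highest per-CPU result across the CPUs of one frequency domain. */
static unsigned long domain_max_util(const struct cpu_sample *cpus, size_t n)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long eff = effective_cpu_perf(cpus[i].util,
						       cpus[i].min,
						       cpus[i].max);
		if (eff > max_util)
			max_util = eff;
	}
	return max_util;
}
```

With CPU0 = { .util = 600, .min = 0, .max = 500 } and
CPU1 = { .util = 499, .min = 0, .max = 1024 }, CPU0 contributes 500 and CPU1
contributes 623, so the domain value is 623.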

With the current implementation we apply the dvfs headroom to the
final max_util (which comes from CPU0 with uclamp_max == 500), whereas we
now apply the dvfs headroom to each CPU inside
sugov_effective_cpu_perf().

The main difference is that if CPU1 has an actual utilization of 400,
the max_util of the frequency domain will be 500 (400 plus its dvfs
headroom), whereas it is 625 after applying the dvfs headroom with the
current implementation.
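
The difference between the two orderings can be sketched as follows. This is
only an illustration under stated assumptions: struct cpu_sample,
domain_util_current() and domain_util_reworked() are hypothetical names, the
minimum performance level is ignored, and the headroom is assumed to be
util + util/4.

```c
#include <stddef.h>

/* Hypothetical per-CPU inputs; uclamp_max is 1024 when no hint is set. */
struct cpu_sample {
	unsigned long util;       /* actual utilization */
	unsigned long uclamp_max; /* maximum performance hint */
};

/* Assumption: 25% dvfs headroom computed as util + util/4. */
static unsigned long headroom(unsigned long util)
{
	return util + (util >> 2);
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Current ordering: clamp each CPU, take the max, then add headroom once. */
static unsigned long domain_util_current(const struct cpu_sample *c, size_t n)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long u = min_ul(c[i].util, c[i].uclamp_max);

		if (u > max_util)
			max_util = u;
	}
	return headroom(max_util);
}

/* Reworked ordering: add headroom per CPU, cap by its max, then take the max. */
static unsigned long domain_util_reworked(const struct cpu_sample *c, size_t n)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long u = min_ul(headroom(c[i].util), c[i].uclamp_max);

		if (u > max_util)
			max_util = u;
	}
	return max_util;
}
```

For CPU0 = { 600, 500 } and CPU1 = { 400, 1024 }, the current ordering gives
625 and the reworked one gives 500. With CPU1 at 499 instead, the results are
625 and 623.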

>
> In current design, where we care more about 'delivered performance
> to the tasks' than power saving, the +20% would be applied for the
> frequency. Therefore if that CPU with 499 util doesn't have uclamp_max,
> it would get a decent amount of idle time for its tasks (to compensate
> some workload variation).

CPU1 with 499 still gets its 25% margin, or did I miss something in your example?

Vincent

>
> Regards,
> Lukasz