[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCLc3z6k9MwW6XKHjbh78AFrAg1T1MONYtf8N8GGR6fGQ@mail.gmail.com>
Date: Tue, 31 Oct 2023 10:48:46 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Lukasz Luba <lukasz.luba@....com>
Cc: wyes.karny@....com, peterz@...radead.org, linux-pm@...r.kernel.org,
rafael@...nel.org, vschneid@...hat.com, bristot@...hat.com,
bsegall@...gle.com, rostedt@...dmis.org, dietmar.eggemann@....com,
juri.lelli@...hat.com, beata.michalska@....com,
linux-kernel@...r.kernel.org, qyousef@...alina.io,
viresh.kumar@...aro.org, mingo@...hat.com, mgorman@...e.de
Subject: Re: [PATCH v2 1/2] sched/schedutil: rework performance estimation
Hi Lukasz,
On Mon, 30 Oct 2023 at 18:45, Lukasz Luba <lukasz.luba@....com> wrote:
>
> Hi Vincent,
>
> On 10/26/23 18:09, Vincent Guittot wrote:
> > The current method to take into account uclamp hints when estimating the
> > target frequency can end into situation where the selected target
> > frequency is finally higher than uclamp hints whereas there are no real
> > needs. Such cases mainly happen because we are currently mixing the
> > traditional scheduler utilization signal with the uclamp performance
> > hints. By adding these 2 metrics, we loose an important information when
> > it comes to select the target frequency and we have to make some
> > assumptions which can't fit all cases.
> >
> > Rework the interface between the scheduler and schedutil governor in order
> > to propagate all information down to the cpufreq governor.
> >
> > effective_cpu_util() interface changes and now returns the actual
> > utilization of the CPU with 2 optional inputs:
> > - The minimum performance for this CPU; typically the capacity to handle
> > the deadline task and the interrupt pressure. But also uclamp_min
> > request when available.
> > - The maximum targeting performance for this CPU which reflects the
> > maximum level that we would like to not exceed. By default it will be
> > the CPU capacity but can be reduced because of some performance hints
> > set with uclamp. The value can be lower than actual utilization and/or
> > min performance level.
>
> You have probably missed my question in the last v1 patch set.
Yes, sorry
>
> The description above needs a bit of clarification, since looking at the
> patches some dark corners are introduced IMO:
>
> Currently, we have a less aggressive power saving policy than this
> proposal.
>
> The questions:
> What if the PD has 4 CPUs, the max util found is 500 and is from a CPU
> w/ uclamp_max, but there is another CPU with normal utilization 499?
> What should be the final frequency for that PD?
We now follow the same sequence everywhere which can be summarized by:
for each cpu sharing the same frequency domain:
util = cpu_util(cpu)
eff_util = effective_cpu_util(util, &min, &max)
eff_util = sugov_effective_cpu_perf(eff_util, min, max) which
applies the dvfs headroom if needed
max_util = max(max_util, eff_util);
EAS anticipates the impact of the waking task on utilization and max
but the end result is the same as above once the task is enqueued so I
didn't show it for simplicity
Coming back to your example
CPU0 has uclamp_max = 500 and an actual utilization above 500. Its
eff_util will be 500
CPU1 doesn't have uclamp_max constraint and an actual utilization of
499 which will be increase with dvfs headroom to 623 in
sugov_effective_cpu_perf()
The final max util will be 623
With the current implementation we apply the dvfs headroom to the
final max_util (which is the CPU0 with uclamp_max == 500) whereas we
now apply the dvfs headroom on each CPU inside
sugov_effective_cpu_perf()
The main difference is that if CPU1 has an actual utilization of 400,
the max_util of the frequency domain will be 500 whereas it is 625
after applying dvfs headroom with current implementation
>
> In current design, where we care more about 'delivered performance
> to the tasks' than power saving, the +20% would be applied for the
> frequency. Therefore if that CPU with 499 util doesn't have uclamp_max,
> it would get a decent amount of idle time for its tasks (to compensate
> some workload variation).
CPU1 with 499 still gets its 25% margin or I missed something in your example ?
Vincent
>
> Regards,
> Lukasz
Powered by blists - more mailing lists