[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50c3fc7c-7f0a-4acb-b6da-b39dc123d9b9@arm.com>
Date: Wed, 18 Oct 2023 13:20:15 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Beata Michalska <beata.michalska@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
rafael@...nel.org, viresh.kumar@...aro.org, qyousef@...alina.io,
linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] sched/schedutil: rework performance estimation
On 10/18/23 08:04, Beata Michalska wrote:
> Hi Vincent,
>
> On Fri, Oct 13, 2023 at 05:14:49PM +0200, Vincent Guittot wrote:
>> The current method to take into account uclamp hints when estimating the
>> target frequency can end into situation where the selected target
>> frequency is finally higher than uclamp hints whereas there are no real
>> needs. Such cases mainly happen because we are currently mixing the
>> traditional scheduler utilization signal with the uclamp performance
>> hints. By adding these 2 metrics, we loose an important information when
>> it comes to select the target frequency and we have to make some
>> assumptions which can't fit all cases.
[snip]
>>
>> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
>> index b9caa01dfac4..adec808b371a 100644
>> --- a/include/linux/energy_model.h
>> +++ b/include/linux/energy_model.h
>> @@ -243,7 +243,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>> scale_cpu = arch_scale_cpu_capacity(cpu);
>> ps = &pd->table[pd->nr_perf_states - 1];
>>
>> - max_util = map_util_perf(max_util);
> Even though the effective_cpu_util does no longer include the headroom, it is
> being applied by sugov further down the line (sugov_effective_cpu_perf).
> Won't that bring back the original problem when freq selection within EM is
> not align with the one performed by sugov ?
It should be OK here to remove the above line. The map_util_perf()
is done before this em_cpu_energy() call, in the new code in
[snip]
>> }
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 922905194c0c..d4f7b2f49c44 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7628,7 +7628,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
>> for_each_cpu(cpu, pd_cpus) {
>> unsigned long util = cpu_util(cpu, p, -1, 0);
>>
>> - busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
>> + busy_time += effective_cpu_util(cpu, util, NULL, NULL);
>> }
>>
>> eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
>> @@ -7651,7 +7651,7 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
>> for_each_cpu(cpu, pd_cpus) {
>> struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
>> unsigned long util = cpu_util(cpu, p, dst_cpu, 1);
>> - unsigned long eff_util;
>> + unsigned long eff_util, min, max;
>>
>> /*
>> * Performance domain frequency: utilization clamping
>> @@ -7660,7 +7660,23 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
>> * NOTE: in case RT tasks are running, by default the
>> * FREQUENCY_UTIL's utilization can be max OPP.
>> */
>> - eff_util = effective_cpu_util(cpu, util, FREQUENCY_UTIL, tsk);
>> + eff_util = effective_cpu_util(cpu, util, &min, &max);
>> +
>> + /* Task's uclamp can modify min and max value */
>> + if (tsk && uclamp_is_used()) {
>> + min = max(min, uclamp_eff_value(p, UCLAMP_MIN));
>> +
>> + /*
>> + * If there is no active max uclamp constraint,
>> + * directly use task's one otherwise keep max
>> + */
>> + if (uclamp_rq_is_idle(cpu_rq(cpu)))
>> + max = uclamp_eff_value(p, UCLAMP_MAX);
>> + else
>> + max = max(max, uclamp_eff_value(p, UCLAMP_MAX));
>> + }
>> +
>> + eff_util = sugov_effective_cpu_perf(cpu, eff_util, min, max);
> This will include the headroom so won't it inflate the util here ?
Yes, that's the goal. It will inflate when needed. Currently, the
problem is that we always inflate (blindly) in the em_cpu_energy().
We don't know if the util value which is comming is from uclamp_max
and the frequency should not be higher, because something want to
clamp it.
The other question would be:
What if the PD has 4 CPUs, the max util found is 500 and is from
uclamp_max, but there is onother util on some other CPU 490?
That CPU is allowed and can have the +20% freq in voting.
In current design we don't punish the whole domain in such
scenario.
Powered by blists - more mailing lists