Message-ID: <CAB8ipk-o-S4epugFa-ViYOW9cpu5Do+M0KRBzEJiDq4k3c=KWg@mail.gmail.com>
Date: Mon, 1 Jul 2024 20:00:28 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, dietmar.eggemann@....com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, qyousef@...alina.io,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
vschneid@...hat.com, christian.loehle@....com, vincent.donnefort@....com,
ke.wang@...soc.com, di.shen@...soc.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
On Fri, Jun 28, 2024 at 12:15 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@...il.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@...soc.com> wrote:
> > > >
> > > > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > changed the PD's util from per-CPU to per-PD capping. But because
> > > > effective_cpu_util() can return a util which may be bigger
> > > > than the actual_cpu_capacity, this could cause errors in the
> > > > pd_busy_time calculation.
> > >
> > > I'm still not convinced that this is an error. The example you used for v1 is:
> > >
> > > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > > of cpufreq-limit, the cpu_actual_cap = 512.
> > >
> > > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 256;
> > > effective_cpu_util(7) = 0;
> > >
> > > so env->pd_busy_time = 2304
> > >
> > > Even if effective_cpu_util(4) = 1024 is above the current max compute
> > > capacity of 512, this also means that the activity of cpu4 will run twice
> > > as long. If you cap effective_cpu_util(4) to 512, you miss the
> > > information that it will run twice as long at the selected OPP. The
> > > extreme case being:
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 1024;
> > > effective_cpu_util(7) = 1024;
> > >
> > > in this case env->pd_busy_time = 4096
> > >
> > > If we cap, we can't tell the 2 cases apart
> > >
> > > Do you have more details about the problem you are facing ?
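
The two cases above can be checked with a small standalone C sketch. It is only an illustration of the arithmetic, not kernel code: sum_util() and pd_capped() are hypothetical helper names, and the utilization values are the ones from the example. pd_capped() mirrors the min(eenv->pd_cap, busy_time) step that the diff in this thread removes.

```c
#include <stddef.h>

/* Hypothetical helper: plain sum of the per-CPU utilization, no capping. */
unsigned long sum_util(const unsigned long *util, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += util[i];
	return sum;
}

/* Hypothetical helper: per-PD capping, i.e. min(pd_cap, sum of util). */
unsigned long pd_capped(const unsigned long *util, size_t n,
			unsigned long pd_cap)
{
	unsigned long sum = sum_util(util, n);

	return sum < pd_cap ? sum : pd_cap;
}
```

With util = {1024, 1024, 256, 0} the uncapped sum is 2304, and with util = {1024, 1024, 1024, 1024} it is 4096. Once capped at pd_cap = 2048, both cases give 2048, so the capped value alone no longer separates them.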
> >
> > Because of the cpufreq-limit, the OPP is also limited, and in compute_energy():
> >
> > energy = ps->cost * sum_util = ps->cost * eenv->pd_busy_time;
> >
> > Because of the cpufreq-limit, ps->cost is the cost of the OPP at the
> > limited frequency instead of the cost at the max frequency.
> > So the energy is determined by pd_busy_time.
> >
> > Still the example above:
> >
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit, the cpu_actual_cap = 512.
> >
> > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 256;
> > effective_cpu_util(7) = 0;
> >
> > Before the patch:
> > env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
> > However, because effective_cpu_util(7) = 0, the 2048 is in fact bigger than
> > what the actual_cpu_cap allows.
> >
> > After the patch:
> > cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(6) = min(256, eenv->cpu_cap) = 256;
> > cpu_util(7) = 0;
> > env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.
> >
> > As a result, without this patch, the energy is bigger than actual_energy.
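
The before/after numbers can be reproduced with the following C sketch. It is a simplified userspace model of the two variants of eenv_pd_busy_time(), under the assumption that effective_cpu_util() has already returned the values from the example. The function names busy_time_pd_cap() and busy_time_cpu_cap() are made up for illustration.

```c
#include <stddef.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Before the patch: sum everything, then cap once at the PD capacity. */
unsigned long busy_time_pd_cap(const unsigned long *util, size_t n,
			       unsigned long pd_cap)
{
	unsigned long busy_time = 0;

	for (size_t i = 0; i < n; i++)
		busy_time += util[i];
	return min_ul(pd_cap, busy_time);
}

/* After the patch (V2): cap each CPU at cpu_cap, no PD-level cap. */
unsigned long busy_time_cpu_cap(const unsigned long *util, size_t n,
				unsigned long cpu_cap)
{
	unsigned long busy_time = 0;

	for (size_t i = 0; i < n; i++)
		busy_time += min_ul(cpu_cap, util[i]);
	return busy_time;
}
```

For util = {1024, 1024, 256, 0}, busy_time_pd_cap() with pd_cap = 2048 returns 2048, while busy_time_cpu_cap() with cpu_cap = 512 returns 512 + 512 + 256 + 0 = 1280.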
> >
> > And even if cpu4 ran twice as long, the energy may not be equal.
> > Because:
> >  *             ps->power * cpu_max_freq
> >  *   cpu_nrg = ------------------------ * cpu_util           (3)
> >  *               ps->freq * scale_cpu
> >
> > with ps->power = c * f * v^2 (f being ps->freq), this becomes:
> >
> >  *             c * v^2 * cpu_max_freq
> >  *   cpu_nrg = ---------------------- * cpu_util             (3)
> >  *                   scale_cpu
> >
> > because the limited-freq's voltage is not equal to the max-freq's voltage.
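
A small numeric sketch of formula (3) with ps->power = c * f * v^2 may help here. Everything numeric in it is an assumption chosen for illustration (c = 1, frequencies of 1000 and 2000, voltages of 0.8 and 1.0). None of these values come from a real energy model, and cpu_nrg() is a hypothetical userspace function, not the kernel's implementation.

```c
/*
 * Formula (3) with ps->power expanded as c * f * v^2:
 *
 *   cpu_nrg = ps->power * cpu_max_freq / (ps->freq * scale_cpu) * cpu_util
 *
 * All parameters are plain doubles; units are arbitrary.
 */
double cpu_nrg(double c, double freq, double volt, double cpu_max_freq,
	       double scale_cpu, double cpu_util)
{
	double power = c * freq * volt * volt;	/* assumed ps->power model */

	return power * cpu_max_freq / (freq * scale_cpu) * cpu_util;
}
```

With these made-up inputs, cpu_nrg(1, 2000, 1.0, 2000, 1024, 1024) is 2000, while cpu_nrg(1, 1000, 0.8, 2000, 1024, 1024) is about 1280. For the same cpu_util the frequency cancels out and only the v^2 factor (0.64 here) remains, which is the voltage dependence referred to above.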
>
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage, so there is no reason to consider max-freq's
> voltage. If there are more things to do than the actual capacity can do
> per unit of time, then we will run for more than 1 unit of time.
>
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
>
> ps->power is fixed because of the limited frequency constraint
>
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
>
> Up to now we were assuming that utilization was always lower than
> performance; otherwise the system was overutilized and we fell back to
> performance mode.

Well, with patch 2/2, this patch is no longer needed.
But if we later want to remove feec()'s restriction on rd->overutilized,
this patch should be reconsidered.

> But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity, which translates into the cpu running
> longer.
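
The running-time model described above can be written out as a short C sketch. It is a userspace illustration only: the function names are hypothetical, ps_power is an arbitrary constant, and the frequencies 1000 and 2000 are assumed values chosen so that performance comes out at 512, matching the example earlier in the thread.

```c
#include <stddef.h>

/* performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() */
unsigned long ps_performance(unsigned long ps_freq, unsigned long cpu_max_freq,
			     unsigned long scale_cpu)
{
	return ps_freq * scale_cpu / cpu_max_freq;
}

/* cpu-running-time = utilization / performance, in "units of time". */
double cpu_running_time(unsigned long util, unsigned long performance)
{
	return (double)util / (double)performance;
}

/* nrg of PD = Sum over CPUs of ps->power * cpu-running-time */
double pd_energy(double ps_power, const unsigned long *util, size_t n,
		 unsigned long performance)
{
	double nrg = 0.0;

	for (size_t i = 0; i < n; i++)
		nrg += ps_power * cpu_running_time(util[i], performance);
	return nrg;
}
```

With ps_performance(1000, 2000, 1024) = 512, a CPU with util = 1024 gets a running time of 2.0 units, and pd_energy(1.0, {1024, 1024, 256, 0}, 4, 512) is 2 + 2 + 0.5 + 0 = 4.5. In this model, utilization above the limited capacity shows up as a running time greater than 1 instead of being discarded.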
>
> >
> > >
> > >
> > >
> > > > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > > > the actual_cpu_capacity.
> > > >
> > > > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
> > > > Tested-by: Christian Loehle <christian.loehle@....com>
> > > > ---
> > > > V2:
> > > > - change commit message.
> > > > - remove the eenv->pd_cap capping in eenv_pd_busy_time(). (Dietmar)
> > > > - add Tested-by.
> > > > ---
> > > > kernel/sched/fair.c | 9 +++++----
> > > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 8a5b1ae0aa55..5ca6396ef0b7 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -7864,16 +7864,17 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> > > >                                      struct cpumask *pd_cpus,
> > > >                                      struct task_struct *p)
> > > >  {
> > > > -       unsigned long busy_time = 0;
> > > >         int cpu;
> > > >
> > > > +       eenv->pd_busy_time = 0;
> > > > +
> > > >         for_each_cpu(cpu, pd_cpus) {
> > > >                 unsigned long util = cpu_util(cpu, p, -1, 0);
> > > >
> > > > -               busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > > > +               util = effective_cpu_util(cpu, util, NULL, NULL);
> > > > +               util = min(eenv->cpu_cap, util);
> > > > +               eenv->pd_busy_time += util;
> > > >         }
> > > > -
> > > > -       eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> > > >  }
> > > >
> > > > /*
> > > > --
> > > > 2.25.1
> > > >
> > > >