[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBikWsyPon6HweEZg5qjSP+QX=WZDQu4NHs7PUcSCqDDA@mail.gmail.com>
Date: Mon, 17 Jun 2024 11:07:03 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: Xuewen Yan <xuewen.yan94@...il.com>, Xuewen Yan <xuewen.yan@...soc.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
vschneid@...hat.com, vincent.donnefort@....com, ke.wang@...soc.com,
linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity
On Mon, 17 Jun 2024 at 00:20, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 06/12/24 16:11, Xuewen Yan wrote:
> > Hi Qais
> >
> > On Mon, Jun 10, 2024 at 6:55 AM Qais Yousef <qyousef@...alina.io> wrote:
> > >
> > > On 06/06/24 15:06, Xuewen Yan wrote:
> > > > Because the effective_cpu_util() would return a util which
> > > > maybe bigger than the actual_cpu_capacity, this could cause
> > > > the pd_busy_time calculation errors.
> > > > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > > > the actual_cpu_capacity.
> > >
> > > I actually think capping by pd_cap is something we should remove. Saturated
> > > systems aren't calculated properly especially when uclamp_max is used.
> > >
> > > But this might a bigger change and out of scope of what you're proposing..
> >
> > I agree, there are other things to consider before doing this.
> >
> > >
> > > Did this 'wrong' calculation cause an actual problem for task placement?
> > > I assume the pd looked 'busier' because some CPUs were too busy.
> >
> > This will not only affect calculations in scenarios with high temperatures.
> > Sometimes, users will set scalimg_max_freq to actively limit the CPU frequency,
> > so that even if the CPU load is large, the CPU frequency will not be very high.
> > At this time, even if tasks are placed on other CPUs in the same PD,
> > the energy increment may not be too large, thus affecting core selection.
> >
> > >
> > > Was the system in overutilzied state? Usually if one CPU is that busy
> > > overutilized should be set and we'd skip EAS - unless uclamp_max was used.
> >
> > As Christian said, This case occurs not only in the overutil scenario,
> > this scenario holds true as long as the actual-cpu-capacity caused by
> > the reduction in max cpu frequency is smaller than the cpu util.
>
> Hmm. Sorry I might be thick here, but shouldn't fits_capacity() use
> capacity_of() which should return capacity based on get_actual_cpu_capacity()
> to compare if we are overutilized? If we are higher than this value that you
> need to cap, then the CPU must be overutilized and we shouldn't be in feec() in
> the first place, no? Unless the rq is capped with uclamp_max that is.
>
> I generally think our current definition of overutilized is wrong and the use
> case you're talking about should hold true if it's just a single CPU that is
> overutilized. But I can't see how you end up in feec() if the util is higher
> than the CPU in our current code base.
>
> What did I miss?
>
> And should effective_cpu_util() return a value higher than
> get_actual_cpu_capacity()?
I don't think we should because we want to return the effective
utilization not the actual compute capacity.
Having an utilization of the cpu or group of cpus above the actual
capacity or the original capacity mainly means that we will have to
run longer
By capping the utilization we filter this information.
capacity orig = 800
util_avg = 700
if we cap the capacity to 400 the cpu is expected to run twice longer
for the same amount of work to be done
>
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 8b1194c39161..91acc0f92ae4 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -286,7 +286,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> unsigned long util, irq, scale;
> struct rq *rq = cpu_rq(cpu);
>
> - scale = arch_scale_cpu_capacity(cpu);
> + scale = get_actual_cpu_capacity(cpu);
>
> /*
> * Early check to see if IRQ/steal time saturates the CPU, can be
Powered by blists - more mailing lists