Message-ID: <20240620113756.hivzk7sj4uj4sm6j@airbuntu>
Date: Thu, 20 Jun 2024 12:37:56 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan94@...il.com>, Xuewen Yan <xuewen.yan@...soc.com>,
mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
vincent.donnefort@....com, ke.wang@...soc.com,
linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding
actual_cpu_capacity
On 06/20/24 09:45, Vincent Guittot wrote:
> On Wed, 19 Jun 2024 at 20:10, Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 06/19/24 11:05, Xuewen Yan wrote:
> > > On Tue, Jun 18, 2024 at 11:39 PM Qais Yousef <qyousef@...alina.io> wrote:
> > > >
> > > > On 06/18/24 17:23, Vincent Guittot wrote:
> > > > > On Mon, 17 Jun 2024 at 12:53, Qais Yousef <qyousef@...alina.io> wrote:
> > > > > >
> > > > > > On 06/17/24 11:07, Vincent Guittot wrote:
> > > > > >
> > > > > > > > And should effective_cpu_util() return a value higher than
> > > > > > > > get_actual_cpu_capacity()?
> > > > > > >
> > > > > > > I don't think we should because we want to return the effective
> > > > > > > utilization not the actual compute capacity.
> > > > > > > > Having a utilization of the cpu or group of cpus above the actual
> > > > > > > > capacity or the original capacity mainly means that we will have to
> > > > > > > > run longer.
> > > > > > >
> > > > > > > By capping the utilization we filter this information.
> > > > > > >
> > > > > > > capacity orig = 800
> > > > > > > util_avg = 700
> > > > > > >
> > > > > > > > If we cap the capacity to 400, the cpu is expected to run twice as
> > > > > > > > long for the same amount of work to be done.
> > > > > >
> > > > > > Okay makes sense. Wouldn't the util be 'wrong' (to what degree will depend on
> > > > > > min/max freq ratio) though?
> > > > > >
> > > > > > We cap with arch_scale_capacity() still, I guess we know at this stage it is
> > > > > > 100% wrong if we allow returning higher values?
> > > > >
> > > > > I think that capping utilization to max capacity generates some energy
> > > > > estimation error because it filters the fact that we run longer in
> > > > > some cases.
> > > >
> > > > Yes, I think so too and that was my first statement. But I think this is
> > > > a bigger change to do separately.
> > >
> > > I saw that sched_cpu_util() is used by teo.c and cpufreq_cooling.c.
> > > If we change arch_scale_capacity() to actual_cpu_capacity(), might it
> > > cause some errors?
> >
> > The plan is to revert this now.
> >
> > >
> > > For example:
> > > In teo:
> > > static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
> > > {
> > >         return sched_cpu_util(cpu) > cpu_data->util_threshold;
> > > }
> > > It may cause teo_cpu_is_utilized() to return false forever if the
> > > actual_cpu_capacity is smaller than util_threshold.
> > > However, the util_threshold is from actual_cpu_capacity.
> > >
> > > In cpufreq_cooling.c:
> > > Maybe we should change:
> > >
> > > diff --git a/drivers/thermal/cpufreq_cooling.c
> > > b/drivers/thermal/cpufreq_cooling.c
> > > index 280071be30b1..a8546d69cc10 100644
> > > --- a/drivers/thermal/cpufreq_cooling.c
> > > +++ b/drivers/thermal/cpufreq_cooling.c
> > > @@ -164,7 +164,7 @@ static u32 get_load(struct cpufreq_cooling_device
> > > *cpufreq_cdev, int cpu,
> > > {
> > > unsigned long util = sched_cpu_util(cpu);
> > >
> > > - return (util * 100) / arch_scale_cpu_capacity(cpu);
> > > + return (util * 100) / get_actual_cpu_capacity(cpu);
> > > }
> > > #else /* !CONFIG_SMP */
> > > static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
> > >
> > >
> > > Because if we still use arch_scale_cpu_capacity(), the load pct may be
> > > decreased, which may affect the thermal IPA governor's power consideration.
> >
> > I am not sure about this one. But looks plausible. Vincent?
>
> I don't see why we should change them. We don't want to change
> sched_cpu_util() either.
> AFAICT, the only outcome of this thread is that we should use
> get_actual_cpu_capacity() instead of arch_scale_cpu_capacity() in
> util_fits_cpu(). Capping the utilization only makes the estimation
> worse.
Yes, my bad. Only the util_fits_cpu() change is needed now.
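
For illustration only, the distinction discussed in this thread can be
sketched in standalone C. This is a simplified model, not kernel code:
fits_capacity() mirrors the 20% margin of the kernel macro of the same name,
while actual_capacity() and runtime_pct() are hypothetical helpers, and the
pressure value is an assumed stand-in for whatever get_actual_cpu_capacity()
subtracts from the original capacity.

```c
#include <assert.h>

/*
 * Fit check with the same 20% headroom as the kernel's fits_capacity()
 * macro: util fits if util * 1.25 < capacity. Returns 1 or 0.
 */
int fits_capacity(unsigned long util, unsigned long capacity)
{
    return util * 1280 < capacity * 1024;
}

/*
 * Hypothetical stand-in for get_actual_cpu_capacity(): the original
 * capacity minus the capacity lost to a frequency cap ("pressure").
 */
unsigned long actual_capacity(unsigned long capacity_orig,
                              unsigned long pressure)
{
    assert(pressure <= capacity_orig);
    return capacity_orig - pressure;
}

/*
 * Hypothetical helper: run time at the capped capacity, as a percentage
 * of the run time at the original capacity, for the same amount of work.
 */
unsigned long runtime_pct(unsigned long capacity_orig,
                          unsigned long capacity_capped)
{
    assert(capacity_capped > 0);
    return capacity_orig * 100 / capacity_capped;
}
```

With Vincent's numbers, runtime_pct(800, 400) gives 200, i.e. the work takes
twice as long, which is the information that would be filtered out if the
utilization itself were clamped to the capped capacity. An assumed
utilization of 500 fits the original capacity of 800 but not the actual
capacity of 400, which is why the fit check is the one place where the
actual capacity should be used.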