linux-kernel - Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu

Message-ID: <20240619180102.ehh5ogh6n26vofun@airbuntu>
Date: Wed, 19 Jun 2024 19:01:02 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan94@...il.com>, Xuewen Yan <xuewen.yan@...soc.com>,
	mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	vincent.donnefort@....com, ke.wang@...soc.com,
	linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding
 actual_cpu_capacity

On 06/18/24 23:05, Vincent Guittot wrote:
> On Tue, 18 Jun 2024 at 17:39, Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 06/18/24 17:23, Vincent Guittot wrote:
> > > On Mon, 17 Jun 2024 at 12:53, Qais Yousef <qyousef@...alina.io> wrote:
> > > >
> > > > On 06/17/24 11:07, Vincent Guittot wrote:
> > > >
> > > > > > And should effective_cpu_util() return a value higher than
> > > > > > get_actual_cpu_capacity()?
> > > > >
> > > > > I don't think we should because we want to return the effective
> > > > > utilization not the actual compute capacity.
> > > > > Having an utilization of the cpu or group of cpus above the actual
> > > > > capacity or the original capacity mainly means that we will have to
> > > > > run longer
> > > > >
> > > > > By capping the utilization we filter this information.
> > > > >
> > > > > capacity orig = 800
> > > > > util_avg = 700
> > > > >
> > > > > if we cap the capacity to 400 the cpu is expected to run twice longer
> > > > > for the same amount of work to be done
> > > >
> > > > Okay makes sense. Wouldn't the util be 'wrong' (to what degree will depend on
> > > > min/max freq ratio) though?
> > > >
> > > > We cap with arch_scale_capacity() still, I guess we know at this stage it is
> > > > 100% wrong if we allow returning higher values?
> > >
> > > I think that capping utilization to max capacity generates some energy
> > > estimation error because it filters the fact that we run longer in
> > > some cases.
> >
> > Yes, I think so too and that was my first statement. But I think this is
> > a bigger change to do separately.
> >
> > I *think* we have another source of error, we take util/cpu_cap as a percentage
> > of time the CPU is busy. We assume an implicit multiplication with a time
> > period, T. I am not sure if this implicit assumption is accurate and things are
> > aligned properly. Especially with how utilization loses the temporal info due
> > to invariance. util can be low but actual runtime will be much longer. I'm not
> 
> I'm not sure to get what you mean by " how utilization loses the
> temporal info due to invariance"

The utilization value itself doesn't tell us the length of the task's runtime,
only the compute capacity it needs.

> 
> Utilization aims to estimate the number of instructions to execute
> whatever the CPU of the system, which once divided by the compute

Yes for the number of instructions.

And yes, the *ratio* can potentially be a proxy for *percentage* of time we are
running. But we have no idea about absolute runtime.

AFAIU, there's an assumption that this percentage of running time is multiplied
by an 'unidentified' period value to get a proxy for the time the perf domain
will run for. This time is then multiplied by power to get the energy.
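
To make the chain above concrete, here is a minimal sketch of that reasoning,
not the kernel's implementation. The function name, the parameters and the
units are hypothetical and only illustrate the implicit multiplication by
a period T:

```python
def estimate_energy_uj(util, cpu_cap, power_mw, period_ms):
    """Rough energy estimate following the reasoning in this mail.

    All names and units here are assumptions made for illustration:
    util and cpu_cap share the same capacity scale, power_mw is the power
    drawn at the selected OPP, and period_ms is the implicit period T.
    """
    busy_ratio = util / cpu_cap            # proxy for the percentage of busy time
    busy_time_ms = busy_ratio * period_ms  # implicit multiplication by T
    return busy_time_ms * power_mw         # mW * ms = microjoules


# Same utilization, different assumed period: the estimate scales with T,
# which is why the choice (and alignment) of T matters.
short_period = estimate_energy_uj(util=400, cpu_cap=800, power_mw=100, period_ms=10)
long_period = estimate_energy_uj(util=400, cpu_cap=800, power_mw=100, period_ms=20)
```

With these made-up numbers the busy ratio is 0.5 in both calls, so doubling
the assumed period doubles the estimated energy.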

I am just not sure if we're losing information with all of these
transformations. I need to investigate more.

And we assume a periodic time interval over which the CPU will be busy for
this percentage of the time.

I am not sure if at every wake up this period needs to be aligned.

I think this will matter the most for calculating the base_energy.

I am not sure if this makes sense :).

I need to study the details more anyway and collect some data. But my worry is
generally whether our approximation of runtime is good enough and how to
improve it.

> capacity of the OPP of a CPU will estimate how long it will take to do
> the job. So if the capa of an OPP of a CPU is low, it will reflect
> that the actual runtime will be much longer.  A low utilization means
> that you don't have much instruction to execute but not the speed at
> which you will execute them.

Yes. But I am worried about whether the actual absolute time is being
approximated well enough.

> 
> Then, problems start when we cap utilization to the CPU capacity as an
> example because we cap this temporal info.

Yes. We agree on the existence of this problem.
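
As an illustration of what capping filters out, here is a small sketch using
the numbers from earlier in the thread (capacity orig = 800, util_avg = 700,
capacity capped to 400). The helper name and its cap_util flag are
hypothetical and do not correspond to a kernel function:

```python
def busy_periods(util, opp_cap, cap_util=False):
    """Number of periods needed to run `util` worth of work at `opp_cap`.

    Hypothetical helper for illustration only. With cap_util=True the
    utilization is clamped to the capacity first, as discussed in the thread.
    """
    if cap_util:
        util = min(util, opp_cap)
    return util / opp_cap


at_orig = busy_periods(700, 800)                  # 0.875 of a period
at_capped_cpu = busy_periods(700, 400)            # 1.75 periods, twice as long
clamped = busy_periods(700, 400, cap_util=True)   # 1.0, the extra runtime is lost
```

Without clamping, the ratio shows the work taking twice as long at capacity
400 as at 800. With clamping it saturates at one full period, and the
information that we run longer is gone.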

> 
> > sure if this implicit multiplication is handling this properly. Beside due
> > performance domains having shared CPUs, I am not sure this period is aligned
> > across all CPUs for this implicit multiplication to work as intended.
> 
> It's all about average because it's too expensive if not even possible
> to know when the instruction will be executed on the other CPUs. We
> can only take the edge case (currently the worst case)

Yes..

> 
> Beside the impact of uclamp making the selected OPP not always
> sustainable but sometimes temporary
> 
> >
> > I yet to study this properly. But I thought I'll mention it as I think this
> > (energy estimation) is increasingly becoming an important area to improve on.