linux-kernel - Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240618153931.ub5ezml3imd5mwu7@airbuntu>
Date: Tue, 18 Jun 2024 16:39:31 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan94@...il.com>, Xuewen Yan <xuewen.yan@...soc.com>,
	mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	vincent.donnefort@....com, ke.wang@...soc.com,
	linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding
 actual_cpu_capacity

On 06/18/24 17:23, Vincent Guittot wrote:
> On Mon, 17 Jun 2024 at 12:53, Qais Yousef <qyousef@...alina.io> wrote:
> >
> > On 06/17/24 11:07, Vincent Guittot wrote:
> >
> > > > And should effective_cpu_util() return a value higher than
> > > > get_actual_cpu_capacity()?
> > >
> > > I don't think we should because we want to return the effective
> > > utilization not the actual compute capacity.
> > > Having an utilization of the cpu or group of cpus above the actual
> > > capacity or the original capacity mainly means that we will have to
> > > run longer
> > >
> > > By capping the utilization we filter this information.
> > >
> > > capacity orig = 800
> > > util_avg = 700
> > >
> > > if we cap the capacity to 400 the cpu is expected to run twice longer
> > > for the same amount of work to be done
> >
> > Okay makes sense. Wouldn't the util be 'wrong' (to what degree will depend on
> > min/max freq ratio) though?
> >
> > We cap with arch_scale_capacity() still, I guess we know at this stage it is
> > 100% wrong if we allow returning higher values?
> 
> I think that capping utilization to max capacity generates some energy
> estimation error because it filters the fact that we run longer in
> some cases.

Yes, I think so too and that was my first statement. But I think this is
a bigger change to do separately.

I *think* we have another source of error, we take util/cpu_cap as a percentage
of time the CPU is busy. We assume an implicit multiplication with a time
period, T. I am not sure if this implicit assumption is accurate and things are
aligned properly. Especially with how utilization loses the temporal info due
to invariance. util can be low but actual runtime will be much longer. I'm not
sure if this implicit multiplication is handling this properly. Beside due
performance domains having shared CPUs, I am not sure this period is aligned
across all CPUs for this implicit multiplication to work as intended.

I yet to study this properly. But I thought I'll mention it as I think this
(energy estimation) is increasingly becoming an important area to improve on.