linux-kernel - Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAB8ipk86jmb6xnEUnv_Vs5=A5EnNfnHTy3FXYZRhKhuEFKhzDw@mail.gmail.com>
Date: Wed, 19 Jun 2024 11:05:08 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Qais Yousef <qyousef@...alina.io>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, Xuewen Yan <xuewen.yan@...soc.com>, mingo@...hat.com, 
	peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com, 
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com, 
	vschneid@...hat.com, vincent.donnefort@....com, ke.wang@...soc.com, 
	linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On Tue, Jun 18, 2024 at 11:39 PM Qais Yousef <qyousef@...alina.io> wrote:
>
> On 06/18/24 17:23, Vincent Guittot wrote:
> > On Mon, 17 Jun 2024 at 12:53, Qais Yousef <qyousef@...alina.io> wrote:
> > >
> > > On 06/17/24 11:07, Vincent Guittot wrote:
> > >
> > > > > And should effective_cpu_util() return a value higher than
> > > > > get_actual_cpu_capacity()?
> > > >
> > > > I don't think we should because we want to return the effective
> > > > utilization not the actual compute capacity.
> > > > Having an utilization of the cpu or group of cpus above the actual
> > > > capacity or the original capacity mainly means that we will have to
> > > > run longer
> > > >
> > > > By capping the utilization we filter this information.
> > > >
> > > > capacity orig = 800
> > > > util_avg = 700
> > > >
> > > > if we cap the capacity to 400 the cpu is expected to run twice longer
> > > > for the same amount of work to be done
> > >
> > > Okay makes sense. Wouldn't the util be 'wrong' (to what degree will depend on
> > > min/max freq ratio) though?
> > >
> > > We cap with arch_scale_capacity() still, I guess we know at this stage it is
> > > 100% wrong if we allow returning higher values?
> >
> > I think that capping utilization to max capacity generates some energy
> > estimation error because it filters the fact that we run longer in
> > some cases.
>
> Yes, I think so too and that was my first statement. But I think this is
> a bigger change to do separately.

I saw the the sched_cpu_util() was used by teo.c and cpufreq_cooling.c
If we change the arch_scale_capacity() to actual_cpu_capacity(), it may cause
some errors?

For-example:
In teo:
212 static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
213 {
214         return sched_cpu_util(cpu) > cpu_data->util_threshold;
215 }
It may cause the teo_cpu_is_utilized() to return false forever if the
actual_cpu_capacity is smaller than util_threshold.
However, the util_threshold is frome actual_cpu_capacity.

In cpufreq_cooling.c:
May we should change:

diff --git a/drivers/thermal/cpufreq_cooling.c
b/drivers/thermal/cpufreq_cooling.c
index 280071be30b1..a8546d69cc10 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -164,7 +164,7 @@ static u32 get_load(struct cpufreq_cooling_device
*cpufreq_cdev, int cpu,
 {
        unsigned long util = sched_cpu_util(cpu);

-       return (util * 100) / arch_scale_cpu_capacity(cpu);
+       return (util * 100) / get_actual_cpu_capacity(cpu);
 }
 #else /* !CONFIG_SMP */
 static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,


Because if still use arch_scale_cpu_capacity(), the load pct may be decreased,
It may affect the thermal-IPA-governor's power consideration.

>
> I *think* we have another source of error, we take util/cpu_cap as a percentage
> of time the CPU is busy. We assume an implicit multiplication with a time
> period, T. I am not sure if this implicit assumption is accurate and things are
> aligned properly. Especially with how utilization loses the temporal info due
> to invariance. util can be low but actual runtime will be much longer. I'm not
> sure if this implicit multiplication is handling this properly. Beside due
> performance domains having shared CPUs, I am not sure this period is aligned
> across all CPUs for this implicit multiplication to work as intended.
>
> I yet to study this properly. But I thought I'll mention it as I think this
> (energy estimation) is increasingly becoming an important area to improve on.