linux-kernel - Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAB8ipk-ejDKQTr8nAmK9MkhL2Ra=0J==p3Q+U-4K18G6MeJhQw@mail.gmail.com>
Date: Mon, 17 Jun 2024 15:27:57 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: Qais Yousef <qyousef@...alina.io>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, vincent.guittot@...aro.org, mingo@...hat.com, 
	peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com, 
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com, 
	vschneid@...hat.com, vincent.donnefort@....com, ke.wang@...soc.com, 
	linux-kernel@...r.kernel.org, christian.loehle@....com
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

Hi Qais,

On Mon, Jun 17, 2024 at 6:20 AM Qais Yousef <qyousef@...alina.io> wrote:
>
> On 06/12/24 16:11, Xuewen Yan wrote:
> > Hi Qais
> >
> > On Mon, Jun 10, 2024 at 6:55 AM Qais Yousef <qyousef@...alina.io> wrote:
> > >
> > > On 06/06/24 15:06, Xuewen Yan wrote:
> > > > Because the effective_cpu_util() would return a util which
> > > > maybe bigger than the actual_cpu_capacity, this could cause
> > > > the pd_busy_time calculation errors.
> > > > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > > > the actual_cpu_capacity.
> > >
> > > I actually think capping by pd_cap is something we should remove. Saturated
> > > systems aren't calculated properly especially when uclamp_max is used.
> > >
> > > But this might a bigger change and out of scope of what you're proposing..
> >
> > I agree, there are other things to consider before doing this.
> >
> > >
> > > Did this 'wrong' calculation cause an actual problem for task placement?
> > > I assume the pd looked 'busier' because some CPUs were too busy.
> >
> > This will not only affect calculations in scenarios with high temperatures.
> > Sometimes, users will set scalimg_max_freq to actively limit the CPU frequency,
> > so that even if the CPU load is large, the CPU frequency will not be very high.
> > At this time, even if tasks are placed on other CPUs in the same PD,
> > the energy increment may not be too large, thus affecting core selection.
> >
> > >
> > > Was the system in overutilzied state? Usually if one CPU is that busy
> > > overutilized should be set and we'd skip EAS - unless uclamp_max was used.
> >
> > As Christian said, This case occurs not only in the overutil scenario,
> > this scenario holds true as long as the actual-cpu-capacity caused by
> > the reduction in max cpu frequency is smaller than the cpu util.
>
> Hmm. Sorry I might be thick here, but shouldn't fits_capacity() use
> capacity_of() which should return capacity based on get_actual_cpu_capacity()
> to compare if we are overutilized? If we are higher than this value that you
> need to cap, then the CPU must be overutilized and we shouldn't be in feec() in
> the first place, no? Unless the rq is capped with uclamp_max that is.

Sorry, I miss the "fits_capacity() use capacity_of()", and without
uclamp_max, the rd is over-utilized,
and would not use feec().
But I notice the uclamp_max, if the rq's uclamp_max is smaller than
SCHED_CAPACITY_SCALE,
and is bigger than actual_cpu_capacity, the util_fits_cpu() would
return true, and the rd is not over-utilized.
Is this setting intentional?

>
> I generally think our current definition of overutilized is wrong and the use
> case you're talking about should hold true if it's just a single CPU that is
> overutilized. But I can't see how you end up in feec() if the util is higher
> than the CPU in our current code base.
>
> What did I miss?
>
> And should effective_cpu_util() return a value higher than
> get_actual_cpu_capacity()?

I also thought about changing this at first, but because this function
is called in many places,
I am not 100% sure that changing it will not have any unexpected consequences,
so I only changed feec():)


cheers
---
xuewen

>
> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
> index 8b1194c39161..91acc0f92ae4 100644
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -286,7 +286,7 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>         unsigned long util, irq, scale;
>         struct rq *rq = cpu_rq(cpu);
>
> -       scale = arch_scale_cpu_capacity(cpu);
> +       scale = get_actual_cpu_capacity(cpu);
>
>         /*
>          * Early check to see if IRQ/steal time saturates the CPU, can be