lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Fri, 28 Jun 2024 02:39:57 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Xuewen Yan <xuewen.yan94@...il.com>, Xuewen Yan <xuewen.yan@...soc.com>,
	dietmar.eggemann@....com, mingo@...hat.com, peterz@...radead.org,
	juri.lelli@...hat.com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	christian.loehle@....com, vincent.donnefort@....com,
	ke.wang@...soc.com, di.shen@...soc.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH V2 1/2] sched/fair: Prevent cpu_busy_time from exceeding
 actual_cpu_capacity

On 06/27/24 18:15, Vincent Guittot wrote:
> On Thu, 27 Jun 2024 at 04:02, Xuewen Yan <xuewen.yan94@...il.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 9:05 PM Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Mon, 24 Jun 2024 at 10:22, Xuewen Yan <xuewen.yan@...soc.com> wrote:
> > > >
> > > > Commit 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > > > changed the PD's util from per-CPU to per-PD capping. But because
> > > > the effective_cpu_util() would return a util which maybe bigger
> > > > than the actual_cpu_capacity, this could cause the pd_busy_time
> > > > calculation errors.
> > >
> > > I'm still not convinced that this is an error. Your example used for v1 is :
> > >
> > > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > > of cpufreq-limit, the cpu_actual_cap = 512.
> > >
> > > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 256;
> > > effective_cpu_util(7) = 0;
> > >
> > > so env->pd_busy_time = 2304
> > >
> > > Even if effective_cpu_util(4) = 1024; is above the current max compute
> > > capacity of 512, this also means that activity of cpu4 will run twice
> > > longer . If you cap effective_cpu_util(4) to 512 you miss the
> > > information that it will run twice longer at the selected OPP. The
> > > extreme case being:
> > > effective_cpu_util(4) = 1024;
> > > effective_cpu_util(5) = 1024;
> > > effective_cpu_util(6) = 1024;
> > > effective_cpu_util(7) = 1024;
> > >
> > > in this case env->pd_busy_time = 4096
> > >
> > > If we cap, we can't make any difference between the 2 cases
> > >
> > > Do you have more details about the problem you are facing ?
> >
> > Because of the cpufreq-limit, the opp was also limited, and when compute_energy:
> >
> > energy =  ps->cost * sum_util =  ps->cost * eenv->pd_busy_time;
> >
> > Because of the cpufreq-limit, the ps->cost is the limited-freq's opp's
> > cost instead of the max freq's cost.
> > So the energy is determined by pd_busy_time.
> >
> > Still the example above:
> >
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit, the cpu_actual_cap = 512.
> >
> > Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
> > effective_cpu_util(4) = 1024;
> > effective_cpu_util(5) = 1024;
> > effective_cpu_util(6) = 256;
> > effective_cpu_util(7) = 0;
> >
> > Before the patch:
> > env->pd_busy_time = min(1024+1024+256, eenv->pd_cap) = 2048.
> > However, because the effective_cpu_util(7) = 0, indeed, the 2048 is bigger than
> > the actual_cpu_cap.
> >
> > After the patch:
> > cpu_util(4) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(5) = min(1024, eenv->cpu_cap) = 512;
> > cpu_util(6) = min(256, eenv->cpu_cap) = 256;
> > cpu_util(7) = 0;
> > env->pd_busy_time = min(512+512+256, eenv->pd_cap) = 1280.
> >
> > As a result, without this patch, the energy is bigger than actual_energy.
> >
> > And even if cpu4 would run twice longer, the energy may not be equal.
> > Because:
> >  *             ps->power * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *               ps->freq * scale_cpu
> >
> > the ps->power = cfv2, and then:
> >
> > *                  cv2 * cpu_max_freq
> > *   cpu_nrg = ------------------------ * cpu_util           (3)
> > *                    scale_cpu
> >
> > because the limited-freq's voltage is not equal to the max-freq's voltage.
> 
> I'm still struggling to understand why it's wrong. If the frequency is
> capped, we will never go above this limited frequency and its
> associated voltage so there is no reason to consider max-freq's
> voltage. If there is more things to do than the actual capacity can do
> per unit of time then we will run more than 1 unit of time.
> 
> nrg of PD = /Sum(cpu) ps->power * cpu-running-time
> 
> ps->power is fixed because of the limited frequency constraint
> 
> we estimate cpu-running-time = utilization / ps->performance
> with
> - utilization = util_avg
> - performance = ps->freq / cpu_max_freq * arch_scale_cpu_capacity() =
> ps->performance
> 
> Up to now we were assuming that utilization was always lower than
> performance otherwise the system was overutilized andwe fallback in
> performance mode. But when the frequency of a cpu is limited by
> userspace or thermal mitigation, the utilization can become higher
> than the limited capacity which can be translated by cpu will run
> longer.

I might have contributed a bit to this confusion. So my apologies.

We certainly want to remove this limit and I thought to be consistent with
current code behavior it might be good to have this patch. But as Vincent
pointed out it actually moves us backward.

So I think this is no longer needed too. We want to actually make the pd
capping removed too - but I haven't looked at the detail if this can be done
without side effects. We need to be more accurate in estimating the runtime and
capping in general hides information about how long the cpu/pd will be busy
for.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ