Date: Mon, 15 Jan 2024 11:55:04 +0530
From: Wyes Karny <wkarny@...il.com>
To: Qais Yousef <qyousef@...alina.io>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Ingo Molnar <mingo@...nel.org>, 
	linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Juri Lelli <juri.lelli@...hat.com>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Daniel Bristot de Oliveira <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8

Hi Qais,

On Mon, Jan 15, 2024 at 5:07 AM Qais Yousef <qyousef@...alina.io> wrote:
>
> On 01/14/24 19:58, Qais Yousef wrote:
>
> > > This is not correct because you will have to wait to reach full
> > > utilization at the current OPP, possibly the lowest OPP, before moving
> > > directly to max OPP
> >
> > Isn't this already the case? The ratio ((util + headroom) / max) will be less
> > than 1 until util is 80% (with 25% headroom). And for all values <= 80% * max,
> > we will request a frequency smaller than or equal to policy->cur, no?
> >
> > ie:
> >
> >       util = 600
> >       max = 1024
> >
> >       freq = 1.25 * 600 * policy->cur / 1024 = 0.73 * policy->cur
> >
> > ((util + headroom) / max) must be greater than 1 for us to start going above
> > policy->cur - which seems to have been working by accident IIUC.
> >
> > So yes my proposal is incorrect, but it seems the conversion is not right to me
> > now.
> >
> > I could reproduce the problem now (thanks Wyes!). I have 3 freqs on my system
> >
> > 2.2GHz, 2.8GHz and 3.8GHz
> >
> > which (I believe) translates into capacities
> >
> > ~592, ~754, 1024
> >
> > which means we should pick 2.8GHz as soon as util * 1.25 > 592; which
> > translates into util = ~473.
> >
> > But what I see is that we go to 2.8GHz when we jump from 650 to 680 (see
> > attached picture), which is what you'd expect since we apply two headrooms now,
> > which means the ratio ((util + headroom) / max) will be greater than 1 once we
> > go above this value
> >
> >       1024 * 0.8 * 0.8 = ~655
> >
> > So I think the math makes sense logically, but we're missing some other
> > correction factor.
> >
> > When I re-enable CPPC I see for the same test that we go into 3.8GHz straight
> > away. My test is a simple busy loop via
> >
> >       cat /dev/zero > /dev/null
> >
> > I see the CPU util_avg is at 523 at fork. I expected us to run to 2.8GHz here
> > to be honest, but I am not sure if util_cfs_boost() and util_est() are maybe
> > causing us to be slightly above 523 and that's why we start with max freq.
> >
> > Or I've done the math wrong :-) But the two don't behave the same for the same
> > kernel with and without CPPC.
>
> I think the relationship should be:
>
>         freq = util * f_curr / cap_curr

I guess that to compute cap_curr correctly we need to know max_freq,
which is not available when CPPC is disabled.
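
To make the dependency on max_freq concrete, here is a minimal sketch of the arithmetic discussed above. It is not kernel code: freq_to_cap() and util_to_freq() are hypothetical helper names, frequencies are assumed to be in kHz, and capacity is assumed to scale linearly with frequency up to 1024 at the maximum frequency.

```c
#include <assert.h>

/* Full-scale capacity, matching the 1024 used in the discussion. */
#define CAPACITY_SCALE 1024ULL

/*
 * Hypothetical helper: capacity of a CPU running at freq_khz, assuming
 * capacity scales linearly with frequency and max_freq_khz maps to 1024.
 */
static unsigned long long freq_to_cap(unsigned long long freq_khz,
				      unsigned long long max_freq_khz)
{
	assert(max_freq_khz > 0 && freq_khz <= max_freq_khz);
	return freq_khz * CAPACITY_SCALE / max_freq_khz;
}

/*
 * Hypothetical helper for the proposed relationship:
 *     freq = util * f_curr / cap_curr
 * cap_curr cannot be derived without knowing max_freq_khz.
 */
static unsigned long long util_to_freq(unsigned long long util,
				       unsigned long long cur_freq_khz,
				       unsigned long long max_freq_khz)
{
	unsigned long long cap_curr = freq_to_cap(cur_freq_khz, max_freq_khz);

	assert(cap_curr > 0);
	return util * cur_freq_khz / cap_curr;
}
```

With the three frequencies quoted above, freq_to_cap(2200000, 3800000) gives 592 and freq_to_cap(2800000, 3800000) gives 754, matching the capacities in the quoted message. Both helpers take max_freq_khz as an input, which is exactly the value in question here.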

>
> (patch below)
>
> with that I see (almost) the expected behavior (picture attached). We go to
> 2.8GHz when we are above 500. But the move to 3.8GHz is a bit earlier at 581
> (instead of 754 * 0.8 = 603). Not sure why. With 25% headroom 581 is 726. So
> it's a tad too early.
>
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..155f96a44fa0 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -123,7 +123,8 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy)
>   * Return: the reference CPU frequency to compute a capacity.
>   */
>  static __always_inline
> -unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> +unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy,
> +                                   unsigned long *max)
>  {
>         unsigned int freq = arch_scale_freq_ref(policy->cpu);
>
> @@ -133,6 +134,9 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
>         if (arch_scale_freq_invariant())
>                 return policy->cpuinfo.max_freq;
>
> +       if (max)
> +               *max = policy->cur * (*max) / policy->cpuinfo.max_freq;

But when freq_invariant is disabled we don't have policy->cpuinfo.max_freq.
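
A minimal sketch of why this matters, modelling only the non-invariant path of the quoted patch. It is not kernel code: map_util_freq_sketch() and next_freq_sketch() are hypothetical names, and the mapping is assumed to be a plain freq * util / cap with no extra headroom applied.

```c
#include <assert.h>

/* Simplified stand-in for the util-to-frequency mapping: freq * util / cap. */
static unsigned long long map_util_freq_sketch(unsigned long long util,
						unsigned long long freq,
						unsigned long long cap)
{
	assert(cap > 0);
	return freq * util / cap;
}

/*
 * Models the quoted hunk: rescale max by cur / cpuinfo_max_freq, then map
 * util against policy->cur. The result is only as good as the
 * cpuinfo_max_freq value passed in.
 */
static unsigned long long next_freq_sketch(unsigned long long util,
					   unsigned long long max_cap,
					   unsigned long long cur_freq,
					   unsigned long long cpuinfo_max_freq)
{
	unsigned long long scaled_max;

	assert(cpuinfo_max_freq > 0);
	scaled_max = cur_freq * max_cap / cpuinfo_max_freq;
	return map_util_freq_sketch(util, cur_freq, scaled_max);
}
```

In this model, next_freq_sketch(592, 1024, 2200000, 3800000) returns 2200000, as expected for a CPU that is exactly full at 2.2GHz. Passing a too-small maximum such as 2800000 instead yields a request below 2200000 for the same utilization, so an unknown or inaccurate max_freq skews the result.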

Thanks,
Wyes
> +
>         return policy->cur;
>  }
>
> @@ -164,7 +168,7 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         struct cpufreq_policy *policy = sg_policy->policy;
>         unsigned int freq;
>
> -       freq = get_capacity_ref_freq(policy);
> +       freq = get_capacity_ref_freq(policy, &max);
>         freq = map_util_freq(util, freq, max);
>
>         if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)



-- 
Thanks & Regards
Wyes
