linux-kernel - Re: [GIT PULL] Scheduler changes for v6.8

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240114151250.5wfexq44o3mdm3nh@airbuntu>
Date: Sun, 14 Jan 2024 15:12:50 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Wyes Karny <wkarny@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [GIT PULL] Scheduler changes for v6.8

On 01/14/24 14:03, Vincent Guittot wrote:

> Thanks for the trace. It was really helpful and I think that I got the
> root cause.
> 
> The problem comes from get_capacity_ref_freq() which returns current
> freq when arch_scale_freq_invariant() is not enable, and the fact that
> we apply map_util_perf() earlier in the path now which is then capped
> by max capacity.
> 
> Could you try the below ?
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e420e2ee1a10..611c621543f4 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,7 @@ unsigned long get_capacity_ref_freq(struct
> cpufreq_policy *policy)
>         if (arch_scale_freq_invariant())
>                 return policy->cpuinfo.max_freq;
> 
> -       return policy->cur;
> +       return policy->cur + policy->cur >> 2;
>  }
> 
>  /**

Is this a test patch or a proper fix? I can't see it being the latter. It seems
the current logic fails when util is already 1024, and I think we're trying to
fix the invariance issue too late.

Is the problem that we can't read policy->cur in the scheduler to fix the util
while it's being updated that's why it's done here in this case?

If this is the problem, shouldn't the logic be if util is max then always go to
max frequency? I don't think we have enough info to correct the invariance here
IIUC. All we can see the system is saturated at this frequency and whether
a small jump or a big jump is required is hard to tell.

Something like this

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c097083e..473d0352030b 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -164,8 +164,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
        struct cpufreq_policy *policy = sg_policy->policy;
        unsigned int freq;

-       freq = get_capacity_ref_freq(policy);
-       freq = map_util_freq(util, freq, max);
+       if (util != max) {
+               freq = get_capacity_ref_freq(policy);
+               freq = map_util_freq(util, freq, max);
+       } else {
+               freq = policy->cpuinfo.max_freq;
+       }

        if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
                return sg_policy->next_freq;