Message-ID: <63c76af4-6451-4d6a-8aeb-0bc4812c4101@arm.com>
Date: Tue, 2 Jul 2024 11:23:58 +0100
From: Hongyan Xia <hongyan.xia2@....com>
To: Tejun Heo <tj@...nel.org>, rafael@...nel.org, viresh.kumar@...aro.org
Cc: linux-pm@...r.kernel.org, void@...ifault.com,
linux-kernel@...r.kernel.org, kernel-team@...a.com, mingo@...hat.com,
peterz@...radead.org, David Vernet <dvernet@...a.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>
Subject: Re: [PATCH 2/2] sched_ext: Add cpuperf support
On 19/06/2024 04:12, Tejun Heo wrote:
> sched_ext currently does not integrate with schedutil. When schedutil is the
> governor, frequencies are left unregulated and usually get stuck close to
> the highest performance level from running RT tasks.
>
> Add CPU performance monitoring and scaling support by integrating into
> schedutil. The following kfuncs are added:
>
> - scx_bpf_cpuperf_cap(): Query the relative performance capacity of
> different CPUs in the system.
>
> - scx_bpf_cpuperf_cur(): Query the current performance level of a CPU
> relative to its max performance.
>
> - scx_bpf_cpuperf_set(): Set the current target performance level of a CPU.
>
> This gives direct control over CPU performance setting to the BPF scheduler.
> The only changes on the schedutil side are accounting for the utilization
> factor from sched_ext and disabling frequency holding heuristics as it may
> not apply well to sched_ext schedulers which may have a lot weaker
> connection between tasks and their current / last CPU.
>
> With cpuperf support added, there is no reason to block uclamp. Enable while
> at it.
>
> A toy implementation of cpuperf is added to scx_qmap as a demonstration of
> the feature.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Reviewed-by: David Vernet <dvernet@...a.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Cc: Viresh Kumar <viresh.kumar@...aro.org>
> ---
> kernel/sched/cpufreq_schedutil.c | 12 +-
> kernel/sched/ext.c | 83 ++++++++++++-
> kernel/sched/ext.h | 9 ++
> kernel/sched/sched.h | 1 +
> tools/sched_ext/include/scx/common.bpf.h | 3 +
> tools/sched_ext/scx_qmap.bpf.c | 142 ++++++++++++++++++++++-
> tools/sched_ext/scx_qmap.c | 8 ++
> 7 files changed, 252 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 972b7dd65af2..12174c0137a5 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -197,7 +197,9 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
>
> static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
> {
> - unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);
> + unsigned long min, max;
> + unsigned long util = cpu_util_cfs_boost(sg_cpu->cpu) +
> + scx_cpuperf_target(sg_cpu->cpu);
>
> util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
> util = max(util, boost);
> @@ -330,6 +332,14 @@ static bool sugov_hold_freq(struct sugov_cpu *sg_cpu)
> unsigned long idle_calls;
> bool ret;
>
> + /*
> + * The heuristics in this function are for the fair class. For SCX, the
> + * performance target comes directly from the BPF scheduler. Let's just
> + * follow it.
> + */
> + if (scx_switched_all())
> + return false;
> +
> /* if capped by uclamp_max, always update to be in compliance */
> if (uclamp_rq_is_capped(cpu_rq(sg_cpu->cpu)))
> return false;
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index f814e84ceeb3..04fb0eeee5ec 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -16,6 +16,8 @@ enum scx_consts {
> SCX_EXIT_BT_LEN = 64,
> SCX_EXIT_MSG_LEN = 1024,
> SCX_EXIT_DUMP_DFL_LEN = 32768,
> +
> + SCX_CPUPERF_ONE = SCHED_CAPACITY_SCALE,
> };
>
> enum scx_exit_kind {
> @@ -3520,7 +3522,7 @@ DEFINE_SCHED_CLASS(ext) = {
> .update_curr = update_curr_scx,
>
> #ifdef CONFIG_UCLAMP_TASK
> - .uclamp_enabled = 0,
> + .uclamp_enabled = 1,
> #endif
> };
>
Hi. I know this is a bit late, but this one-line change has quite
interesting implications.
With this patch applied but without flipping this knob from 0 to 1, the
series would give me the perfect opportunity to implement a custom
uclamp within sched_ext on top of the cpufreq support it adds. I think
this is what some vendors looking at sched_ext would want as well. But
with .uclamp_enabled == 1, the mainline uclamp implementation is in
effect regardless of which ext scheduler is loaded. In fact,
uclamp_{inc,dec}() are called before {enqueue,dequeue}_task(), so now
there is no easy way to circumvent it.
What would be really nice is to have cpufreq support in sched_ext
without forcing uclamp_enabled. That said, there will also be people who
are happy with the current uclamp implementation and just want to reuse
it. The best option would be to let the loaded scheduler decide,
somehow, though I don't yet see an easy way to do that.
> [...]