Message-ID: <20250307200502.253867-1-arighi@nvidia.com>
Date: Fri, 7 Mar 2025 21:01:02 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>
Cc: bpf@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCHSET v2 sched_ext/for-6.15] sched_ext: Enhance built-in idle selection with allowed CPUs
Many scx schedulers define their own concept of scheduling domains to
represent topology characteristics, such as heterogeneous architectures
(e.g., big.LITTLE, P-cores/E-cores), or to categorize tasks based on
specific properties (e.g., setting the soft-affinity of certain tasks to a
subset of CPUs).
Currently, there is no mechanism to share these domains with the built-in
idle CPU selection policy. As a result, schedulers often implement their
own idle CPU selection policies, which are typically similar to one
another, leading to a lot of code duplication.
To address this, extend the built-in idle CPU selection policy by
introducing the concept of allowed CPUs.
With this concept, BPF schedulers can apply the built-in idle CPU selection
policy to a subset of allowed CPUs. This lets them implement their own
scheduling domains while still benefiting from the topology optimizations
of the built-in policy, without duplicating code across different schedulers.
To implement this, introduce a new kfunc helper, scx_bpf_select_cpu_and(),
that accepts a cpumask of allowed CPUs:
s32 scx_bpf_select_cpu_and(struct task_struct *p,
                           const struct cpumask *cpus_allowed,
                           s32 prev_cpu, u64 wake_flags, u64 flags);
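To make the "idle CPUs strictly within the allowed domain" semantics concrete, here is a toy model in plain userspace C. It is only a sketch under stated assumptions: CPU sets are 64-bit masks, the candidate set is assumed to be the intersection of the task's affinity, the allowed CPUs and the idle CPUs (as the _and suffix suggests), and toy_select_cpu_and() is a hypothetical name, not a kernel function. The built-in policy is topology-aware (SMT, cache locality), whereas this toy only prefers prev_cpu and otherwise takes the lowest-numbered candidate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, NOT the kernel implementation: CPU sets are 64-bit masks
 * (bit N set = CPU N is in the set). It only illustrates the
 * "strictly within the allowed CPUs" constraint; the real built-in
 * policy also takes the CPU topology into account.
 *
 * Returns the selected idle CPU, or -1 if no CPU in
 * (affinity & allowed) is currently idle.
 */
static int toy_select_cpu_and(uint64_t affinity, uint64_t allowed,
			      uint64_t idle, int prev_cpu)
{
	/* Assumption: only CPUs that are usable, allowed and idle qualify. */
	uint64_t candidates = affinity & allowed & idle;

	assert(prev_cpu >= 0 && prev_cpu < 64);

	/* No idle CPU inside the allowed domain: report failure. */
	if (!candidates)
		return -1;

	/* Prefer the previously used CPU if it is still a candidate. */
	if (candidates & (UINT64_C(1) << prev_cpu))
		return prev_cpu;

	/* Otherwise fall back to the lowest-numbered candidate. */
	for (int cpu = 0; cpu < 64; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;

	return -1; /* not reached: candidates is non-zero */
}
```

For instance, on a 16-CPU system with every CPU idle and 0xff00 as the allowed domain, toy_select_cpu_and(0xffff, 0xff00, 0xffff, 3) returns 8, never a CPU in 0-7. If none of CPUs 8-15 is idle it returns -1, and the caller keeps prev_cpu, as in the example usage.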
Example usage
=============
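s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	/*
	 * task_domain() stands for the scheduler's own helper (not part of
	 * the sched_ext API) returning the task's domain cpumask, or NULL.
	 */
	const struct cpumask *dom = task_domain(p) ?: p->cpus_ptr;
	s32 cpu;

	/*
	 * Pick an idle CPU in the task's domain.
	 */
	cpu = scx_bpf_select_cpu_and(p, dom, prev_cpu, wake_flags, 0);
	if (cpu >= 0) {
		/* Idle CPU found: dispatch directly to its local DSQ. */
		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
		return cpu;
	}

	/* No idle CPU in the domain: keep prev_cpu, ops.enqueue() follows. */
	return prev_cpu;
}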
Results
=======
Load distribution on a 4-socket system with 4 cores per socket, simulated
using virtme-ng, running a modified version of scx_bpfland that uses the
new scx_bpf_select_cpu_and() helper with 0xff00 (CPUs 8-15) as the allowed
domain:
$ vng --cpu 16,sockets=4,cores=4,threads=1
...
$ stress-ng -c 16
...
$ htop
...
0[ 0.0%] 8[||||||||||||||||||||||||100.0%]
1[ 0.0%] 9[||||||||||||||||||||||||100.0%]
2[ 0.0%] 10[||||||||||||||||||||||||100.0%]
3[ 0.0%] 11[||||||||||||||||||||||||100.0%]
4[ 0.0%] 12[||||||||||||||||||||||||100.0%]
5[ 0.0%] 13[||||||||||||||||||||||||100.0%]
6[ 0.0%] 14[||||||||||||||||||||||||100.0%]
7[ 0.0%] 15[||||||||||||||||||||||||100.0%]
All the load is confined to CPUs 8-15, while CPUs 0-7 stay idle. With
scx_bpf_select_cpu_dfl(), tasks would instead be distributed evenly across
all the available CPUs.
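The allowed domain 0xff00 is a CPU bitmask in which bit N stands for CPU N, so it covers exactly CPUs 8-15, the busy CPUs in the htop output. As a small sketch (plain userspace C, limited to 64 CPUs; mask_to_cpulist() is a hypothetical helper written for illustration), such a mask can be decoded into the familiar cpulist notation:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the CPUs set in a 64-bit mask (bit N = CPU N) as a
 * cpulist-style string, e.g. 0xff00 -> "8-15", 0x5 -> "0,2".
 * Returns buf; output stops early if buf is too small.
 */
static char *mask_to_cpulist(uint64_t mask, char *buf, size_t len)
{
	size_t off = 0;
	int cpu = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	while (cpu < 64) {
		/* Skip CPUs that are not in the mask. */
		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Walk the run of consecutive set bits starting here. */
		int start = cpu;
		while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
			cpu++;
		int end = cpu - 1;

		/* Emit "N" or "N-M", comma-separated after the first range. */
		int n;
		if (start == end)
			n = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", start);
		else
			n = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", start, end);
		if (n < 0 || (size_t)n >= len - off)
			break; /* buffer too small: stop here */
		off += (size_t)n;
	}
	return buf;
}
```

For example, mask_to_cpulist(0xff00, buf, sizeof(buf)) yields "8-15", while 0xffff yields "0-15" for the whole simulated system.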
ChangeLog v1 -> v2:
- rename scx_bpf_select_cpu_pref() to scx_bpf_select_cpu_and() and always
select idle CPUs strictly within the allowed domain
- rename preferred CPUs -> allowed CPUs
- drop %SCX_PICK_IDLE_IN_PREF (not required anymore)
- deprecate scx_bpf_select_cpu_dfl() in favor of scx_bpf_select_cpu_and()
and provide all the required backward compatibility boilerplate
Andrea Righi (6):
sched_ext: idle: Honor idle flags in the built-in idle selection policy
sched_ext: idle: Refactor scx_select_cpu_dfl()
sched_ext: idle: Introduce the concept of allowed CPUs
sched_ext: idle: Introduce scx_bpf_select_cpu_and()
selftests/sched_ext: Add test for scx_bpf_select_cpu_and()
sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl()
Documentation/scheduler/sched-ext.rst | 11 +-
kernel/sched/ext.c | 13 +-
kernel/sched/ext_idle.c | 243 +++++++++++++++------
kernel/sched/ext_idle.h | 3 +-
tools/sched_ext/include/scx/common.bpf.h | 5 +-
tools/sched_ext/include/scx/compat.bpf.h | 37 ++++
tools/sched_ext/scx_flatcg.bpf.c | 12 +-
tools/sched_ext/scx_simple.bpf.c | 9 +-
tools/testing/selftests/sched_ext/Makefile | 1 +
.../testing/selftests/sched_ext/allowed_cpus.bpf.c | 91 ++++++++
tools/testing/selftests/sched_ext/allowed_cpus.c | 57 +++++
.../selftests/sched_ext/enq_select_cpu_fails.bpf.c | 12 +-
.../selftests/sched_ext/enq_select_cpu_fails.c | 2 +-
tools/testing/selftests/sched_ext/exit.bpf.c | 6 +-
.../sched_ext/select_cpu_dfl_nodispatch.bpf.c | 13 +-
.../sched_ext/select_cpu_dfl_nodispatch.c | 2 +-
16 files changed, 405 insertions(+), 112 deletions(-)
create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.c