Message-ID: <20241209061531.257531-1-changwoo@igalia.com>
Date: Mon, 9 Dec 2024 15:15:25 +0900
From: Changwoo Min <multics69@...il.com>
To: tj@...nel.org,
void@...ifault.com,
mingo@...hat.com,
peterz@...radead.org
Cc: changwoo@...lia.com,
kernel-dev@...lia.com,
linux-kernel@...r.kernel.org
Subject: [PATCH v4 0/6] sched_ext: Support high-performance monotonically non-decreasing clock

Many BPF schedulers (such as scx_central, scx_lavd, scx_rusty, scx_bpfland,
and scx_flash) frequently call bpf_ktime_get_ns() to track tasks' runtime
properties. If supported, bpf_ktime_get_ns() eventually reads a hardware
timestamp counter (TSC). However, reading a hardware TSC is not performant
on some hardware platforms, degrading IPC.
This patchset addresses the performance problem of reading the hardware TSC
by leveraging the rq clock in the scheduler core, introducing a
scx_bpf_now_ns() function for BPF schedulers. Whenever the rq clock
is fresh and valid, scx_bpf_now_ns() returns the rq clock, which is
already updated by the scheduler core (update_rq_clock), so it avoids
an additional hardware TSC read.
When the rq lock is released (rq_unpin_lock), the rq clock is invalidated,
so a subsequent scx_bpf_now_ns() call reads a fresh sched_clock for the caller.
In addition, scx_bpf_now_ns() guarantees that the clock is monotonically
non-decreasing for the same CPU, so the clock can never go backward
on the same CPU.
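
Conceptually, the kfunc behaves like the simplified sketch below. This is
only an illustration of the idea described above (field and helper names
follow this series, but details such as preemption and locking are omitted):

  u64 scx_bpf_now_ns(void)
  {
          struct rq *rq = this_rq();
          u64 clk;

          if (rq->scx.flags & SCX_RQ_CLK_VALID)
                  clk = rq->scx.clock;               /* reuse the valid rq clock */
          else
                  clk = sched_clock_cpu(cpu_of(rq)); /* read a fresh clock */

          /* keep the clock monotonically non-decreasing per CPU */
          clk = max(clk, rq->scx.prev_clock);
          rq->scx.prev_clock = clk;

          return clk;
  }
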
Using scx_bpf_now_ns() reduces the number of hardware TSC reads
by 40-70% (65% for scx_lavd, 58% for scx_bpfland, and 43% for scx_rusty)
for the following benchmark:

  perf bench -f simple sched messaging -t -g 20 -l 6000

The patchset begins by managing the status of the rq clock in the scheduler
core, then implements scx_bpf_now_ns(), and finally applies it to the
BPF schedulers.
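
For illustration, converting a scheduler is a drop-in replacement. The
hypothetical snippet below shows the intended usage pattern; struct task_ctx
and lookup_task_ctx() are made up for this example and are not part of the
series:

  /* Track per-task runtime with scx_bpf_now_ns() instead of bpf_ktime_get_ns(). */
  struct task_ctx {
          u64 running_at; /* when the task last started running */
          u64 runtime;    /* accumulated runtime in ns */
  };

  void BPF_STRUCT_OPS(example_running, struct task_struct *p)
  {
          struct task_ctx *taskc = lookup_task_ctx(p);

          if (taskc)
                  taskc->running_at = scx_bpf_now_ns(); /* was bpf_ktime_get_ns() */
  }

  void BPF_STRUCT_OPS(example_stopping, struct task_struct *p, bool runnable)
  {
          struct task_ctx *taskc = lookup_task_ctx(p);

          if (taskc)
                  taskc->runtime += scx_bpf_now_ns() - taskc->running_at;
  }
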
ChangeLog v3 -> v4:
- Separate the code relocation related to scx_enabled() into a
separate patch.
- Remove scx_rq_clock_stale() after (or before) ops.running() and
ops.update_idle() calls
- Rename scx_bpf_clock_get_ns() to scx_bpf_now_ns() and revise it to
address the comments
- Move the per-CPU variable holding a prev clock into scx_rq
(rq->scx.prev_clock)
- Add a comment describing when the clock could go backward in
scx_bpf_now_ns()
- Rebase the code to the tip of Tejun's sched_ext repo (for-next
branch)

ChangeLog v2 -> v3:
- To avoid unnecessarily modifying cache lines, scx_rq_clock_update()
and scx_rq_clock_stale() update the clock and flags only when a
sched_ext scheduler is enabled.

ChangeLog v1 -> v2:
- Rename SCX_RQ_CLK_UPDATED to SCX_RQ_CLK_VALID to denote the validity
of an rq clock clearly.
- Rearrange the clock and flags fields in struct scx_rq to make sure
they are in the same cacheline to minimize cache misses
- Add an additional explanation to the commit message in the 2/5 patch
describing when the rq clock will be reused with an example.
- Fix typos
- Rebase the code to the tip of Tejun's sched_ext repo

Changwoo Min (6):
  sched_ext: Relocate scx_enabled() related code
  sched_ext: Implement scx_rq_clock_update/stale()
  sched_ext: Manage the validity of scx_rq_clock
  sched_ext: Implement scx_bpf_now_ns()
  sched_ext: Add scx_bpf_now_ns() for BPF scheduler
  sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now_ns()

 kernel/sched/core.c                      |  6 +-
 kernel/sched/ext.c                       | 73 ++++++++++++++++++++++++
 kernel/sched/sched.h                     | 52 ++++++++++++-----
 tools/sched_ext/include/scx/common.bpf.h |  1 +
 tools/sched_ext/include/scx/compat.bpf.h |  5 ++
 tools/sched_ext/scx_central.bpf.c        |  4 +-
 tools/sched_ext/scx_flatcg.bpf.c         |  2 +-
 7 files changed, 124 insertions(+), 19 deletions(-)

--
2.47.1