Message-Id: <a732cb65347843e8b5fdbe182363c5438ac0916f.1764648076.git.wen.yang@linux.dev>
Date: Tue, 2 Dec 2025 13:51:19 +0800
From: wen.yang@...ux.dev
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>
Cc: Wen Yang <wen.yang@...ux.dev>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH 2/2] sched/rt: Add RT throttle statistics
From: Wen Yang <wen.yang@...ux.dev>
A priority inversion scenario can occur when a CFS task is starved
due to RT throttling. The scenario is as follows (an illustrative
user-space sketch follows the list):
0. An rtmutex (e.g., softirq_ctrl.lock on PREEMPT_RT) is contended by
   both CFS tasks (e.g., ksoftirqd) and RT tasks (e.g., ktimer).
1. An RT task 'A' (e.g., ktimer) acquires the rtmutex.
2. A CFS task 'B' (e.g., ksoftirqd) attempts to acquire the same
   rtmutex and blocks.
3. A higher-priority RT task 'C' (e.g., stress-ng) runs for an
   extended period, preempting task 'A' and eventually causing the RT
   runqueue to be throttled.
4. Once the RT runqueue is throttled, CFS task 'B' should run, but it
   remains blocked because the lock is still held by the non-running
   RT task 'A'. This can even leave the CPU idle.
5. When the throttling period ends, the high-priority RT task 'C'
   resumes execution and the cycle repeats, leading to indefinite
   starvation of CFS task 'B'.
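The pattern can be approximated from user space. The sketch below is
illustrative only and not part of this patch: the PI pthread mutex
stands in for the in-kernel rtmutex, and all priorities and timings
are made up. It assumes root, a single contended CPU and the default
RT throttling settings (sched_rt_runtime_us=950000,
sched_rt_period_us=1000000); build with "gcc repro.c -lpthread".

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static pthread_mutex_t lock;

static void pin_and_set_policy(int policy, int prio)
{
	struct sched_param sp = { .sched_priority = prio };
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);			/* everyone contends on CPU 0 */
	sched_setaffinity(0, sizeof(set), &set);
	sched_setscheduler(0, policy, &sp);
}

static void *task_a(void *arg)			/* low-prio RT lock holder */
{
	pin_and_set_policy(SCHED_FIFO, 1);
	pthread_mutex_lock(&lock);
	usleep(100000);				/* hold the lock for a while */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *task_b(void *arg)			/* CFS waiter; PI gives 'A' no
						 * boost since 'B' is not RT */
{
	pin_and_set_policy(SCHED_OTHER, 0);
	pthread_mutex_lock(&lock);		/* blocks behind task 'A' */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *task_c(void *arg)			/* high-prio RT CPU hog */
{
	pin_and_set_policy(SCHED_FIFO, 2);
	for (;;)
		;				/* preempts 'A', triggers throttling */
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t a, b, c;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &attr);

	pthread_create(&a, NULL, task_a, NULL);
	usleep(10000);				/* let 'A' take the lock first */
	pthread_create(&b, NULL, task_b, NULL);
	pthread_create(&c, NULL, task_c, NULL);
	pause();				/* observe via ftrace / sched_debug */
	return 0;
}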
A typical stack trace for the blocked ksoftirqd shows it in a 'D'
(TASK_RTLOCK_WAIT) state, waiting on the lock:
ksoftirqd/5-61 [005] d...211 58212.064160: sched_switch: prev_comm=ksoftirqd/5 prev_pid=61 prev_prio=120 prev_state=D ==> next_comm=swapper/5 next_pid=0 next_prio=120
ksoftirqd/5-61 [005] d...211 58212.064161: <stack trace>
=> __schedule
=> schedule_rtlock
=> rtlock_slowlock_locked
=> rt_spin_lock
=> __local_bh_disable_ip
=> run_ksoftirqd
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
Add a throttle_count field to struct rt_rq, incremented on each
throttling event and displayed by print_rt_rq() in /proc/sched_debug.
User-space tools (e.g. stalld) can then monitor throttle_count to
detect excessive CPU consumption by RT tasks and look for tasks in
the TASK_RTLOCK_WAIT state to mitigate the priority inversion.
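As an illustration only (not part of this patch), a minimal poller
could look like the sketch below, assuming the new field shows up as
a ".throttle_count" line in each rt_rq section of /proc/sched_debug,
matching the output format of the neighbouring PU()/P() fields:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[256];

	for (;;) {
		FILE *f = fopen("/proc/sched_debug", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (strstr(line, "throttle_count"))
				fputs(line, stdout);	/* one line per rt_rq */
		fclose(f);
		sleep(1);				/* poll once per second */
	}
}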
Signed-off-by: Wen Yang <wen.yang@...ux.dev>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: linux-kernel@...r.kernel.org
---
 kernel/sched/debug.c | 1 +
 kernel/sched/rt.c    | 1 +
 kernel/sched/sched.h | 1 +
 3 files changed, 3 insertions(+)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 41caa22e0680..8ed33c74e5a5 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -894,6 +894,7 @@ void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
P(rt_throttled);
PN(rt_time);
PN(rt_runtime);
+ PU(throttle_count);
#endif
#undef PN
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f1867fe8e5c5..88c659285c70 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -884,6 +884,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
*/
if (likely(rt_b->rt_runtime)) {
rt_rq->rt_throttled = 1;
+ rt_rq->throttle_count++;
printk_deferred_once("sched: RT throttling activated\n");
} else {
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bbf513b3e76c..88119540e4d4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -840,6 +840,7 @@ struct rt_rq {
int rt_throttled;
u64 rt_time; /* consumed RT time, goes up in update_curr_rt */
u64 rt_runtime; /* allotted RT time, "slice" from rt_bandwidth, RT sharing/balancing */
+ u64 throttle_count; /* nr of times this rt_rq has been throttled */
/* Nests inside the rq lock: */
raw_spinlock_t rt_runtime_lock;
--
2.25.1