[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250615131129.954975-1-kuyo.chang@mediatek.com>
Date: Sun, 15 Jun 2025 21:10:56 +0800
From: Kuyo Chang <kuyo.chang@...iatek.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, "Mel
Gorman" <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
"Matthias Brugger" <matthias.bgg@...il.com>, AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>
CC: jstultz <jstultz@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>,
<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<linux-mediatek@...ts.infradead.org>
Subject: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation when expiry time passed
From: kuyo chang <kuyo.chang@...iatek.com>
[Symptom]
The fair server mechanism, which is intended to prevent fair starvation
when higher-priority tasks monopolize the CPU.
Specifically, RT tasks on the runqueue may not be scheduled as expected.
[Analysis]
---------
The log "sched: DL replenish lagged too much" triggered.
By memory dump of dl_server:
--------------
curr = 0xFFFFFF80D6A0AC00 (
dl_server = 0xFFFFFF83CD5B1470(
dl_runtime = 0x02FAF080,
dl_deadline = 0x3B9ACA00,
dl_period = 0x3B9ACA00,
dl_bw = 0xCCCC,
dl_density = 0xCCCC,
runtime = 0x02FAF080,
deadline = 0x0000082031EB0E80,
flags = 0x0,
dl_throttled = 0x0,
dl_yielded = 0x0,
dl_non_contending = 0x0,
dl_overrun = 0x0,
dl_server = 0x1,
dl_server_active = 0x1,
dl_defer = 0x1,
dl_defer_armed = 0x0,
dl_defer_running = 0x1,
dl_timer = (
node = (
expires = 0x000008199756E700),
_softexpires = 0x000008199756E700,
function = 0xFFFFFFDB9AF44D30 = dl_task_timer,
base = 0xFFFFFF83CD5A12C0,
state = 0x0,
is_rel = 0x0,
is_soft = 0x0,
clock_update_flags = 0x4,
clock = 0x000008204A496900,
- The timer expiration time (rq->curr->dl_server->dl_timer->expires)
is already in the past, indicating the timer has expired.
- The timer state (rq->curr->dl_server->dl_timer->state) is 0.
[Suspected Root Cause]
--------------------
The relevant code flow in the throttle path of
update_curr_dl_se() as follows:
dequeue_dl_entity(dl_se, 0); // the DL entity is dequeued
if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
if (dl_server(dl_se)) // timer registration fails
enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);//enqueue immediately
...
}
The failure of `start_dl_timer` is caused by attempting to register a
timer with an expiration time that is already in the past. When this
situation persists, the code repeatedly re-enqueues the DL entity
without properly replenishing or restarting the timer, resulting in RT
task may not be scheduled as expected.
[Proposed Solution]:
------------------
Instead of immediately re-enqueuing the DL entity on timer registration
failure, this change ensures the DL entity is properly replenished and
the timer is restarted, preventing RT potential starvation.
Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
---
kernel/sched/deadline.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245..e50cb76c961b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1556,10 +1556,12 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
}
if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
- if (dl_server(dl_se))
- enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
- else
+ if (dl_server(dl_se)) {
+ replenish_dl_new_period(dl_se, rq);
+ start_dl_timer(dl_se);
+ } else {
enqueue_task_dl(rq, dl_task_of(dl_se), ENQUEUE_REPLENISH);
+ }
}
if (!is_leftmost(dl_se, &rq->dl))
--
2.45.2
Powered by blists - more mailing lists