Message-ID: <aFAyN4rfssKmbUE5@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 16 Jun 2025 17:03:19 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
jstultz <jstultz@...gle.com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org
Subject: Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation
when expiry time passed
Hello,
On 15/06/25 21:10, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@...iatek.com>
>
> [Symptom]
> The fair server mechanism is intended to prevent fair starvation
> when higher-priority tasks monopolize the CPU.
> However, RT tasks on the runqueue may not be scheduled as expected.
>
> [Analysis]
> ---------
> The log "sched: DL replenish lagged too much" was triggered.
>
> Memory dump of dl_server:
> --------------
> curr = 0xFFFFFF80D6A0AC00 (
> dl_server = 0xFFFFFF83CD5B1470(
> dl_runtime = 0x02FAF080,
> dl_deadline = 0x3B9ACA00,
> dl_period = 0x3B9ACA00,
> dl_bw = 0xCCCC,
> dl_density = 0xCCCC,
> runtime = 0x02FAF080,
> deadline = 0x0000082031EB0E80,
> flags = 0x0,
> dl_throttled = 0x0,
> dl_yielded = 0x0,
> dl_non_contending = 0x0,
> dl_overrun = 0x0,
> dl_server = 0x1,
> dl_server_active = 0x1,
> dl_defer = 0x1,
> dl_defer_armed = 0x0,
> dl_defer_running = 0x1,
> dl_timer = (
> node = (
> expires = 0x000008199756E700),
> _softexpires = 0x000008199756E700,
> function = 0xFFFFFFDB9AF44D30 = dl_task_timer,
> base = 0xFFFFFF83CD5A12C0,
> state = 0x0,
> is_rel = 0x0,
> is_soft = 0x0,
> clock_update_flags = 0x4,
> clock = 0x000008204A496900,
>
> - The timer expiration time (rq->curr->dl_server->dl_timer->expires)
> is already in the past, indicating the timer has expired.
> - The timer state (rq->curr->dl_server->dl_timer->state) is 0.
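For reference, the dump values quoted above can be decoded with a few lines of C. This is only a sketch: the `DUMP_*` constants are copied from the dump, the `sketch_*` helpers are hypothetical names, and the 20-bit fixed-point shift for `dl_bw` is an assumption. Comparing `expires` directly against `clock` also assumes both are on the same time base, which may not hold exactly for an hrtimer expiry versus the rq clock.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the quoted dump (nanoseconds, except DUMP_DL_BW). */
#define DUMP_DL_RUNTIME 0x02FAF080ULL
#define DUMP_DL_PERIOD  0x3B9ACA00ULL
#define DUMP_DL_BW      0xCCCCULL
#define DUMP_DEADLINE   0x0000082031EB0E80ULL
#define DUMP_TIMER_EXP  0x000008199756E700ULL
#define DUMP_RQ_CLOCK   0x000008204A496900ULL

/* Assumption: bandwidth is runtime/period with 20 fractional bits. */
#define SKETCH_BW_SHIFT 20

/* Hypothetical helper: runtime/period as a fixed-point ratio. */
static uint64_t sketch_to_ratio(uint64_t period, uint64_t runtime)
{
    return (runtime << SKETCH_BW_SHIFT) / period;
}

/* Hypothetical helper: how far 'then' lies behind 'now' (0 if not behind). */
static uint64_t sketch_ns_behind(uint64_t now, uint64_t then)
{
    return now > then ? now - then : 0;
}

/* Print the decoded dump values in decimal. */
static void sketch_print_dump_summary(void)
{
    printf("dl_runtime      = %" PRIu64 " ns\n", (uint64_t)DUMP_DL_RUNTIME);
    printf("dl_period       = %" PRIu64 " ns\n", (uint64_t)DUMP_DL_PERIOD);
    printf("ratio (20 bits) = 0x%" PRIX64 "\n",
           sketch_to_ratio(DUMP_DL_PERIOD, DUMP_DL_RUNTIME));
    printf("clock - expires = %" PRIu64 " ns\n",
           sketch_ns_behind(DUMP_RQ_CLOCK, DUMP_TIMER_EXP));
    printf("clock - deadline= %" PRIu64 " ns\n",
           sketch_ns_behind(DUMP_RQ_CLOCK, DUMP_DEADLINE));
}
```

Under those assumptions, the dump describes a 50 ms budget per 1 s period, a timer expiry roughly 28.8 s behind the rq clock, and an absolute deadline roughly 409 ms behind it.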
>
> [Suspected Root Cause]
> --------------------
> The relevant code flow in the throttle path of
> update_curr_dl_se() is as follows:
>
> dequeue_dl_entity(dl_se, 0); // the DL entity is dequeued
>
> if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
>         if (dl_server(dl_se)) // timer registration fails
>                 enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH); // enqueue immediately
>         ...
> }
>
> The failure of `start_dl_timer` is caused by attempting to register a
> timer with an expiration time that is already in the past. When this
> situation persists, the code repeatedly re-enqueues the DL entity
> without properly replenishing or restarting the timer, so RT tasks
> may not be scheduled as expected.
>
> [Proposed Solution]:
> ------------------
> Instead of immediately re-enqueuing the DL entity on timer registration
> failure, this change ensures the DL entity is properly replenished and
> the timer is restarted, preventing potential RT starvation.
>
> Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
> ---
> kernel/sched/deadline.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ad45a8fea245..e50cb76c961b 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1556,10 +1556,12 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> }
>
> if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
> - if (dl_server(dl_se))
> - enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
> - else
> + if (dl_server(dl_se)) {
> + replenish_dl_new_period(dl_se, rq);
> + start_dl_timer(dl_se);
But, even today, enqueue_dl_entity() is called with ENQUEUE_REPLENISH
flag, so I don't get why you say 're-enqueues the DL entity without
properly replenishing'.
Also, why restart the replenishment timer right after having
replenished the entity?
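To make the ENQUEUE_REPLENISH point concrete, here is a simplified user-space model of the replenishment step. It is a sketch under assumptions, not the kernel code: it assumes the step behaves like the loop in replenish_dl_entity() (advance the deadline by one period and add one budget until runtime is positive, then start a new period from the current clock if the deadline is still in the past, which is where the "lagged too much" message would come from). The `toy_*` names are hypothetical, and the defer-server handling and the timer are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the few sched_dl_entity fields used here. */
struct toy_dl_entity {
    uint64_t dl_runtime;   /* budget per period, ns */
    uint64_t dl_deadline;  /* relative deadline, ns */
    uint64_t dl_period;    /* period, ns */
    int64_t  runtime;      /* remaining budget, ns (may be negative) */
    uint64_t deadline;     /* absolute deadline, ns */
};

/* Start a fresh period relative to 'now' with a full budget. */
static void toy_replenish_new_period(struct toy_dl_entity *se, uint64_t now)
{
    se->deadline = now + se->dl_deadline;
    se->runtime = (int64_t)se->dl_runtime;
}

/*
 * Replenish an exhausted entity. Returns true when the deadline was
 * still behind 'now' after the loop, i.e. the "lagged" case in which
 * the entity is reset to a new period.
 */
static bool toy_replenish(struct toy_dl_entity *se, uint64_t now)
{
    while (se->runtime <= 0) {
        se->deadline += se->dl_period;
        se->runtime += (int64_t)se->dl_runtime;
    }

    if (se->deadline < now) {
        toy_replenish_new_period(se, now);
        return true;
    }
    return false;
}
```

In this model, an entity that enters the replenish step with an exhausted budget always leaves it with a positive runtime and a deadline that is not behind the clock passed in.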
Thanks,
Juri