Message-ID: <2990c9d8-c957-4443-a2c5-4b62e48dc818@arm.com>
Date: Thu, 31 Jul 2025 16:00:09 +0100
From: Christian Loehle <christian.loehle@....com>
To: Geert Uytterhoeven <geert@...ux-m68k.org>,
Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
jstultz <jstultz@...gle.com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org
Subject: Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation
when expiry time passed
On 7/30/25 11:06, Geert Uytterhoeven wrote:
> Hi Kuyo,
>
> On Mon, 16 Jun 2025 at 14:39, Kuyo Chang <kuyo.chang@...iatek.com> wrote:
>> From: kuyo chang <kuyo.chang@...iatek.com>
>>
>> [Symptom]
>> The fair server mechanism is intended to prevent starvation of fair-class
>> tasks when higher-priority tasks monopolize the CPU. In the failing case,
>> however, RT tasks on the runqueue may not be scheduled as expected.
>>
>> [Analysis]
>> ---------
>> The warning "sched: DL replenish lagged too much" was triggered.
>>
>> From a memory dump of the dl_server:
>> --------------
>> curr = 0xFFFFFF80D6A0AC00 (
>> dl_server = 0xFFFFFF83CD5B1470(
>> dl_runtime = 0x02FAF080,
>> dl_deadline = 0x3B9ACA00,
>> dl_period = 0x3B9ACA00,
>> dl_bw = 0xCCCC,
>> dl_density = 0xCCCC,
>> runtime = 0x02FAF080,
>> deadline = 0x0000082031EB0E80,
>> flags = 0x0,
>> dl_throttled = 0x0,
>> dl_yielded = 0x0,
>> dl_non_contending = 0x0,
>> dl_overrun = 0x0,
>> dl_server = 0x1,
>> dl_server_active = 0x1,
>> dl_defer = 0x1,
>> dl_defer_armed = 0x0,
>> dl_defer_running = 0x1,
>> dl_timer = (
>> node = (
>> expires = 0x000008199756E700),
>> _softexpires = 0x000008199756E700,
>> function = 0xFFFFFFDB9AF44D30 = dl_task_timer,
>> base = 0xFFFFFF83CD5A12C0,
>> state = 0x0,
>> is_rel = 0x0,
>> is_soft = 0x0,
>> clock_update_flags = 0x4,
>> clock = 0x000008204A496900,
>>
>> - The timer expiration time (rq->curr->dl_server->dl_timer->expires)
>>   is already in the past (earlier than clock in the dump), so the
>>   timer has expired.
>> - The timer state (rq->curr->dl_server->dl_timer->state) is 0
>>   (HRTIMER_STATE_INACTIVE), i.e. the timer is not queued.
>>
>> [Suspected Root Cause]
>> --------------------
>> The relevant code flow in the throttle path of
>> update_curr_dl_se() is as follows:
>>
>>   dequeue_dl_entity(dl_se, 0);    // the DL entity is dequeued
>>
>>   // taken when the entity is boosted or timer registration fails
>>   if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) {
>>           if (dl_server(dl_se))
>>                   enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH); // enqueue immediately
>>           ...
>>   }
>>
>> start_dl_timer() fails because the computed expiration time is already
>> in the past; in that case it returns 0 without arming the timer. When
>> this situation persists, the code repeatedly re-enqueues the DL entity
>> without properly replenishing it or restarting the timer, so RT tasks
>> may not be scheduled as expected.
>>
>> [Proposed Solution]
>> ------------------
>> Instead of immediately re-enqueuing the DL entity when timer registration
>> fails, this change ensures the DL entity is properly replenished and the
>> timer is restarted, preventing potential RT task starvation.
>>
>> Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
>
> Thanks, this fixes the issue I was seeing!
>
> Closes: https://lore.kernel.org/CAMuHMdXn4z1pioTtBGMfQM0jsLviqS2jwysaWXpoLxWYoGa82w@mail.gmail.com
> Tested-by: Geert Uytterhoeven <geert@...ux-m68k.org>
>
FWIW the reported issue is also present on an arm64 rk3399 and
$SUBJECT fixes that.