Message-ID: <aFV8qeH__bw0chWM@jlelli-thinkpadt14gen4.remote.csb>
Date: Fri, 20 Jun 2025 17:22:17 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
jstultz <jstultz@...gle.com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org
Subject: Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential starvation
when expiry time passed
On 20/06/25 11:00, Kuyo Chang wrote:
...
> "DL replenish lagged too much" means the fair_server took much longer
> than expected to use up its running time,
> so the deadline fell way behind the clock (which is also why
> start_dl_timer() failed).
> In this situation, just replenishing one dl_period isn’t enough to
> catch up.
>
> A corner case is when there are too many IRQs or IPIs in the system.
> In this case, runtime gets consumed very slowly, and the fair_server
> keeps running without being throttled.
> Even when the runtime is finally exhausted, the fair_server is
> restarted immediately.
> In the end, IRQs, IPIs, and fair tasks can take over the whole system,
> leaving no chance for RT tasks to run.
Thanks for the additional explanation.
The way I understand it now is the following (of course please correct
me if I am still not getting it :)
- a dl_server is actively servicing NORMAL tasks, but suffers a lot of
  IRQ load and cannot make much progress
- it does make progress anyway, but it reaches the throttle: label of
  update_curr_dl_se() only when its current deadline is already in the
  past with respect to rq_clock()
- the dl_runtime_exceeded() branch is entered, but start_dl_timer()
  fails as the computed act is still in the past
- enqueue_dl_entity(REPLENISH) calls replenish_dl_entity(), which tries
  to add runtime and advance the deadline, but time has moved on so far
  that the deadline is still behind rq_clock() and so "DL replenish ..."
  is printed
- replenish_dl_new_period() updates runtime and deadline from the
  current clock and the dl-server is put back to run (so it continues
  to run over/starve FIFO tasks)
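
To make the last two steps concrete, here is a minimal user-space sketch of
the replenish path as I read it. It is not the kernel code: struct dl_model,
model_time_before() and model_replenish() are hypothetical names for this
illustration only, and locking, PI and the printk are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's sched_dl_entity. */
struct dl_model {
	int64_t  runtime;     /* remaining budget in ns, may be negative */
	uint64_t deadline;    /* absolute deadline in ns */
	uint64_t dl_runtime;  /* budget granted per period, ns */
	uint64_t dl_deadline; /* relative deadline, ns */
	uint64_t dl_period;   /* period, ns */
};

/* Wrap-safe "a is before b", in the spirit of dl_time_before(). */
static bool model_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Model of the replenish step. Returns true when the "lagged too much"
 * fallback was taken, i.e. the entity was restarted from the current clock.
 */
static bool model_replenish(struct dl_model *se, uint64_t now)
{
	assert(se->dl_runtime > 0); /* otherwise the loop below never ends */

	/* Add one period's worth of budget at a time until it is positive. */
	while (se->runtime <= 0) {
		se->deadline += se->dl_period;
		se->runtime  += (int64_t)se->dl_runtime;
	}

	/* Deadline still behind the clock: give up on catching up. */
	if (model_time_before(se->deadline, now)) {
		se->deadline = now + se->dl_deadline;
		se->runtime  = (int64_t)se->dl_runtime;
		return true;
	}
	return false;
}
```

With a small overrun (runtime = -1000) and a clock several periods ahead of
the deadline, the loop runs once, the deadline stays behind the clock, and the
entity gets a full fresh budget. That is the "put back to run" step above.
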
It looks like your proposed fix might work in this particular corner
case, but I am not 100% comfortable with not trying to replenish
properly (catch up with runtime) at all. I wonder if we might then start
missing some other corner case. Maybe we could try to catch this
particular corner case before even attempting to start the dl_timer,
since we know it will fail, and do something at that point?
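
Just to illustrate what "catching it before" could look like, a rough
user-space sketch follows. It assumes the timer would be armed at the start
of the next period (deadline - dl_deadline + dl_period) and ignores the
hrtimer clock-base conversion that the real code has to do. The names
dl_timer_model, model_next_period() and model_timer_would_fail() are made up
for this example, and what to do once the check fires is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the entity parameters needed for the check. */
struct dl_timer_model {
	uint64_t deadline;    /* absolute deadline, ns */
	uint64_t dl_deadline; /* relative deadline, ns */
	uint64_t dl_period;   /* period, ns */
};

/* Assumed activation instant: the start of the next period. */
static uint64_t model_next_period(const struct dl_timer_model *se)
{
	return se->deadline - se->dl_deadline + se->dl_period;
}

/*
 * True when the activation instant is already behind 'now', which is the
 * condition under which arming the replenishment timer cannot succeed.
 */
static bool model_timer_would_fail(const struct dl_timer_model *se,
				   uint64_t now)
{
	return (int64_t)(model_next_period(se) - now) < 0;
}
```
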
Thanks,
Juri