Message-ID: <6ec534be0618de3e2b4d81555e5f24155326c0b9.camel@mediatek.com>
Date: Fri, 20 Jun 2025 11:00:53 +0800
From: Kuyo Chang <kuyo.chang@...iatek.com>
To: Juri Lelli <juri.lelli@...hat.com>
CC: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, "Ben
Segall" <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
Schneider" <vschneid@...hat.com>, Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, jstultz
<jstultz@...gle.com>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <linux-mediatek@...ts.infradead.org>
Subject: Re: [RFC PATCH 1/1] sched/deadline: Fix RT task potential
starvation when expiry time passed
On Thu, 2025-06-19 at 15:13 +0200, Juri Lelli wrote:
>
> On 18/06/25 22:20, Kuyo Chang wrote:
>
> ...
>
> > When dl_defer_running = 1 and the running time has been exhausted,
> > it means that the dl_server should stop at this point.
> > However, if start_dl_timer() returns a failure, it indicates that the
> > actual time spent consuming the running time was unexpectedly long.
> > At this point, there are two options:
> > [as-is] 1. Re-enqueuing the dl entity with ENQUEUE_REPLENISH will clear
> > the throttled flag and re-enqueue the dl entity to keep the
> > fair_server running.
> > enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
> > => replenish_dl_entity
> > => replenish_dl_new_period(dl_se, rq);
> > => dl_se->dl_yielded = 0;
> > => dl_se->dl_throttled = 0;
> > => __enqueue_dl_entity(dl_se);
> >
> > [to-be] 2. To avoid RT latency, the fair_server should remain throttled
> > while replenishing the dl_se.
> > Once replenishing is complete, we can ensure that a timer is
> > successfully started.
> > When the timer is triggered, the throttled state will be cleared,
> > ensuring that RT tasks can execute during this interval.
> >
> > It is a policy decision for dealing with the case of failure in
> > start_dl_timer().
> > In my opinion, the second approach is better for real-time (RT)
> > latency, as RT tasks must be prioritized.
>
> OK, I think I see your points, but I am still not sure I fully
> understand the link with the issue you describe in the changelog -
> the relation with "DL replenish lagged too much", that is.
>
> Could you please expand on the details of the situation that is
> opening up for the issue your patch is addressing? Do you know why we
> hit the corner case that causes the warning in the first place?
>
"DL replenish lagged too much" means the fair_server took much longer
than expected to use up its runtime, so its deadline fell far behind
the clock (which is also why start_dl_timer() failed).
In this situation, replenishing a single dl_period is not enough to
catch up.
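The lag can be made concrete with a toy model. This is a hedged sketch, not kernel code: `struct toy_dl_se`, `toy_start_timer()` and `periods_to_catch_up()` are hypothetical names, and treating timer arming as failing whenever the expiry is not strictly in the future is a simplifying assumption about start_dl_timer(). It only shows the arithmetic: once the clock has run several periods past the deadline, advancing the deadline by one period still leaves it in the past.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a dl_server entity; not kernel code. */
struct toy_dl_se {
	int64_t deadline;	/* absolute deadline, ns */
	int64_t period;		/* dl_period, ns */
};

/*
 * Assumption: arming the replenishment timer fails whenever the
 * requested expiry is not strictly in the future, loosely modeling
 * start_dl_timer() returning 0.
 */
static bool toy_start_timer(int64_t expiry, int64_t now)
{
	return expiry > now;
}

/*
 * Number of whole periods the deadline must advance before it lies
 * strictly after "now". Returns 0 if it is already in the future.
 */
static int64_t periods_to_catch_up(const struct toy_dl_se *se, int64_t now)
{
	assert(se->period > 0);
	if (se->deadline > now)
		return 0;
	return (now - se->deadline) / se->period + 1;
}
```

For example, with a deadline at 1 s, a 1 s period and the clock at 3.5 s, `toy_start_timer(deadline + period, now)` fails because 2 s is still in the past, and `periods_to_catch_up()` returns 3 (the deadline must move to 4 s).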
A corner case is when the system is flooded with IRQs or IPIs.
In this case, the runtime is consumed very slowly, and the fair_server
keeps running without being throttled.
Even when the runtime is finally exhausted, the fair_server is
restarted immediately.
In the end, IRQs, IPIs, and fair tasks can take over the whole system,
leaving RT tasks no chance to run.
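The two policies for a start_dl_timer() failure can be contrasted in the same kind of toy model. Again, this is a hedged sketch, not the RFC patch or kernel code: `struct toy_server`, `toy_arm_timer()`, `toy_new_period()`, `policy_as_is()` and `policy_to_be()` are hypothetical. The timer is armed at the current deadline, which simplifies the real expiry computation, and the timer callback that would later clear the throttled state is not modeled. Under option 1 the server is runnable again at once, leaving no gap for RT tasks; under option 2 it stays throttled until the newly armed timer fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy fair_server state; not kernel code. */
struct toy_server {
	int64_t deadline;	/* absolute deadline, ns */
	int64_t runtime;	/* remaining runtime, ns */
	int64_t dl_runtime;	/* runtime granted per period, ns */
	int64_t dl_period;	/* period, ns */
	bool throttled;
	bool timer_armed;
};

/* Assumption: arming fails if the expiry is not strictly in the future. */
static bool toy_arm_timer(struct toy_server *s, int64_t expiry, int64_t now)
{
	s->timer_armed = expiry > now;
	return s->timer_armed;
}

/* Start a fresh period at "now", roughly in the spirit of replenish_dl_new_period(). */
static void toy_new_period(struct toy_server *s, int64_t now)
{
	assert(s->dl_period > 0);
	s->deadline = now + s->dl_period;
	s->runtime = s->dl_runtime;
}

/* Option 1 (as-is): on timer failure, replenish and clear throttled at once. */
static void policy_as_is(struct toy_server *s, int64_t now)
{
	s->throttled = true;
	if (!toy_arm_timer(s, s->deadline, now)) {
		toy_new_period(s, now);
		s->throttled = false;	/* server can run again immediately */
	}
}

/*
 * Option 2 (to-be): on timer failure, replenish but stay throttled, then
 * arm the timer against the refreshed deadline, which is now in the future.
 */
static void policy_to_be(struct toy_server *s, int64_t now)
{
	s->throttled = true;
	if (!toy_arm_timer(s, s->deadline, now)) {
		toy_new_period(s, now);
		(void)toy_arm_timer(s, s->deadline, now);
	}
}
```

With a deadline at 1 s, a 1 s period and the clock at 3.5 s, both policies end with the deadline at 4.5 s, but only `policy_to_be()` leaves the server throttled with a timer armed, which is the window RT tasks would get.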
> I would like to understand exactly what we are trying to fix before
> deciding how to fix it, sorry if I am being dense. :-)
>
> Thanks,
> Juri
>