Message-ID: <20250918090445.GF3289052@noisy.programming.kicks-ass.net>
Date: Thu, 18 Sep 2025 11:04:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: John Stultz <jstultz@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Xuewen Yan <xuewen.yan94@...il.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Suleiman Souhlal <suleiman@...gle.com>,
Qais Yousef <qyousef@...alina.io>,
Joel Fernandes <joelagnelf@...dia.com>,
kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing cpu starvation

On Thu, Sep 18, 2025 at 10:37:04AM +0200, Juri Lelli wrote:
> On 17/09/25 19:30, Peter Zijlstra wrote:
> > On Wed, Sep 17, 2025 at 03:56:20PM +0200, Juri Lelli wrote:
> >
> > > > + * By stopping at this point the dl_server retains bandwidth, which, if a new
> > > > + * task wakes up imminently (starting the server again), can be used --
> > > > + * subject to CBS wakeup rules -- without having to wait for the next period.
> > >
> > > In both cases we still defer until either the new period or the current
> > > 0-laxity, right?
> > >
> > > The stop cleans all the flags, so subsequent start calls
> > > enqueue(ENQUEUE_WAKEUP) -> update_dl_entity() which sets dl_throttled
> > > and dl_defer_armed in both cases and then we start_dl_timer (defer
> > > timer) after it (without enqueueing right away).
> > >
> > > Or maybe I am still a bit lost. :)
> >
> > The way I read it earlier today:
> >
> >   dl_server_start()
> >     enqueue_dl_entity(WAKEUP)
> >       if (WAKEUP)
> >         task_contending();
> >       update_dl_entity()
> >         dl_entity_overflows() := true
> >         update_dl_revised_wakeup();
> >
> > In that case, it is possible to continue running with a slight
> > adjustment to the runtime (it gets scaled back to account for 'lost'
> > time or somesuch IIRC).
> >
>
> Hummm, but this is for !implicit (dl_deadline != dl_period) tasks, is
> it? And dl-servers are implicit.
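
For reference, a toy sketch of the distinction Juri is pointing at, as I
understand the revised CBS wakeup rule: once the overflow check has fired,
a constrained entity (dl_deadline != dl_period) whose deadline is still in
the future keeps that deadline and gets its runtime scaled to the remaining
laxity, while an implicit one simply gets a fresh period. struct toy_cbs,
toy_is_implicit(), toy_wakeup_overflow() and TOY_BW_SHIFT are made-up names
for illustration; this is not the kernel code and it ignores boosting.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_BW_SHIFT 20	/* fixed-point shift; an assumption of this sketch */

/* Hypothetical stand-in for the CBS parameters; not struct sched_dl_entity. */
struct toy_cbs {
	uint64_t dl_runtime;	/* budget per period */
	uint64_t dl_deadline;	/* relative deadline */
	uint64_t dl_period;	/* period */
	uint64_t runtime;	/* remaining budget */
	uint64_t deadline;	/* absolute deadline */
};

static bool toy_is_implicit(const struct toy_cbs *se)
{
	return se->dl_deadline == se->dl_period;
}

/*
 * Models only the path taken after the overflow check returned true.
 * Returns true when the revised rule applied (deadline kept, runtime
 * scaled), false when a new period was started instead.
 */
static bool toy_wakeup_overflow(struct toy_cbs *se, uint64_t now)
{
	if (!toy_is_implicit(se) && now < se->deadline) {
		/* density = dl_runtime / dl_deadline, in fixed point */
		uint64_t density = (se->dl_runtime << TOY_BW_SHIFT) / se->dl_deadline;
		uint64_t laxity = se->deadline - now;

		se->runtime = (density * laxity) >> TOY_BW_SHIFT;
		return true;
	}

	/* Implicit entity (or deadline already passed): fresh reservation. */
	se->runtime = se->dl_runtime;
	se->deadline = now + se->dl_deadline;
	return false;
}
```

With dl_deadline == dl_period, as for dl-servers, the first branch is never
taken in this model, which is why the revised wakeup does not help here.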

Bah. You're right.

So how about this:

  dl_server_timer()
    if (dl_se->dl_defer_armed)
      dl_se->dl_defer_running = 1;
    enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH)

  __pick_task_dl()
    p = dl_se->server_pick_task(dl_se);
    if (!p)
      dl_server_stop()
        dl_se->dl_defer_armed = 0;
        dl_se->dl_throttled = 0;
        dl_se->dl_server_active = 0;
        /* notably it leaves dl_defer_running == 1 */

  dl_server_start()
    dl_se->dl_server_active = 1;
    enqueue_dl_entity(WAKEUP)
      if (WAKEUP)
        task_contending();
      update_dl_entity()
        if (dl_server() && dl_se->dl_defer)
          if (!dl_se->dl_defer_running) /* !true := false */
            /* do not set dl_defer_armed / dl_throttled */

Note: update_curr_dl_se() will eventually clear dl_defer_running when it
gets throttled.

And so it continues with the previous reservation. And I suppose the
question is, should it do update_dl_revised_wakeup() in this case?
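
To make the flag dance above easier to follow, here is a throwaway
userspace model of it. It only tracks the five bits named in the trace and
nothing else (no runtime, no deadlines, no timers). struct toy_dl_se and
all toy_*() helpers are invented for this sketch; the re-arming in
toy_dl_server_start() follows the description quoted earlier in the
thread, and toy_throttle() stands in for what update_curr_dl_se() is said
to do in the note above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the flags in the trace; NOT struct sched_dl_entity. */
struct toy_dl_se {
	bool dl_defer;
	bool dl_defer_armed;
	bool dl_defer_running;
	bool dl_throttled;
	bool dl_server_active;
};

/* Defer timer fires: an armed server is marked as running. */
static void toy_dl_server_timer(struct toy_dl_se *se)
{
	if (se->dl_defer_armed)
		se->dl_defer_running = true;
}

/* Nothing to pick: clear the flags, but leave dl_defer_running alone. */
static void toy_dl_server_stop(struct toy_dl_se *se)
{
	se->dl_defer_armed = false;
	se->dl_throttled = false;
	se->dl_server_active = false;
}

/* Wakeup: only re-arm the deferral when the server is not already running. */
static void toy_dl_server_start(struct toy_dl_se *se)
{
	se->dl_server_active = true;
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_defer_armed = true;
		se->dl_throttled = true;
	}
}

/* Stand-in for the throttle path clearing dl_defer_running. */
static void toy_throttle(struct toy_dl_se *se)
{
	se->dl_defer_running = false;
}

/* Replay: start, timer, optional throttle, stop, start again. */
static struct toy_dl_se toy_replay(bool throttled_before_stop)
{
	struct toy_dl_se se = { .dl_defer = true };

	toy_dl_server_start(&se);
	assert(se.dl_defer_armed);	/* first start always defers in this model */
	toy_dl_server_timer(&se);
	if (throttled_before_stop)
		toy_throttle(&se);
	toy_dl_server_stop(&se);
	toy_dl_server_start(&se);
	return se;
}
```

In this model toy_replay(false) ends with dl_defer_running still set and
neither dl_defer_armed nor dl_throttled, i.e. the restarted server carries
on with its previous reservation, whereas toy_replay(true) ends up armed
and throttled again and has to wait for the defer timer.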