Message-ID: <aXjJG-72XEqvaVtL@gpd4>
Date: Tue, 27 Jan 2026 15:18:03 +0100
From: Andrea Righi <arighi@...dia.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on stop
On Tue, Jan 27, 2026 at 08:52:48AM +0000, Gabriele Monaco wrote:
> 2026-01-26T21:27:11Z Andrea Righi <arighi@...dia.com>:
> > On Mon, Jan 26, 2026 at 04:56:52PM +0000, Gabriele Monaco wrote:
> >> Still if it starts before the deadline, the server is going to get throttled as you observed, and perhaps since in your tests the CPU isn't idle, we don't stop the server after that dequeue and then we never replenish after the deadline (because we never start and as you mentioned, the timer is not armed).
> >>
> >> Can this be what you're observing?
> >
> > Yes, I think it matches what I'm observing.
> >
> > In my case the server is (re)started before the deadline, so it immediately
> > runs with exhausted runtime, gets throttled, and is dequeued. Since the CPU
> > isn't idle, we don't hit a path that would stop the server cleanly and
> > reset its execution state.
> >
> > At that point, because dl_defer_running is still set, the restart path
> > assumes the server is already in the running phase and skips arming the
> > deferral/replenishment timer. Therefore, once the deadline passes there is
> > no remaining trigger to replenish a new period and the server gets stuck in
> > a throttled-but-running state.
> >
>
> Alright thanks. I believe your fix would work even if you reset the defer_running only when the runtime is exhausted.
>
> This way we'd still keep a bit of benefits of the start-running sequence if fair/scx tasks sleep and run back when the server still has runtime.
>
> We could even keep the defer_running as it is and mark the server as defer_armed (with laxity timer and stuff) only if it starts in this exact condition (runtime = 0 and deadline not expired). But this may just be overly complex for little benefit.
>
> What do you think?
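To make the failure mode in the quoted exchange concrete, here is a toy state model. It is a sketch, not kernel code: `struct toy_server`, `toy_start()` and `toy_advance_time()` are hypothetical names, and the logic is a deliberate simplification that only captures the quoted behaviour (the restart path trusts a stale dl_defer_running, skips arming the timer, and nothing else replenishes the budget).

```c
/*
 * Hypothetical toy model of the stuck dl_server, NOT kernel code.
 * Field names loosely mirror sched_dl_entity; the start/timer logic
 * is simplified to show only the "stale defer_running" problem.
 */
#include <stdbool.h>
#include <stdint.h>

struct toy_server {
	int64_t runtime;	/* remaining budget in the current period */
	uint64_t deadline;	/* absolute deadline of the current period */
	bool defer_running;	/* "already running" flag, possibly stale */
	bool timer_armed;	/* deferral/replenishment timer pending */
	bool throttled;		/* server cannot run right now */
};

/* Restart path: if defer_running is set, assume we are already running
 * and skip arming the timer, as described in the quoted analysis. */
static void toy_start(struct toy_server *s)
{
	if (!s->defer_running)
		s->timer_armed = true;	/* normal path: defer and arm timer */
	if (s->runtime <= 0)
		s->throttled = true;	/* exhausted budget: throttle */
}

/* In this toy model only the timer replenishes, once the deadline passed. */
static void toy_advance_time(struct toy_server *s, uint64_t now,
			     int64_t period_runtime, uint64_t period)
{
	if (s->timer_armed && now >= s->deadline) {
		s->runtime = period_runtime;
		s->deadline = now + period;
		s->throttled = false;
		s->timer_armed = false;
	}
}
```

Starting a server with runtime = 0, deadline = 100 and a stale defer_running leaves it throttled with no timer, so advancing time to 200 changes nothing; with defer_running cleared, the same sequence arms the timer and the server is replenished.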

I think my case should also work with something like this (I'll run some
tests later to double-check):

	if (dl_se->runtime <= 0)
		dl_se->dl_defer_running = 0;

In this way:

 - short sleep + remaining runtime > 0
   - dl_defer_running stays set
   - restart can go A->D directly
   - no extra defer / zero-laxity penalty

 - stop with exhausted (or negative) runtime
   - dl_defer_running is cleared
   - restart must re-establish eligibility
   - deferral / timer is armed again
   - no stale "already running" server

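The two cases in the list can be checked against a tiny model of just the proposed stop-path rule. This is a sketch, not kernel code: `struct toy_dl_se`, `toy_stop()` and `toy_restart_skips_defer()` are hypothetical names, and "skips defer" stands in for the A->D shortcut without modelling the rest of the dl_server state machine.

```c
/* Hypothetical model of the proposed stop-path rule, NOT kernel code. */
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the relevant sched_dl_entity fields. */
struct toy_dl_se {
	int64_t runtime;
	bool dl_defer_running;
};

/* Proposed rule: forget "already running" only when the budget is gone. */
static void toy_stop(struct toy_dl_se *dl_se)
{
	if (dl_se->runtime <= 0)
		dl_se->dl_defer_running = false;
}

/* Returns true if a restart may take the A->D shortcut, false if it must
 * re-establish eligibility (defer and arm the timer again). */
static bool toy_restart_skips_defer(const struct toy_dl_se *dl_se)
{
	return dl_se->dl_defer_running;
}
```

A short sleep with runtime left (e.g. runtime = 5) keeps the shortcut, while a stop with runtime = 0 or a negative runtime clears the flag and forces the deferral path on restart.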
However, the "already running" shortcut is only safe if both the runtime
**and** the deadline are still valid, so to be fully correct we should
probably do something like this:

	if (dl_se->runtime <= 0 ||
	    dl_time_before(dl_se->deadline, rq_clock(dl_se->rq)))
		dl_se->dl_defer_running = 0;

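For experimenting with the stricter condition outside the kernel, here is a userspace sketch. `toy_dl_time_before()` follows the wraparound-safe comparison that, to my understanding, the kernel's dl_time_before() uses (`(s64)(a - b) < 0`); `struct toy_dl_se`, `toy_stop_strict()` and the explicit `now` argument (standing in for rq_clock(dl_se->rq)) are hypothetical.

```c
/* Hypothetical userspace model of the stricter stop rule, NOT kernel code. */
#include <stdbool.h>
#include <stdint.h>

struct toy_dl_se {
	int64_t runtime;
	uint64_t deadline;
	bool dl_defer_running;
};

/* True if time a is before time b, robust against u64 wraparound. */
static bool toy_dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Keep the "already running" shortcut only while both the budget and the
 * current deadline are still valid at 'now'. */
static void toy_stop_strict(struct toy_dl_se *dl_se, uint64_t now)
{
	if (dl_se->runtime <= 0 || toy_dl_time_before(dl_se->deadline, now))
		dl_se->dl_defer_running = false;
}
```

With runtime = 10 and deadline = 1000 the flag survives a stop at now = 500 but is cleared at now = 1500; because of the signed difference, a deadline that has numerically wrapped past UINT64_MAX is still treated as being in the future.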
The extra deadline check makes the stop path slightly more complex, so I'm
not sure whether it's preferable to go in this direction or to just clear
dl_defer_running unconditionally, which is simpler and more explicit from a
state-machine point of view.

Which one do we prefer? Happy to go with whatever approach you think makes
more sense.
Thanks,
-Andrea