Message-ID: <aXeWnzI3ymUSm3Cu@gpd4>
Date: Mon, 26 Jan 2026 17:30:23 +0100
From: Andrea Righi <arighi@...dia.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Daniel Hodges <hodgesd@...a.com>, sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on stop
Hi Gabriele,
On Mon, Jan 26, 2026 at 03:20:12PM +0100, Gabriele Monaco wrote:
> On Fri, 2026-01-23 at 17:16 +0100, Andrea Righi wrote:
> > dl_server_stop() can leave a deadline server in an inconsistent internal
> > state across stop/start transitions, causing it to bypass its required
> > deferral phase when restarted. This breaks the scheduler invariant that
> > a restarted server must re-establish eligibility before being allowed to
> > execute.
> >
> > When the server is stopped (e.g., because the associated task blocks),
> > it's expected to transition back to an inactive, initial state. However,
> > dl_server_stop() does not fully reset the execution state. As a result,
> > the server can be logically inactive while still appearing as if it was
> > still running.
> >
> > When the server is restarted via dl_server_start(), the following
> > sequence occurs:
> > 1. dl_server_start() calls enqueue_dl_entity(ENQUEUE_WAKEUP),
> > 2. enqueue_dl_entity() calls update_dl_entity(),
> > 3. update_dl_entity() checks (!dl_se->dl_defer_running) to decide
> > whether to arm the deferral mechanism,
> > 4. because dl_defer_running is stale, the check fails,
> > 5. dl_defer_armed and dl_throttled are not set,
> > 6. enqueue_dl_entity() skips start_dl_timer(), because
> > dl_throttled == 0,
> > 7. the server is enqueued via __enqueue_dl_entity(),
> > 8. the scheduler picks the server to run,
> > 9. update_curr_dl_se() detects that the server has exhausted its
> > runtime (or has negative runtime), as it wasn't properly
> > replenished/deferred,
> > 10. the server is throttled (dl_throttled set to 1) and dequeued,
> > 11. the server repeatedly cycles through wakeup and throttling,
> > effectively receiving no usable CPU bandwidth.
>
> Hello,
>
> I remember wondering why defer_running was kept after stop and Peter suggested
> it's to avoid penalising tasks with short sleeps. [1]
Correct, dl_defer_running was preserved across stop/start to avoid
penalizing very short sleeps. IIUC from Peter's explanation, this
optimization relies on the assumption that the server is stopped while its
execution context is still coherent (the remaining runtime is still usable
and the deadline has not yet expired), so that the server can resume
execution immediately instead of re-entering the full defer / zero-laxity
path.
>
> Clearing defer_running on stop is in fact removing the edge from A:init to
> D:running , isn't it? The server should be able to start as running and not only
> deferred (dl_defer_armed and dl_throttled set).
Yes, that's true in general, and preserving the A:init -> D:running
transition is desirable for short sleeps. However, it's only valid as long
as the execution context is still coherent. In the failing case that I'm
seeing, the server restarts with exhausted runtime and no
deferral/replenishment timer pending, so starting directly in D:running is
no longer a valid transition and breaks the state machine.
Maybe a way to preserve the short-sleep optimization without breaking the
state machine could be to retain dl_defer_running across stop/start only
when the execution context is still coherent (i.e., positive runtime and
deadline not expired). Otherwise clear it, so the server cleanly re-enters
the deferral/replenishment path.
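To make the idea concrete, here is a user-space toy model (not kernel code,
and not tested against the real scheduler). The struct only mirrors the
fields discussed in this thread; toy_ctx_coherent(), toy_dl_server_stop(),
the explicit "now" parameter and the wraparound-safe deadline comparison
are assumptions made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the sched_dl_entity fields discussed in this thread. */
struct toy_dl_se {
	int64_t runtime;	/* remaining runtime in ns, may be negative */
	uint64_t deadline;	/* absolute deadline in ns */
	unsigned int dl_server_active : 1;
	unsigned int dl_throttled : 1;
	unsigned int dl_defer_armed : 1;
	unsigned int dl_defer_running : 1;
	unsigned int dl_defer_idle : 1;
};

/*
 * Hypothetical helper: the execution context is considered coherent when
 * there is runtime left and the deadline is still in the future.
 */
static bool toy_ctx_coherent(const struct toy_dl_se *se, uint64_t now)
{
	return se->runtime > 0 && (int64_t)(se->deadline - now) > 0;
}

/*
 * Hypothetical stop path: same resets as the patch, except that
 * dl_defer_running survives when the context is still coherent, so a
 * short sleep can still restart directly in the running state.
 */
static void toy_dl_server_stop(struct toy_dl_se *se, uint64_t now)
{
	se->dl_defer_armed = 0;
	se->dl_throttled = 0;
	if (!toy_ctx_coherent(se, now))
		se->dl_defer_running = 0;
	se->dl_defer_idle = 0;
	se->dl_server_active = 0;
}
```

With this model, a server stopped with runtime left and a future deadline
keeps dl_defer_running set, while a server stopped with exhausted runtime
or an expired deadline has it cleared and would go back through the
deferral path on the next start.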
>
> In the sequence you described above, I wonder why the enqueue is never
> replenishing. As far as I understand the runtime should remain <= 0 only as long
> as the enqueue occurs before the deadline, after that it should simply replenish
> a new period (pushing deadline and restoring runtime).
>
> What am I missing here?
Replenishment is not triggered directly by enqueueing, but by the
deferral/replenishment timer. In this case the timer is never armed: the
stale dl_defer_running makes the enqueue path believe the server is already
in the running phase, which suppresses deferral arming and causes
start_dl_timer() to be skipped.
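A stripped-down user-space model of that decision, following only the
steps listed in the patch description (this is not the actual
update_dl_entity()/enqueue_dl_entity() code; the toy_* names and the
boolean return value are assumptions for illustration):

```c
#include <stdbool.h>

/* Only the flags involved in the deferral decision. */
struct toy_defer_state {
	unsigned int dl_defer : 1;
	unsigned int dl_defer_running : 1;
	unsigned int dl_defer_armed : 1;
	unsigned int dl_throttled : 1;
};

/*
 * Models steps 3-5 of the sequence: deferral is armed only when the
 * server is not already marked as running.
 */
static void toy_update_dl_entity(struct toy_defer_state *se)
{
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_defer_armed = 1;
		se->dl_throttled = 1;
	}
}

/*
 * Models step 6 of the sequence: the timer is only considered when the
 * entity is throttled. Returns true if the timer would be started,
 * false if the server is enqueued right away.
 */
static bool toy_enqueue_wakeup(struct toy_defer_state *se)
{
	toy_update_dl_entity(se);
	return se->dl_throttled;
}
```

In this model, starting with dl_defer_running = 1 returns false (no timer,
immediate enqueue), while starting with dl_defer_running = 0 arms the
deferral and returns true.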
Thanks,
-Andrea
>
> Thanks,
> Gabriele
>
> [1] -
> https://lore.kernel.org/lkml/20251111111716.GL278048@noisy.programming.kicks-ass.net
>
> >
> > This results in starvation of the tasks serviced by the deadline server
> > in the presence of competing RT workloads.
> >
> > This issue can be confirmed adding debugging traces, which show that the
> > server skips the deferral timer and is immediately throttled upon
> > execution with negative runtime:
> >
> > DEBUG: dl_server_start: dl_defer_running=1 active=0
> > DEBUG: enqueue_dl_entity: flags=1 dl_throttled=0 dl_defer=1
> > DEBUG: update_dl_entity: dl_defer_running=1
> > DEBUG: enqueue_dl_entity: SKIPPING start_dl_timer! dl_throttled=0
> > ...
> > DEBUG: update_curr_dl_se: THROTTLED runtime=-954758
> >
> > Fix this by properly resetting dl_defer_running in dl_server_stop(),
> > ensuring the server correctly enters the defer phase upon restart.
> >
> > This issue is quite difficult to observe when only the fair server
> > is present, as the required stop/start patterns are relatively rare.
> > However, it becomes easier to trigger with an additional deadline server
> > with more frequent server lifecycle transitions (such as a sched_ext
> > deadline server).
> >
> > This change is a prerequisite for introducing a sched_ext deadline
> > server, as it ensures correct and predictable behavior across server
> > stop/start cycles.
> >
> > Link: https://lore.kernel.org/all/aXEMat4IoNnGYgxw@gpd4/
> > Signed-off-by: Andrea Righi <arighi@...dia.com>
> > ---
> > Changes in v2:
> > - Update state machine documentation
> > - Link to v1:
> > https://lore.kernel.org/all/20260122140833.1655020-1-arighi@nvidia.com/
> >
> > kernel/sched/deadline.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index c509f2e7d69de..e42867061ea77 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1615,7 +1615,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64
> > delta_exec)
> > * dl_server_active = 0
> > * dl_throttled = 0
> > * dl_defer_armed = 0
> > - * dl_defer_running = 0/1
> > + * dl_defer_running = 0
> > * dl_defer_idle = 0
> > *
> > * [B] - zero_laxity-wait
> > @@ -1704,6 +1704,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64
> > delta_exec)
> > * hrtimer_try_to_cancel();
> > * dl_defer_armed = 0;
> > * dl_throttled = 0;
> > + * dl_defer_running = 0;
> > * dl_server_active = 0;
> > * // [A]
> > * return p;
> > @@ -1813,6 +1814,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
> > hrtimer_try_to_cancel(&dl_se->dl_timer);
> > dl_se->dl_defer_armed = 0;
> > dl_se->dl_throttled = 0;
> > + dl_se->dl_defer_running = 0;
> > dl_se->dl_defer_idle = 0;
> > dl_se->dl_server_active = 0;
> > }
>