Message-ID: <20260123161645.2181752-1-arighi@nvidia.com>
Date: Fri, 23 Jan 2026 17:16:45 +0100
From: Andrea Righi <arighi@...dia.com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Tejun Heo <tj@...nel.org>,
Joel Fernandes <joelagnelf@...dia.com>,
David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>,
Daniel Hodges <hodgesd@...a.com>,
sched-ext@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: [PATCH v2] sched/deadline: Reset dl_server execution state on stop

dl_server_stop() can leave a deadline server in an inconsistent internal
state across stop/start transitions, causing it to bypass its required
deferral phase when restarted. This breaks the scheduler invariant that
a restarted server must re-establish eligibility before being allowed to
execute.

When the server is stopped (e.g., because the associated task blocks),
it's expected to transition back to an inactive, initial state. However,
dl_server_stop() does not fully reset the execution state: dl_defer_running
is left untouched, so the server can be logically inactive while still
appearing to be running.
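For reference, this is the tail of dl_server_stop() before this patch
(quoted from the hunk below); note that every flag except dl_defer_running
is cleared:

	hrtimer_try_to_cancel(&dl_se->dl_timer);
	dl_se->dl_defer_armed = 0;
	dl_se->dl_throttled = 0;
	dl_se->dl_defer_idle = 0;
	dl_se->dl_server_active = 0;
	/* note: dl_defer_running is never reset here */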
When the server is restarted via dl_server_start(), the following
sequence occurs:
1. dl_server_start() calls enqueue_dl_entity(ENQUEUE_WAKEUP),
2. enqueue_dl_entity() calls update_dl_entity(),
3. update_dl_entity() checks (!dl_se->dl_defer_running) to decide
whether to arm the deferral mechanism,
4. because dl_defer_running is stale, the check fails,
5. dl_defer_armed and dl_throttled are not set,
6. enqueue_dl_entity() skips start_dl_timer(), because
dl_throttled == 0,
7. the server is enqueued via __enqueue_dl_entity(),
8. the scheduler picks the server to run,
9. update_curr_dl_se() detects that the server has exhausted its
runtime (or has negative runtime), as it wasn't properly
replenished/deferred,
10. the server is throttled (dl_throttled set to 1) and dequeued,
11. the server repeatedly cycles through wakeup and throttling,
effectively receiving no usable CPU bandwidth.

This results in starvation of the tasks serviced by the deadline server
in the presence of competing RT workloads.
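For illustration only, the sequence above can be condensed into a small
user-space toy model (this is not kernel code: the names mirror the kernel
flags, everything else is stubbed out, and the leftover runtime value is
simply taken from the trace below):

/*
 * Toy model of the restart sequence above. NOT kernel code: it only
 * illustrates why a stale dl_defer_running skips the deferral timer.
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_dl_server {
	bool dl_server_active;
	bool dl_throttled;
	bool dl_defer_armed;
	bool dl_defer_running;
	long runtime;			/* leftover runtime, in ns */
};

/* Models dl_server_stop(); 'fixed' selects the behavior with this patch. */
static void toy_server_stop(struct toy_dl_server *s, bool fixed)
{
	s->dl_defer_armed = false;
	s->dl_throttled = false;
	if (fixed)
		s->dl_defer_running = false;	/* the line this patch adds */
	s->dl_server_active = false;
}

/* Models the deferral decision in update_dl_entity() (steps 3-5). */
static void toy_update_dl_entity(struct toy_dl_server *s)
{
	if (!s->dl_defer_running) {
		s->dl_defer_armed = true;	/* arm zero-laxity deferral */
		s->dl_throttled = true;		/* wait for the dl_timer */
	}
}

/* Models dl_server_start() -> enqueue_dl_entity() (steps 1-2 and 6). */
static void toy_server_start(struct toy_dl_server *s)
{
	s->dl_server_active = true;
	toy_update_dl_entity(s);
	if (s->dl_throttled)
		printf("  deferral armed: timer will replenish the runtime\n");
	else
		printf("  timer skipped: server runs with stale runtime=%ld\n",
		       s->runtime);
}

int main(void)
{
	/* Server state left behind by a previous run. */
	struct toy_dl_server s = { .dl_defer_running = true, .runtime = -954758 };

	printf("without the fix:\n");
	toy_server_stop(&s, false);
	toy_server_start(&s);

	printf("with the fix:\n");
	toy_server_stop(&s, true);
	toy_server_start(&s);

	return 0;
}

Running the model prints the skipped timer in the first case and the
expected armed deferral once dl_defer_running is cleared on stop.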
This issue can be confirmed by adding debugging traces, which show that
the server skips the deferral timer and is immediately throttled upon
execution with negative runtime:

  DEBUG: dl_server_start: dl_defer_running=1 active=0
  DEBUG: enqueue_dl_entity: flags=1 dl_throttled=0 dl_defer=1
  DEBUG: update_dl_entity: dl_defer_running=1
  DEBUG: enqueue_dl_entity: SKIPPING start_dl_timer! dl_throttled=0
  ...
  DEBUG: update_curr_dl_se: THROTTLED runtime=-954758
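Traces like the ones above can be collected with ad-hoc trace_printk()
calls in the functions involved; they are not part of this patch. For
example, at the top of dl_server_start():

	trace_printk("dl_server_start: dl_defer_running=%d active=%d\n",
		     dl_se->dl_defer_running, dl_se->dl_server_active);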
Fix this by properly resetting dl_defer_running in dl_server_stop(),
ensuring the server correctly enters the defer phase upon restart.

This issue is quite difficult to observe when only the fair server
is present, as the required stop/start patterns are relatively rare.
However, it becomes easier to trigger with an additional deadline server
that goes through more frequent lifecycle transitions (such as a
sched_ext deadline server).

This change is a prerequisite for introducing a sched_ext deadline
server, as it ensures correct and predictable behavior across server
stop/start cycles.

Link: https://lore.kernel.org/all/aXEMat4IoNnGYgxw@gpd4/
Signed-off-by: Andrea Righi <arighi@...dia.com>
---
Changes in v2:
- Update state machine documentation
- Link to v1: https://lore.kernel.org/all/20260122140833.1655020-1-arighi@nvidia.com/

 kernel/sched/deadline.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c509f2e7d69de..e42867061ea77 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1615,7 +1615,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
  * dl_server_active = 0
  * dl_throttled = 0
  * dl_defer_armed = 0
- * dl_defer_running = 0/1
+ * dl_defer_running = 0
  * dl_defer_idle = 0
  *
  * [B] - zero_laxity-wait
@@ -1704,6 +1704,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
  * hrtimer_try_to_cancel();
  * dl_defer_armed = 0;
  * dl_throttled = 0;
+ * dl_defer_running = 0;
  * dl_server_active = 0;
  * // [A]
  * return p;
@@ -1813,6 +1814,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	hrtimer_try_to_cancel(&dl_se->dl_timer);
 	dl_se->dl_defer_armed = 0;
 	dl_se->dl_throttled = 0;
+	dl_se->dl_defer_running = 0;
 	dl_se->dl_defer_idle = 0;
 	dl_se->dl_server_active = 0;
 }
--
2.52.0