Message-ID: <20250917122616.GG1386988@noisy.programming.kicks-ass.net>
Date: Wed, 17 Sep 2025 14:26:16 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: John Stultz <jstultz@...gle.com>
Cc: Juri Lelli <juri.lelli@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Xuewen Yan <xuewen.yan94@...il.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Suleiman Souhlal <suleiman@...gle.com>,
Qais Yousef <qyousef@...alina.io>,
Joel Fernandes <joelagnelf@...dia.com>,
kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
kernel-team@...roid.com
Subject: Re: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck,
allowing cpu starvation
On Wed, Sep 17, 2025 at 11:34:42AM +0200, Peter Zijlstra wrote:
> Yes. This makes sense.
>
> The old code would disable the dl_server when the fair task count drops to 0,
> so even though we had that yield in __pick_task_dl(), we'd never hit it.
> So the moment another fair task shows up (0->1) we re-enqueue the
> dl_server (using update_dl_entity() / CBS wakeup rules) and continue
> consuming bandwidth.
>
> However, since we're now not stopping the thing, we hit that yield,
> getting this pretty terrible behaviour where we will only run fair tasks
> until there are none and then yield our entire period, forcing another
> task to wait until the next cycle.
>
> Let me go have a play, surely we can do better.
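For reference, the "CBS wakeup rules" above boil down to roughly the check
below; a simplified sketch of update_dl_entity() / dl_entity_overflow(), not
the literal code (the real thing shifts the operands to avoid multiplication
overflow, and 'now' stands in for rq_clock(rq)):

	/*
	 * On (re)start, keep the leftover (runtime, deadline) pair only
	 * if doing so does not exceed the reserved bandwidth; otherwise
	 * start a fresh period.
	 */
	if (dl_time_before(dl_se->deadline, now) ||
	    dl_se->runtime * dl_se->dl_deadline >
	    (dl_se->deadline - now) * dl_se->dl_runtime) {
		/* keeping the old pair would overflow the bandwidth; replenish */
		dl_se->deadline = now + dl_se->dl_deadline;
		dl_se->runtime  = dl_se->dl_runtime;
	}
	/* else: keep consuming the leftover runtime against the old deadline */
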
Can you please try:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
That's yesterday's patch and the below. It's compile tested only, but
with a bit of luck it'll actually work ;-)
---
Subject: sched/deadline: Fix dl_server behaviour
From: Peter Zijlstra <peterz@...radead.org>
Date: Wed Sep 17 12:03:20 CEST 2025
John reported undesirable behaviour with the dl_server since commit:
cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling").
When starving fair tasks on purpose (by starting spinning FIFO tasks),
his fair workload, which often goes (briefly) idle, would see its
invocations delayed by up to a second each; running one invocation per
second was both unexpected and terribly slow.
The reason this happens is that when dl_se->server_pick_task() returns
NULL, indicating no runnable tasks, the server yields, pushing any later
jobs out a whole period (1 second).
Instead, simply stop the server. This should restore the behaviour where
a later wakeup (which restarts the server) can continue running (subject
to the CBS wakeup rules).
Notably, this does not re-introduce the behaviour cccb45d7c4295 set
out to solve; any start/stop cycle is naturally throttled by the timer
period (no active cancel).
Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@...gle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
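For clarity, the reported failure mode is this cycle, repeating once per
(1 second) server period:

 1) the dl_server finally gets to run and picks fair tasks;
 2) the fair workload goes (briefly) idle and server_pick_task() returns NULL;
 3) the current code sets dl_yielded (see the __pick_task_dl() hunk below),
    forfeiting the remainder of the period;
 4) a fair task waking up a moment later has to wait for the next period;

hence roughly one fair invocation per second under FIFO starvation.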
include/linux/sched.h | 1 -
kernel/sched/deadline.c | 23 ++---------------------
kernel/sched/sched.h | 33 +++++++++++++++++++++++++++++++--
3 files changed, 33 insertions(+), 24 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -706,7 +706,6 @@ struct sched_dl_entity {
unsigned int dl_defer : 1;
unsigned int dl_defer_armed : 1;
unsigned int dl_defer_running : 1;
- unsigned int dl_server_idle : 1;
/*
* Bandwidth enforcement timer. Each -deadline task has its
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1571,10 +1571,8 @@ void dl_server_update_idle_time(struct r
void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
{
/* 0 runtime = fair server disabled */
- if (dl_se->dl_runtime) {
- dl_se->dl_server_idle = 0;
+ if (dl_se->dl_runtime)
update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
- }
}
void dl_server_start(struct sched_dl_entity *dl_se)
@@ -1602,20 +1600,6 @@ void dl_server_stop(struct sched_dl_enti
dl_se->dl_server_active = 0;
}
-static bool dl_server_stopped(struct sched_dl_entity *dl_se)
-{
- if (!dl_se->dl_server_active)
- return true;
-
- if (dl_se->dl_server_idle) {
- dl_server_stop(dl_se);
- return true;
- }
-
- dl_se->dl_server_idle = 1;
- return false;
-}
-
void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_pick_f pick_task)
{
@@ -2384,10 +2368,7 @@ static struct task_struct *__pick_task_d
if (dl_server(dl_se)) {
p = dl_se->server_pick_task(dl_se);
if (!p) {
- if (!dl_server_stopped(dl_se)) {
- dl_se->dl_yielded = 1;
- update_curr_dl_se(rq, dl_se, 0);
- }
+ dl_server_stop(dl_se);
goto again;
}
rq->dl_server = dl_se;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -371,10 +371,39 @@ extern s64 dl_scaled_delta_exec(struct r
* dl_server_update() -- called from update_curr_common(), propagates runtime
* to the server.
*
- * dl_server_start()
- * dl_server_stop() -- start/stop the server when it has (no) tasks.
+ * dl_server_start() -- start the server when it has tasks; it will stop
+ * automatically when there are no more tasks, per
+ * dl_se::server_pick_task() returning NULL.
+ *
+ * dl_server_stop() -- (force) stop the server; use when updating
+ * parameters.
*
* dl_server_init() -- initializes the server.
+ *
+ * When started the dl_server will (per dl_defer) schedule a timer for its
+ * zero-laxity point -- that is, unlike regular EDF tasks which run ASAP, a
+ * server will run at the very end of its period.
+ *
+ * This is done such that any runtime from the target class can be accounted
+ * against the server -- through dl_server_update() above -- such that when it
+ * becomes time to run, it might already be out of runtime and get deferred
+ * until the next period. In this case dl_server_timer() will alternate
+ * between defer and replenish but never actually enqueue the server.
+ *
+ * Only when the target class does not manage to exhaust the server's runtime
+ * (there's actually starvation in the given period), will the dl_server get on
+ * the runqueue. Once queued it will pick tasks from the target class and run
+ * them until either its runtime is exhausted, at which point it's back to
+ * dl_server_timer, or until there are no more tasks to run, at which point
+ * the dl_server stops itself.
+ *
+ * By stopping at this point the dl_server retains bandwidth, which, if a new
+ * task wakes up imminently (starting the server again), can be used --
+ * subject to CBS wakeup rules -- without having to wait for the next period.
+ *
+ * Additionally, because of the dl_defer behaviour the start/stop behaviour is
+ * naturally throttled to once per period, which keeps high context switch
+ * workloads from spamming the hrtimer program/cancel paths.
*/
extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
extern void dl_server_start(struct sched_dl_entity *dl_se);
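To put numbers on the zero-laxity deferral described in the new comment,
assuming the default fair-server parameters of 50ms runtime per 1s period
(adjust for whatever is actually configured):

	zero-laxity point = deadline - runtime = 1000ms - 50ms = 950ms

i.e. the dl_server timer fires 950ms into the period; if the fair class
already got its 50ms of CPU by then (accounted via dl_server_update()),
the server is out of runtime and is merely deferred to the next period,
otherwise it gets enqueued and runs fair tasks with whatever runtime
remains.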