Message-ID: <20250916052904.937276-1-jstultz@google.com>
Date: Tue, 16 Sep 2025 05:28:09 +0000
From: John Stultz <jstultz@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>
Cc: John Stultz <jstultz@...gle.com>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Suleiman Souhlal <suleiman@...gle.com>, Qais Yousef <qyousef@...alina.io>, 
	Joel Fernandes <joelagnelf@...dia.com>, kuyo chang <kuyo.chang@...iatek.com>, 
	hupu <hupu.gm@...il.com>, kernel-team@...roid.com
Subject: [RFC][PATCH] sched/deadline: Fix dl_server getting stuck, allowing
 cpu starvation

With 6.17-rc6, running with locktorture enabled on a two-core
qemu VM, I found I could easily hit lockup warnings:

[   92.301253] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 42s!
[   92.305170] Showing busy workqueues and worker pools:
[   92.307434] workqueue events_power_efficient: flags=0x80
[   92.309796]   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
[   92.309834]     pending: neigh_managed_work
[   92.314565]   pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=4 refcnt=5
[   92.314604]     pending: crda_timeout_work, neigh_managed_work, neigh_periodic_work, gc_worker
[   92.321151] workqueue mm_percpu_wq: flags=0x8
[   92.323124]   pwq 6: cpus=1 node=0 flags=0x0 nice=0 active=1 refcnt=2
[   92.323161]     pending: vmstat_update
[   92.327638] workqueue kblockd: flags=0x18
[   92.329429]   pwq 7: cpus=1 node=0 flags=0x0 nice=-20 active=1 refcnt=2
[   92.329467]     pending: blk_mq_timeout_work
[   92.334259] Showing backtraces of running workers in stalled CPU-bound worker pools:

I bisected it down to commit cccb45d7c429 ("sched/deadline: Less
agressive dl_server handling"), and in debugging it seems there
is a window where we can end up with the dl_server dequeued while
dl_se->dl_server_active is still set. This causes dl_server_start()
to return without enqueueing the dl_server, so it fails to run
when RT tasks starve the cpu.
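
For context, the guard at the top of dl_server_start() looks
roughly like this (paraphrased from kernel/sched/deadline.c, not
verbatim; exact details may differ in the current tree):

	void dl_server_start(struct sched_dl_entity *dl_se)
	{
		/*
		 * Paraphrased sketch: if dl_server_active is still
		 * set from a previous cycle, we bail out before
		 * enqueueing, so a dequeued-but-active dl_server
		 * never gets back onto the runqueue.
		 */
		if (!dl_server(dl_se) || dl_se->dl_server_active)
			return;

		dl_se->dl_server_active = 1;
		enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
		...
	}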

I found that when this happens, the dl_timer hrtimer is armed and
fires dl_server_timer(), which hits the
  `if (!dl_se->server_has_tasks(dl_se))`
case; that path calls replenish_dl_entity() and dl_server_stopped()
and finally returns HRTIMER_NORESTART.

The problem is that dl_server_stopped() will set
dl_se->dl_server_idle before returning false (and notably does not
call dl_server_stop(), which would clear dl_server_active).
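
For reference, dl_server_stopped() in that commit looks roughly
like the following (paraphrased, not verbatim):

	static bool dl_server_stopped(struct sched_dl_entity *dl_se)
	{
		if (!dl_se->dl_server_active)
			return false;

		if (dl_se->dl_server_idle) {
			dl_server_stop(dl_se);
			return true;
		}

		/* first idle pass: mark idle, leave dl_server_active set */
		dl_se->dl_server_idle = 1;
		return false;
	}

So when dl_server_timer() takes the !server_has_tasks() path on a
first idle pass, the function returns false without clearing
dl_server_active, and the timer is not re-armed.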

After this, the timer never fires again, and nothing enqueues the
dl_server entity back onto the runqueue, so it never picks tasks
from the fair scheduler and we see the starvation on that core.

So, in dl_server_timer(), call dl_server_stop() instead of
dl_server_stopped(), as that ensures dl_server_active gets cleared
when the entity is dequeued.
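
dl_server_stop(), by contrast, cancels the timer and clears the
state flags, roughly (again paraphrased; the exact guard and set
of flags may differ):

	void dl_server_stop(struct sched_dl_entity *dl_se)
	{
		if (!dl_se->dl_server_active)
			return;

		dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
		hrtimer_try_to_cancel(&dl_se->dl_timer);
		dl_se->dl_defer_armed = 0;
		dl_se->dl_throttled = 0;
		dl_se->dl_server_active = 0;
	}

so a subsequent dl_server_start() can enqueue the entity again.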

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Signed-off-by: John Stultz <jstultz@...gle.com>
---
NOTE: I'm not confident this is the right fix, but I wanted
to share for feedback and testing.

Also, this resolves the lockup warnings and the problematic
behavior I see with locktorture, but it does *not* resolve the
behavior change I hit with my ksched_football test (which
intentionally causes RT starvation), which I bisected down to the
same problematic commit and mentioned here:
  https://lore.kernel.org/lkml/20250722070600.3267819-1-jstultz@google.com/
This may just be a problem with my test, but I'm still a bit wary
that this behavior change may bite folks.

Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Xuewen Yan <xuewen.yan94@...il.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>
Cc: Suleiman Souhlal <suleiman@...gle.com>
Cc: Qais Yousef <qyousef@...alina.io>
Cc: Joel Fernandes <joelagnelf@...dia.com>
Cc: kuyo chang <kuyo.chang@...iatek.com>
Cc: hupu <hupu.gm@...il.com>
Cc: kernel-team@...roid.com
---
 kernel/sched/deadline.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f25301267e471..215c3e2cee370 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1152,8 +1152,6 @@ static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
 /* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
 static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
 
-static bool dl_server_stopped(struct sched_dl_entity *dl_se);
-
 static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = rq_of_dl_se(dl_se);
@@ -1173,7 +1171,7 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
 
 		if (!dl_se->server_has_tasks(dl_se)) {
 			replenish_dl_entity(dl_se);
-			dl_server_stopped(dl_se);
+			dl_server_stop(dl_se);
 			return HRTIMER_NORESTART;
 		}
 
-- 
2.51.0.384.g4c02a37b29-goog

