Message-ID: <8394f68afac3b26aeba3def48d927e3ab7c177e3.camel@redhat.com>
Date: Mon, 12 Jan 2026 10:02:38 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <jlelli@...hat.com>
Cc: Clark Williams <williams@...hat.com>
Subject: sched/deadline: Server stops while fair tasks are still runnable
The boost model in [1] spotted cases with fair tasks running after the server
stopped. I believe this means that it's equally possible for those tasks to
starve without the server even trying to start.
There are 2 related situations that can result in this:
1. dl_server_idle is set to true when the CPU is really idle
2. a fair task wakes up (nothing happens since the server is still active)
3. dl_server_timer stops the server since it's marked idle, although the CPU
no longer is
```
<idle>-0 d.h3. 2.521220: event_boost: -12: idle x dl_replenish -> idle
<idle>-0 d.h3. 2.521227: event_laxity: -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
thread3-3-580 d..4. 3.338067: event_boost: -12: idle x dl_server_resume_throttled -> throttled
thread3-3-580 d..3. 3.338076: sched_wakeup: ksoftirqd/12:118 [120] CPU:012
thread5-5-582 d.h2. 3.471237: event_boost: -12: throttled x dl_server_stop -> stopped (final)
thread5-5-582 d.h2. 3.471240: event_laxity: -12: idle_wait x dl_server_stop -> stopped (final)
ktimers/12-117 d..3. 3.538088: error_boost: -12: event sched_switch_in not expected in the state stopped
ktimers/12-117 d..2. 3.538089: sched_switch: ktimers/12:117 [98] S ==> ksoftirqd/12:118 [120]
```
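To make the sequence in this first trace concrete, here is a minimal user-space
sketch of it. This is a toy model and not kernel code: every struct and function
name is hypothetical, and it only assumes what the three steps above describe
(an idle flag set at replenish time, a start request that does nothing while
the server is active, and a timer that stops the server when the flag is set).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; every name below is hypothetical, not kernel code. */
struct toy_server {
	bool active;	/* server has been started and not stopped */
	bool idle;	/* stands in for dl_server_idle */
};

struct toy_rq {
	int nr_fair;	/* runnable fair tasks */
	struct toy_server srv;
};

/* Replenish on an idle CPU: remember that the server had nothing to run. */
static void toy_replenish_idle(struct toy_rq *rq)
{
	rq->srv.idle = true;
}

/* Fair wakeup: the start request is a no-op if the server is already active. */
static void toy_fair_wakeup(struct toy_rq *rq)
{
	rq->nr_fair++;
	if (rq->srv.active)
		return;		/* the idle flag is left untouched */
	rq->srv.active = true;
	rq->srv.idle = false;
}

/* Next timer: stop the server if it was marked idle earlier. */
static void toy_server_timer(struct toy_rq *rq)
{
	if (rq->srv.idle)
		rq->srv.active = false;
}

/* Replays steps 1-3; returns true if a fair task is left without a server. */
static bool toy_replay_stale_idle(void)
{
	struct toy_rq rq = { .nr_fair = 0, .srv = { .active = true, .idle = false } };

	toy_replenish_idle(&rq);	/* 1. CPU is really idle */
	toy_fair_wakeup(&rq);		/* 2. wakeup, server already active */
	assert(rq.srv.active);		/* nothing changed for the server */
	toy_server_timer(&rq);		/* 3. timer acts on the stale flag */

	return rq.nr_fair > 0 && !rq.srv.active;
}
```

In this model toy_replay_stale_idle() ends with one runnable fair task and a
stopped server, which is the state the monitor complains about.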
1. dl_server_timer runs when the CPU is idle but after a fair task wakes up
2. the call to update_curr sets dl_server_idle since it's only checking rq->curr
(which is indeed idle)
3. the server is stopped within the same timer call although a fair task just
woke up
```
ktimers/13-125 d.s52 7.309878: event_boost: -12: stopped x dl_server_start -> ready
ktimers/13-125 d.s52 7.309878: event_laxity: -12: stopped x dl_server_start -> zero_laxity_wait
ktimers/13-125 d.s42 7.309879: sched_wakeup: kworker/u519:2:5559 [120] CPU:012
<idle>-0 d.h3. 7.309889: event_boost: -12: ready x dl_replenish -> ready
<idle>-0 d.h3. 7.309889: event_laxity: -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
<idle>-0 d.h3. 7.309890: event_boost: -12: ready x dl_server_stop -> stopped (final)
<idle>-0 d.h3. 7.309891: event_laxity: -12: idle_wait x dl_server_stop -> stopped (final)
<idle>-0 d..3. 7.309895: error_boost: -12: event sched_switch_in not expected in the state stopped
<idle>-0 d..2. 7.309896: sched_switch: swapper/12:0 [120] R ==> kworker/u519:2:5559 [120]
```
In both cases, if no new fair task wakes up, the one that woke up right before
the server stopped could starve. That's a problem, isn't it?
I tried a quick solution to the first case by clearing dl_server_idle every time
the server is started (even if it's still active, that is at every fair wakeup),
and I think the second case could be avoided by relying on idle_cpu() instead of
looking only at rq->curr, so that waking tasks are also taken into account (not
tried).
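As a rough illustration of the second idea, the sketch below contrasts a check
that only looks at the current task with a stricter one that also counts
enqueued and waking tasks. It is a toy user-space model: the struct, its fields
and the function names are all hypothetical, and the stricter check is only my
assumption of the kind of conditions an idle_cpu()-style test would add.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; all names are hypothetical, not kernel code. */
struct toy_rq {
	bool curr_is_idle;	/* the current task is the idle task */
	int nr_running;		/* tasks already enqueued on this CPU */
	bool wakeup_pending;	/* a wakeup is still in flight for this CPU */
};

/* Check that only looks at the current task, as in the second case. */
static bool toy_idle_curr_only(const struct toy_rq *rq)
{
	return rq->curr_is_idle;
}

/* Stricter check (assumption): also counts enqueued and waking tasks. */
static bool toy_idle_strict(const struct toy_rq *rq)
{
	assert(rq->nr_running >= 0);	/* model invariant */
	return rq->curr_is_idle && rq->nr_running == 0 && !rq->wakeup_pending;
}

/* State seen by the timer in the second trace: idle curr, one task woken. */
static struct toy_rq toy_timer_after_wakeup(void)
{
	struct toy_rq rq = {
		.curr_is_idle = true,
		.nr_running = 1,
		.wakeup_pending = false,
	};

	return rq;
}
```

With the state returned by toy_timer_after_wakeup(), the curr-only check
reports idle while the stricter one does not, so in this model the timer would
not mark the server idle and stop it right after the wakeup.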
Is this a real issue or is the monitor just too sensitive?
Thanks,
Gabriele
[1] - https://lore.kernel.org/lkml/20251205131621.135513-1-gmonaco@redhat.com