Message-ID: <aWUJI4u0R35UYg0X@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 12 Jan 2026 15:45:55 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <jlelli@...hat.com>,
Clark Williams <williams@...hat.com>
Subject: Re: sched/deadline: Server stops while fair tasks are still runnable

Hi Gabriele,

On 12/01/26 10:02, Gabriele Monaco wrote:
> The boost model in [1] spotted cases with fair tasks running after the server
> stopped. I believe this means that it's equally possible for those tasks to
> starve without the server even trying to start.
>
> There are two related situations that can result in this:
>
> 1. dl_server_idle is set to true when the CPU is really idle
> 2. a fair task wakes up (nothing happens since the server is still active)
> 3. dl_server_timer stops the server since it's idle, although it isn't really
>
> ```
> <idle>-0 d.h3. 2.521220: event_boost: -12: idle x dl_replenish -> idle
> <idle>-0 d.h3. 2.521227: event_laxity: -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
> thread3-3-580 d..4. 3.338067: event_boost: -12: idle x dl_server_resume_throttled -> throttled
> thread3-3-580 d..3. 3.338076: sched_wakeup: ksoftirqd/12:118 [120] CPU:012
> thread5-5-582 d.h2. 3.471237: event_boost: -12: throttled x dl_server_stop -> stopped (final)
> thread5-5-582 d.h2. 3.471240: event_laxity: -12: idle_wait x dl_server_stop -> stopped (final)
> ktimers/12-117 d..3. 3.538088: error_boost: -12: event sched_switch_in not expected in the state stopped
> ktimers/12-117 d..2. 3.538089: sched_switch: ktimers/12:117 [98] S ==> ksoftirqd/12:118 [120]
> ```
>
>
> 1. dl_server_timer runs when the CPU is idle but after a fair task wakes up
> 2. the call to update_curr sets dl_server_idle since it's only checking rq->curr
> (which is indeed idle)
> 3. the server is stopped within the same timer call although a fair task just
> woke up
>
> ```
> ktimers/13-125 d.s52 7.309878: event_boost: -12: stopped x dl_server_start -> ready
> ktimers/13-125 d.s52 7.309878: event_laxity: -12: stopped x dl_server_start -> zero_laxity_wait
> ktimers/13-125 d.s42 7.309879: sched_wakeup: kworker/u519:2:5559 [120] CPU:012
> <idle>-0 d.h3. 7.309889: event_boost: -12: ready x dl_replenish -> ready
> <idle>-0 d.h3. 7.309889: event_laxity: -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
> <idle>-0 d.h3. 7.309890: event_boost: -12: ready x dl_server_stop -> stopped (final)
> <idle>-0 d.h3. 7.309891: event_laxity: -12: idle_wait x dl_server_stop -> stopped (final)
> <idle>-0 d..3. 7.309895: error_boost: -12: event sched_switch_in not expected in the state stopped
> <idle>-0 d..2. 7.309896: sched_switch: swapper/12:0 [120] R ==> kworker/u519:2:5559 [120]
> ```
>
> In both cases, if no new fair task wakes up, the one that woke up right before
> stopping the server could starve. That's a problem isn't it?

Yeah, think so. :/

Looks like we clear dl_defer_idle only while updating the fair task
entity (which didn't happen yet in both cases above).
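
To make the first sequence above concrete, here is a minimal toy model in
plain C. It is not kernel code: the struct, its fields and the toy_* helpers
are all made up for illustration, and the only thing taken from the report is
the ordering of events (timer on an idle CPU, fair wakeup with the server
already active, timer again before the fair task gets accounted).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
struct toy_server {
	bool active;       /* server has been started */
	bool idle;         /* stand-in for the server's "idle" marker */
	int  nr_fair;      /* runnable fair tasks on this CPU */
	bool curr_is_idle; /* the idle task is still what is running */
};

/* Fair wakeup: starting a server that is already active changes nothing. */
static void toy_fair_wakeup(struct toy_server *s)
{
	s->nr_fair++;
	if (!s->active) {
		s->active = true;
		s->idle = false;
	}
}

/* Timer: decides based only on what is currently running. */
static void toy_timer(struct toy_server *s)
{
	if (!s->active || !s->curr_is_idle)
		return;
	if (s->idle)
		s->active = false; /* marker already set: stop the server */
	else
		s->idle = true;    /* first idle observation: set the marker */
}

/* The marker is cleared only once a fair task actually gets accounted. */
static void toy_update_fair(struct toy_server *s)
{
	s->idle = false;
}

/* Replays steps 1-3 of the first case; true means "runnable but unserved". */
static bool toy_case1_starves(void)
{
	struct toy_server s = { .active = true, .curr_is_idle = true };

	toy_timer(&s);        /* 1. CPU really idle: marker set */
	assert(s.active);     /*    server is still running at this point */
	toy_fair_wakeup(&s);  /* 2. wakeup: no-op, server already active */
	toy_timer(&s);        /* 3. stale marker: server stopped */
	return s.nr_fair > 0 && !s.active;
}
```

In this model toy_case1_starves() returns true: a fair task is runnable and
the server is stopped, because toy_update_fair() never got a chance to run
between the wakeup and the second timer.
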
> I tried a quick solution to the first case by clearing dl_server_idle every time
> the server is started (even if it's still active, that is, at every fair wakeup), and
> I think the second case could be avoided by relying on idle_cpu() instead of
> looking only at rq->curr, so that waking tasks are also taken into account (not tried).
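
For discussion, a rough sketch of those two ideas in a self-contained toy
model (plain C, not a patch). All toy_* names and fields are hypothetical.
The idle test only mimics the spirit of idle_cpu(), which also looks at
queued tasks rather than just the current one, and the start path clears the
idle marker unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
struct toy_rq {
	bool curr_is_idle;  /* the idle task is what is currently running */
	int  nr_queued;     /* tasks woken up and waiting to run */
	bool server_active;
	bool server_idle;   /* stand-in for the server's "idle" marker */
};

/* Idle test in the spirit of idle_cpu(): a queued task means "not idle". */
static bool toy_cpu_idle(const struct toy_rq *rq)
{
	return rq->curr_is_idle && rq->nr_queued == 0;
}

/* Idea 1: clear the marker on every start, even if already active. */
static void toy_server_start(struct toy_rq *rq)
{
	rq->server_idle = false;
	rq->server_active = true;
}

static void toy_fair_wakeup(struct toy_rq *rq)
{
	rq->nr_queued++;
	toy_server_start(rq);
}

/* Idea 2: the timer uses toy_cpu_idle() instead of only the current task. */
static void toy_server_timer(struct toy_rq *rq)
{
	if (!rq->server_active)
		return;
	if (!toy_cpu_idle(rq)) {
		rq->server_idle = false;
		return;
	}
	if (rq->server_idle)
		rq->server_active = false;
	else
		rq->server_idle = true;
}

/* First case: timer while idle, fair wakeup, timer again. */
static bool toy_case1_server_survives(void)
{
	struct toy_rq rq = { .curr_is_idle = true, .server_active = true };

	toy_server_timer(&rq);       /* really idle: marker set */
	assert(rq.server_idle);
	toy_fair_wakeup(&rq);        /* marker cleared by the start path */
	toy_server_timer(&rq);
	return rq.server_active;
}

/* Second case: timer fires after a wakeup, idle task still current.
 * The marker is preset to isolate idea 2 from idea 1. */
static bool toy_case2_server_survives(void)
{
	struct toy_rq rq = { .curr_is_idle = true, .nr_queued = 1,
			     .server_active = true, .server_idle = true };

	toy_server_timer(&rq);       /* queued task: CPU not treated as idle */
	return rq.server_active;
}
```

Both helpers return true in this model, i.e. the server stays active while a
fair task is waiting. Whether the same holds in the real code paths is
exactly what a patch would need to show.
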

Not sure if you want to wait for Peter to chime in, but if you have
patches we can definitely take a look. :)

Thanks,
Juri