linux-kernel - sched/deadline: Server stops while fair tasks are still runnable


Message-ID: <8394f68afac3b26aeba3def48d927e3ab7c177e3.camel@redhat.com>
Date: Mon, 12 Jan 2026 10:02:38 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, 
 Juri Lelli <jlelli@...hat.com>
Cc: Clark Williams <williams@...hat.com>
Subject: sched/deadline: Server stops while fair tasks are still runnable

The boost model in [1] spotted cases with fair tasks running after the server
stopped. I believe this means that it's equally possible for those tasks to
starve without the server even trying to start.

There are two related situations that can result in this, each listed below as
a sequence of steps followed by its trace:

1. dl_server_idle is set to true when the CPU is really idle
2. a fair task wakes up (nothing happens since the server is still active)
3. dl_server_timer stops the server because it is marked idle, although it no
longer is

```
    <idle>-0    d.h3.  2.521220: event_boost:     -12: idle x dl_replenish -> idle
    <idle>-0    d.h3.  2.521227: event_laxity:    -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
 thread3-3-580  d..4.  3.338067: event_boost:     -12: idle x dl_server_resume_throttled -> throttled
 thread3-3-580  d..3.  3.338076: sched_wakeup:    ksoftirqd/12:118 [120] CPU:012
 thread5-5-582  d.h2.  3.471237: event_boost:     -12: throttled x dl_server_stop -> stopped (final)
 thread5-5-582  d.h2.  3.471240: event_laxity:    -12: idle_wait x dl_server_stop -> stopped (final)
ktimers/12-117  d..3.  3.538088: error_boost:     -12: event sched_switch_in not expected in the state stopped
ktimers/12-117  d..2.  3.538089: sched_switch:    ktimers/12:117 [98] S ==> ksoftirqd/12:118 [120]
```

1. dl_server_timer runs when the CPU is idle but after a fair task wakes up
2. the call to update_curr sets dl_server_idle since it's only checking rq->curr
(which is indeed idle)
3. the server is stopped within the same timer call although a fair task just
woke up

```
ktimers/13-125  d.s52  7.309878: event_boost:    -12: stopped x dl_server_start -> ready
ktimers/13-125  d.s52  7.309878: event_laxity:   -12: stopped x dl_server_start -> zero_laxity_wait
ktimers/13-125  d.s42  7.309879: sched_wakeup:   kworker/u519:2:5559 [120] CPU:012
    <idle>-0    d.h3.  7.309889: event_boost:    -12: ready x dl_replenish -> ready
    <idle>-0    d.h3.  7.309889: event_laxity:   -12: zero_laxity_wait x dl_replenish_idle -> idle_wait
    <idle>-0    d.h3.  7.309890: event_boost:    -12: ready x dl_server_stop -> stopped (final)
    <idle>-0    d.h3.  7.309891: event_laxity:   -12: idle_wait x dl_server_stop -> stopped (final)
    <idle>-0    d..3.  7.309895: error_boost:    -12: event sched_switch_in not expected in the state stopped
    <idle>-0    d..2.  7.309896: sched_switch:   swapper/12:0 [120] R ==> kworker/u519:2:5559 [120]

```

In both cases, if no new fair task wakes up, the one that woke up right before
the server stopped could starve. That's a problem, isn't it?

I tried a quick solution to the first case by clearing dl_server_idle every time
the server is started (even if it's still active, that is, at every fair
wakeup). I think the second case could be avoided by relying on idle_cpu()
instead of looking only at rq->curr, which would also take waking tasks into
account (not tried).
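
To make the two sequences and the two proposed changes easier to reason about,
here is a toy userspace model. It is only a sketch and not kernel code: every
toy_* name is hypothetical, the state is reduced to a few booleans, and the
timer is assumed to first refresh the idle flag and then stop the server if the
flag is set. Only the dl_server_idle flag and the "look at rq->curr only"
check are taken from the description above.

```c
#include <stdbool.h>

/* Toy per-CPU state; the fields loosely echo the names used in this mail. */
struct toy_rq {
	bool curr_is_idle;	/* rq->curr is the idle task */
	int nr_fair_runnable;	/* fair tasks woken up, running or not */
	bool server_active;
	bool server_idle;	/* stands in for dl_server_idle */
};

/* The two candidate changes, modelled as switches (assumptions). */
struct toy_fixes {
	bool clear_idle_on_start;	/* clear the flag at every fair wakeup */
	bool check_runnable;		/* idle_cpu()-like: also look at waking tasks */
};

/* Marks the server idle when the CPU looks idle. */
static void toy_update_curr(struct toy_rq *rq, const struct toy_fixes *fx)
{
	bool idle = rq->curr_is_idle;

	if (fx->check_runnable)
		idle = idle && rq->nr_fair_runnable == 0;
	if (idle)
		rq->server_idle = true;
}

/* A fair wakeup requests a server start; a no-op if it is already active. */
static void toy_fair_wakeup(struct toy_rq *rq, const struct toy_fixes *fx)
{
	rq->nr_fair_runnable++;
	if (!rq->server_active) {
		rq->server_active = true;
		rq->server_idle = false;
	} else if (fx->clear_idle_on_start) {
		rq->server_idle = false;
	}
}

/* Timer: refresh the idle flag, then stop the server if it is marked idle. */
static void toy_timer(struct toy_rq *rq, const struct toy_fixes *fx)
{
	toy_update_curr(rq, fx);
	if (rq->server_active && rq->server_idle)
		rq->server_active = false;
}

/* The problem: server stopped while a fair task is still runnable. */
static bool toy_starving(const struct toy_rq *rq)
{
	return !rq->server_active && rq->nr_fair_runnable > 0;
}

/* First case: the flag is set while really idle, then goes stale. */
bool toy_case1(struct toy_fixes fx)
{
	struct toy_rq rq = { .curr_is_idle = true, .server_active = true };

	toy_update_curr(&rq, &fx);	/* CPU really idle: flag set */
	toy_fair_wakeup(&rq, &fx);	/* server still active */
	rq.curr_is_idle = false;	/* in the first trace another task runs */
	toy_timer(&rq, &fx);
	return toy_starving(&rq);
}

/* Second case: the timer fires on an idle rq->curr right after a wakeup. */
bool toy_case2(struct toy_fixes fx)
{
	struct toy_rq rq = { .curr_is_idle = true };

	toy_fair_wakeup(&rq, &fx);	/* starts the server */
	toy_timer(&rq, &fx);		/* rq->curr is still idle */
	return toy_starving(&rq);
}
```

In this model both toy_case1() and toy_case2() return true with both switches
off. Enabling clear_idle_on_start alone makes toy_case1() return false, and
enabling check_runnable alone makes toy_case2() return false. Neither switch
helps with the other case, which is why the two changes are treated
separately above.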

Is this a real issue or is the monitor just too sensitive?

Thanks,
Gabriele

[1] - https://lore.kernel.org/lkml/20251205131621.135513-1-gmonaco@redhat.com