Message-ID: <20250627115118.438797-1-juri.lelli@redhat.com>
Date: Fri, 27 Jun 2025 13:51:13 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Waiman Long <llong@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>,
Luca Abeni <luca.abeni@...tannapisa.it>,
Juri Lelli <juri.lelli@...hat.com>
Subject: [PATCH 0/5] sched/deadline: Fix GRUB accounting
Hi All,
This patch series addresses a significant regression observed in
`SCHED_DEADLINE` performance, specifically when `SCHED_FLAG_RECLAIM`
(Greedy Reclamation of Unused Bandwidth - GRUB) is enabled alongside
overrunning jobs. This issue was reported by Marcel [1].
Marcel's team's extensive real-time scheduler (`SCHED_DEADLINE`) tests on
mainline Linux kernels (amd64-based Intel NUCs and aarch64-based RADXA
ROCK5Bs) typically show zero deadline misses for 5ms granularity tasks.
However, with reclaim mode enabled and the same two overrunning jobs in
the mix, they observed a dramatic increase in deadline misses: 43
million on the NUC and 600 thousand on the ROCK5B. This highlights a
critical accounting issue within `SCHED_DEADLINE` when reclaim is active.
This series fixes the issue by doing the following.
- 1/5: sched/deadline: Initialize dl_servers after SMP
Currently, `dl-servers` are initialized too early during boot, before
all CPUs are online. This results in an incorrect calculation of
per-runqueue `DEADLINE` variables, such as `extra_bw`, which rely on a
stable CPU count. This patch moves the `dl-server` initialization to a
later stage, after SMP initialization, ensuring all CPUs are online and
correct `extra_bw` values can be computed from the start.
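As a rough illustration of why the CPU count matters (plain Python, and
only an approximation of the real fixed-point accounting, not the patch
itself): a dl-server reservation is spread across the CPUs of the root
domain, so a per-runqueue share computed while only the boot CPU is
online is way off once the remaining CPUs come up.

  BW_SHIFT = 20
  BW_UNIT = 1 << BW_SHIFT        # bandwidth is fixed point, 2^20 == 100%

  def per_rq_share(reserved_bw, nr_cpus):
      # Rough model: a reservation is split evenly across the CPUs that
      # are accounted for when the dl-server is set up.
      return reserved_bw // nr_cpus

  reserved = BW_UNIT * 5 // 100       # e.g. a 5% dl-server reservation
  print(per_rq_share(reserved, 1))    # before SMP: only the boot CPU counted
  print(per_rq_share(reserved, 8))    # after SMP: all 8 CPUs counted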
- 2/5: sched/deadline: Reset extra_bw to max_bw when clearing root domains
The `dl_clear_root_domain()` function did not account for the fact that
per-runqueue `extra_bw` variables retained stale values computed before
root domain changes, which broke the accounting. This patch fixes the
issue by resetting `extra_bw` to `max_bw` before restoring `dl-server`
contributions, ensuring a clean state.
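In pseudo-code terms, the idea is roughly the following (a Python sketch
only; the dict fields just mirror the per-runqueue variables named above,
the actual change lives in `dl_clear_root_domain()`):

  def clear_root_domain(rqs, dl_server_bw, nr_cpus):
      # Start every runqueue from a known-good base instead of whatever
      # value survived the previous root domain layout ...
      for rq in rqs:
          rq["extra_bw"] = rq["max_bw"]
      # ... and only then re-apply the dl-server contributions.
      for rq in rqs:
          rq["extra_bw"] -= dl_server_bw // nr_cpus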
- 3/5: sched/deadline: Fix accounting after global limits change
Changes to global `SCHED_DEADLINE` limits (handled by
`sched_rt_handler()` logic) were found to leave stale or incorrect
values in various accounting-related variables, including `extra_bw`.
This patch properly cleans up per-runqueue variables before implementing
the global limit change and then rebuilds the scheduling domains. This
ensures that the accounting is correctly restored and maintained after
such global limit adjustments.
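For reference, this path is exercised by writing the global limit
sysctls, e.g. (trivial snippet, needs root, values are examples only):

  def set_sysctl(name, value):
      with open(f"/proc/sys/kernel/{name}", "w") as f:
          f.write(str(value))

  # Changing the global limits goes through sched_rt_handler() and, before
  # this patch, could leave stale per-runqueue values behind.
  set_sysctl("sched_rt_runtime_us", 900000)
  set_sysctl("sched_rt_runtime_us", 950000)   # default: 95% of sched_rt_period_us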
- 4/5 and 5/5 are simple drgn scripts I put together to help debug this
issue. I have the impression that they might be useful to have around
in the future.
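They walk the runqueues with drgn roughly along these lines (just a
flavour of the approach, not the scripts themselves; the dl_rq field
names match current kernels and may differ on older ones):

  #!/usr/bin/env drgn
  # Dump the per-runqueue DEADLINE bandwidth fields on all online CPUs.
  from drgn.helpers.linux.cpumask import for_each_online_cpu
  from drgn.helpers.linux.percpu import per_cpu

  for cpu in for_each_online_cpu(prog):
      dl = per_cpu(prog["runqueues"], cpu).dl
      print(f"cpu{cpu}: running_bw={dl.running_bw.value_()} "
            f"this_bw={dl.this_bw.value_()} "
            f"extra_bw={dl.extra_bw.value_()} "
            f"max_bw={dl.max_bw.value_()}")

Both scripts are meant to be run against the live kernel (root and
kernel debug info required).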
Please review and test.
The set is also available at
git@...hub.com:jlelli/linux.git upstream/fix-grub-tip
1 - https://lore.kernel.org/lkml/ce8469c4fb2f3e2ada74add22cce4bfe61fd5bab.camel@codethink.co.uk/
Thanks,
Juri
Juri Lelli (5):
sched/deadline: Initialize dl_servers after SMP
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Fix accounting after global limits change
tools/sched: Add root_domains_dump.py which dumps root domains info
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
MAINTAINERS | 1 +
kernel/sched/core.c | 2 +
kernel/sched/deadline.c | 61 +++++++++++++++++++---------
kernel/sched/rt.c | 6 +++
kernel/sched/sched.h | 1 +
tools/sched/dl_bw_dump.py | 57 ++++++++++++++++++++++++++
tools/sched/root_domains_dump.py | 68 ++++++++++++++++++++++++++++++++
7 files changed, 177 insertions(+), 19 deletions(-)
create mode 100755 tools/sched/dl_bw_dump.py
create mode 100755 tools/sched/root_domains_dump.py
--
2.49.0