Message-ID: <20250627115118.438797-1-juri.lelli@redhat.com>
Date: Fri, 27 Jun 2025 13:51:13 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Waiman Long <llong@...hat.com>
Cc: linux-kernel@...r.kernel.org,
	Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>,
	Luca Abeni <luca.abeni@...tannapisa.it>,
	Juri Lelli <juri.lelli@...hat.com>
Subject: [PATCH 0/5] sched/deadline: Fix GRUB accounting

Hi All,

This patch series addresses a significant regression observed in
`SCHED_DEADLINE` performance, specifically when `SCHED_FLAG_RECLAIM`
(Greedy Reclamation of Unused Bandwidth - GRUB) is enabled alongside
overrunning jobs. This issue was reported by Marcel [1].

Marcel's team's extensive real-time scheduler (`SCHED_DEADLINE`) tests on
mainline Linux kernels (amd64-based Intel NUCs and aarch64-based RADXA
ROCK5Bs) typically show zero deadline misses for 5ms granularity tasks.
However, with reclaim mode enabled and the same two overrunning jobs in
the mix, they observed a dramatic increase in deadline misses: 43
million on NUC and 600 thousand on ROCK5B. This highlights a critical
accounting issue within `SCHED_DEADLINE` when reclaim is active.

This series fixes the issue by doing the following.

- 1/5: sched/deadline: Initialize dl_servers after SMP
  Currently, `dl-servers` are initialized too early during boot, before
  all CPUs are online. This results in an incorrect calculation of
  per-runqueue `DEADLINE` variables, such as `extra_bw`, which rely on a
  stable CPU count. This patch moves the `dl-server` initialization to a
  later stage, after SMP initialization, ensuring all CPUs are online and
  correct `extra_bw` values can be computed from the start.

- 2/5: sched/deadline: Reset extra_bw to max_bw when clearing root domains
  The `dl_clear_root_domain()` function was found to not properly account
  for the fact that per-runqueue `extra_bw` variables retained stale
  values computed before root domain changes. This led to broken
  accounting. This patch fixes the issue by resetting `extra_bw` to
  `max_bw` before restoring `dl-server` contributions, ensuring a clean
  state.

- 3/5: sched/deadline: Fix accounting after global limits change
  Changes to global `SCHED_DEADLINE` limits (handled by
  `sched_rt_handler()` logic) were found to leave stale or incorrect
  values in various accounting-related variables, including `extra_bw`.
  This patch properly cleans up per-runqueue variables before implementing
  the global limit change and then rebuilds the scheduling domains. This
  ensures that the accounting is correctly restored and maintained after
  such global limit adjustments.

- 4/5 and 5/5 are simple drgn scripts I put together to help debug
  this issue. I think they might be useful to have around for the
  future.
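
To make the stale `extra_bw` problem behind patches 1/5 and 2/5 easier to
picture, here is a toy model in plain Python. It is only a sketch, not the
kernel code: `ToyRq`, `admit()`, `reset_and_restore()`, the fixed-point
scale and the 95%/5% bandwidth figures are all assumptions made for
illustration. It assumes that a reservation is subtracted from each
runqueue's `extra_bw` in equal shares across the CPUs visible at admission
time, so admitting it while only one CPU is visible leaves values that do
not match the final CPU count.

```python
# Toy model only: every name here is hypothetical, chosen for illustration.
BW_SHIFT = 20  # fixed-point scale assumed for this sketch


def to_bw(runtime_us, period_us):
    """Convert runtime/period into a fixed-point bandwidth ratio."""
    return (runtime_us << BW_SHIFT) // period_us


class ToyRq:
    """Stand-in for a runqueue's DEADLINE accounting fields."""

    def __init__(self, max_bw):
        self.max_bw = max_bw
        self.extra_bw = max_bw  # nothing reserved yet


def admit(rqs, bw):
    """Subtract an equal share of bw from every runqueue in rqs."""
    share = bw // len(rqs)
    for rq in rqs:
        rq.extra_bw -= share


def reset_and_restore(rqs, reservations):
    """Reset extra_bw to max_bw, then re-apply the known reservations."""
    for rq in rqs:
        rq.extra_bw = rq.max_bw
    for bw in reservations:
        admit(rqs, bw)


MAX_BW = to_bw(950_000, 1_000_000)    # assumed 95% global limit
SERVER_BW = to_bw(50_000, 1_000_000)  # assumed 5% reservation

rqs = [ToyRq(MAX_BW) for _ in range(4)]

# Too early: the reservation is admitted while only one runqueue is visible.
admit(rqs[:1], SERVER_BW)
stale = [rq.extra_bw for rq in rqs]

# Clean state: reset first, then restore against the full set of runqueues.
reset_and_restore(rqs, [SERVER_BW])
fixed = [rq.extra_bw for rq in rqs]

print("stale:", stale)
print("fixed:", fixed)
```

In this model the early admission charges the whole reservation to the
first runqueue and nothing to the others, while the reset-then-restore
step leaves every runqueue with the same value.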

Please review and test.

The set is also available at

git@...hub.com:jlelli/linux.git upstream/fix-grub-tip

1 - https://lore.kernel.org/lkml/ce8469c4fb2f3e2ada74add22cce4bfe61fd5bab.camel@codethink.co.uk/

Thanks,
Juri

Juri Lelli (5):
  sched/deadline: Initialize dl_servers after SMP
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Fix accounting after global limits change
  tools/sched: Add root_domains_dump.py which dumps root domains info
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info

 MAINTAINERS                      |  1 +
 kernel/sched/core.c              |  2 +
 kernel/sched/deadline.c          | 61 +++++++++++++++++++---------
 kernel/sched/rt.c                |  6 +++
 kernel/sched/sched.h             |  1 +
 tools/sched/dl_bw_dump.py        | 57 ++++++++++++++++++++++++++
 tools/sched/root_domains_dump.py | 68 ++++++++++++++++++++++++++++++++
 7 files changed, 177 insertions(+), 19 deletions(-)
 create mode 100755 tools/sched/dl_bw_dump.py
 create mode 100755 tools/sched/root_domains_dump.py

-- 
2.49.0

