Message-ID: <20251201124205.11169-1-yurand2000@gmail.com>
Date: Mon,  1 Dec 2025 13:41:33 +0100
From: Yuri Andriaccio <yurand2000@...il.com>
To: Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Cc: linux-kernel@...r.kernel.org,
	Luca Abeni <luca.abeni@...tannapisa.it>,
	Yuri Andriaccio <yuri.andriaccio@...tannapisa.it>
Subject: [RFC PATCH v4 00/28] Hierarchical Constant Bandwidth Server

Hello,

This is v4 of the Hierarchical Constant Bandwidth Server (HCBS) patchset, which
aims to replace the current RT_GROUP_SCHED mechanism with something more robust
and theoretically sound. The patchset was presented at OSPM25
(https://retis.sssup.it/ospm-summit/), and a summary of its inner workings can
be found at https://lwn.net/Articles/1021332/ . The previous versions of this
patchset are linked further below; version 1 in particular describes in more
detail what the patchset is about and how it is implemented.

This version reworks some of the patches as suggested by Juri Lelli and
Markus Elfring. The list of changes follows:
- General refactorings, cleanups, removal of unnecessary ifdeffy and comments.
- Add Documentation for HCBS, with how-tos and some theoretical background.
- Change names/definitions of active groups:
  - A **live** group is one whose bandwidth is accounted for and to which tasks
    can be attached.
  - An **active** group is a **live** group with tasks running inside.
- Add correct cleanup of allocated memory in alloc_rt_sched_group (on allocation
  failure), even though free_rt_sched_group is called on error.
- Fix computing of new bandwidth values in dl_init_tg.
- Fix check in dl_check_tg to use capacity scaling.
- Fix wakeup_preempt_rt to check if curr is a DEADLINE task.
- Update inc/dec_dl_tasks to account for served runqueues regardless of the
  server type.
  - Update add_nr_running to update root domains and perform tracing only if the
    given runqueue is global.
- Introduce server_try_pull_task, as server_has_task gets removed in kernel
  version 6.18. This is needed to perform a pull on HCBS server replenish.
- Introduce RELEASE_LOCK macro for cleaner guard-based lock code.
- Move debug BUG_ONs to separate patches, since they are not meant to be used as
  asserts. The last two patches are not meant to be incorporated in the kernel,
  but are just used to introduce debug asserts for easier testing of expected
  preconditions when executing some functions.
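
To give reviewers a feel for what a capacity-scaled bandwidth check involves,
here is a minimal userspace sketch. It is not the code of dl_check_tg: the
struct and the groups_fit function are hypothetical, and to_ratio/cap_scale
are only modelled loosely on the mainline helpers of the same names. The
assumption is that each group reserves runtime/period of a CPU, and that the
admission limit shrinks proportionally on CPUs with less than full capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point shifts, chosen to mirror BW_SHIFT and SCHED_CAPACITY_SHIFT. */
#define BW_SHIFT        20
#define BW_UNIT         (1ULL << BW_SHIFT)
#define CAPACITY_SHIFT  10

/* Hypothetical per-group reservation: runtime_ns every period_ns. */
struct group_reservation {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Bandwidth as a fixed-point fraction of one full-capacity CPU. */
static uint64_t to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(period_ns > 0);
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/* Scale a bandwidth value by CPU capacity (1024 == full-capacity CPU). */
static uint64_t cap_scale(uint64_t bw, uint64_t capacity)
{
	return (bw * capacity) >> CAPACITY_SHIFT;
}

/* Admit the groups only if their summed bandwidth fits the scaled limit. */
static bool groups_fit(const struct group_reservation *groups, size_t n,
		       uint64_t max_bw, uint64_t cpu_capacity)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += to_ratio(groups[i].period_ns, groups[i].runtime_ns);

	return total <= cap_scale(max_bw, cpu_capacity);
}
```

With a 95% limit, two groups reserving 10ms and 20ms every 100ms fit on a
full-capacity CPU (capacity 1024) but not on one with capacity 256, where the
scaled limit drops below their combined 30%.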

The testing system has also been updated to drop the closed-source software
previously needed to generate the tasksets and their valid configurations.
More on that in the Testing section.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Summary of the patches:
   1-4) Preparation patches, so that the RT classes' code can be used both
        for normal and cgroup scheduling.
  5-16) Implementation of HCBS, no migration and only one level hierarchy.
        The old RT_GROUP_SCHED code is removed.
 17-18) Remove cgroups v1 in favour of v2.
    19) Add support for deeper hierarchies.
 20-25) Add support for task migration.
    26) Documentation for HCBS.
 27-28) Optional debug BUG_ON patches.

Updates from v3:
- Rebase to latest tip/master.
- General rebasing/cleanup.
- Add Documentation.
- Define **live** and **active** groups.
- Introduce server_try_pull_task in place of the removed server_has_task.
- Introduce RELEASE_LOCK helper macro for guard-based locking.
- Update inc/dec_dl_tasks to account for served runqueues regardless of the
  server type.
- Fix computing of new bandwidth values in dl_init_tg.
- Fix check in dl_check_tg to use capacity scaling.
- Fix wakeup_preempt_rt to check if curr is a DEADLINE task.
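
For readers unfamiliar with guard-based locking, the general idea can be
sketched in userspace as follows. This is not the RELEASE_LOCK macro, whose
definition is not shown in this cover letter; it only illustrates the
scope-based cleanup pattern (the compiler's cleanup attribute, available in
GCC and Clang) that guard-style helpers build on. All names below
(mutex_guard, GUARD_MUTEX, bump_counter_below) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for a lock guard object. */
struct mutex_guard {
	pthread_mutex_t *lock;
};

/* Cleanup handler: runs automatically when the guard leaves scope. */
static void mutex_guard_release(struct mutex_guard *g)
{
	if (g->lock)
		pthread_mutex_unlock(g->lock);
}

/* Take the lock and hand back a guard that remembers it. */
static struct mutex_guard mutex_guard_acquire(pthread_mutex_t *lock)
{
	int err = pthread_mutex_lock(lock);

	assert(err == 0);
	return (struct mutex_guard){ .lock = lock };
}

/* Declare a guard whose release is tied to the enclosing scope. */
#define GUARD_MUTEX(name, lockp)					\
	struct mutex_guard name						\
	__attribute__((cleanup(mutex_guard_release))) =			\
		mutex_guard_acquire(lockp)

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Every return path drops the lock without an explicit unlock call. */
static int bump_counter_below(int limit)
{
	GUARD_MUTEX(guard, &counter_lock);

	if (counter >= limit)
		return -1;	/* early exit: the handler still unlocks */

	return ++counter;
}
```

The point of the pattern is that early returns and error paths cannot leak
the lock, which removes the goto-unlock boilerplate from the calling code.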

Updates from v2:
- Rebase to latest tip/master.
- Remove fair-servers' bw reclaiming.
- Fix a check which prevented execution of wakeup_preempt code.
- Fix a priority check in group_pull_rt_task between tasks of different groups.
- Rework allocation/deallocation code for rt-cgroups.
- Update signatures for some group related migration functions.
- Add documentation for wakeup_preempt preemption rules.

Updates from v1:
- Rebase to latest tip/master.
- Add migration code.
- Split big patches for more readability.
- Refactor code to use guarded locks where applicable.
- Remove unnecessary patches from v1 which have been addressed differently by
  mainline updates.
- Remove unnecessary checks and general code cleanup.

Notes:

Task migration support needs some extra work to reduce its invasiveness,
especially patches 24-25. Patches 27-28 are completely optional and are not
meant to be included in the final patchset: they just add some invasive BUG_ONs
that assert some preconditions expected on some function calls.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Testing v4:

We are still using the tests published in version 3 for the evaluation of the
patchset (refer to the v3 cover letter for more details). For HCBS v4 the
so-called "Taskset" tests have been updated to use rt-app as a runner, while the
tasksets + configurations themselves are now generated using a completely new
open source tool: EVA-rt-Engine (https://github.com/Yurand2000/EVA-rt-Engine).

The tests are available at https://github.com/Yurand2000/HCBS-Test-Suite . Refer
to the README of the repository for more details.

Follow these steps to test HCBS v4:
- Get the HCBS patch up and running. Any kernel/distro should work effortlessly.
- Get, compile and _install_ the tests.
- Run the `go_rt.sh` script to set the frequency of the CPUs to a fixed value
  and disable hyperthreading and power saving features.
- Run the `run_tests.sh full` script, to run the whole test suite.

Notes:
While you may have rt-app installed in your system, the testing suite comes with
its own rt-app version bundled.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Future Work:

While we wait for more comments, and expect things to break, we will work on
completing the currently partial/untested implementation of HCBS with different
runtimes per CPU, instead of having the same runtime allocated on all CPUs, to
include it in a future RFC.

Future patches:
 - HCBS with different runtimes per CPU.
 - capacity aware bandwidth reservation.
 - enable/disable dl_servers when a CPU goes online/offline.

Have a nice day,
Yuri

v1: https://lore.kernel.org/all/20250605071412.139240-1-yurand2000@gmail.com/
v2: https://lore.kernel.org/all/20250731105543.40832-1-yurand2000@gmail.com/
v3: https://lore.kernel.org/all/20250929092221.10947-1-yurand2000@gmail.com/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Yuri Andriaccio (11):
  sched/rt: Disable RT_GROUP_SCHED
  sched/rt: Remove rq field in struct rt_rq
  sched/rt: Implement dl-server operations for rt-cgroups.
  sched/rt: Update task event callbacks for HCBS scheduling
  sched/rt: Allow zeroing the runtime of the root control group
  sched/rt: Remove support for cgroups-v1
  sched/deadline: Introduce dl_server_try_pull_f
  sched/core: Execute enqueued balance callbacks when migrating task
    betweeen cgroups
  Documentation: Update documentation for real-time cgroups
  [DEBUG] sched/rt: Add debug BUG_ONs for pre-migration code
  [DEBUG] sched/rt: Add debug BUG_ONs in migration code.

luca abeni (17):
  sched/deadline: Do not access dl_se->rq directly
  sched/deadline: Distinct between dl_rq and my_q
  sched/rt: Pass an rt_rq instead of an rq where needed
  sched/rt: Move some functions from rt.c to sched.h
  sched/rt: Introduce HCBS specific structs in task_group
  sched/core: Initialize HCBS specific structures.
  sched/deadline: Add dl_init_tg
  sched/rt: Add {alloc/free}_rt_sched_group
  sched/deadline: Account rt-cgroups bandwidth in deadline tasks
    schedulability tests.
  sched/rt: Update rt-cgroup schedulability checks
  sched/rt: Remove old RT_GROUP_SCHED data structures
  sched/core: Cgroup v2 support
  sched/deadline: Allow deeper hierarchies of RT cgroups
  sched/rt: Add rt-cgroup migration
  sched/rt: Add HCBS migration related checks and function calls
  sched/deadline: Fix HCBS migrations on server stop
  sched/core: Execute enqueued balance callbacks when changing allowed
    CPUs

 Documentation/scheduler/sched-rt-group.rst |  500 +++-
 include/linux/cleanup.h                    |    3 +
 include/linux/sched.h                      |   13 +-
 kernel/sched/autogroup.c                   |    4 +-
 kernel/sched/core.c                        |   63 +-
 kernel/sched/deadline.c                    |  257 +-
 kernel/sched/debug.c                       |    6 -
 kernel/sched/fair.c                        |   10 +-
 kernel/sched/rt.c                          | 3097 ++++++++++----------
 kernel/sched/sched.h                       |  176 +-
 kernel/sched/syscalls.c                    |    6 +-
 11 files changed, 2346 insertions(+), 1789 deletions(-)


base-commit: 6a23ae0a96a600d1d12557add110e0bb6e32730c
--
2.51.0

