Message-ID: <20250605071412.139240-1-yurand2000@gmail.com>
Date: Thu,  5 Jun 2025 09:14:03 +0200
From: Yuri Andriaccio <yurand2000@...il.com>
To: Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Cc: linux-kernel@...r.kernel.org,
	Luca Abeni <luca.abeni@...tannapisa.it>,
	Yuri Andriaccio <yuri.andriaccio@...tannapisa.it>
Subject: [RFC PATCH 0/9] Hierarchical Constant Bandwidth Server

Hi,

This is the first set of patches that implements Hierarchical RT scheduling,
aimed at replacing the current RT_GROUP_SCHED implementation with something more
robust and theoretically sound. The patchset has been presented at OSPM25
(https://retis.sssup.it/ospm-summit/), and a summary of its inner workings can
be found at https://lwn.net/Articles/1021332/ .

Summary of the patches:
 1-4) Preparation patches, so that the RT classes' code can be used both
      for normal and cgroup scheduling.
 5) Basic HCBS, no migration and only one level hierarchy.
 6) Remove old RT_GROUP_SCHED code.
 7) Add support for cgroup v2.
 8) Remove support for cgroup v1.
 9) HCBS with deeper hierarchies.

The patchset makes it possible to create bandwidth reservations for cgroups that
run SCHED_FIFO/SCHED_RR tasks. Whenever a cgroup is created, N local runqueues
and N dl_servers are allocated for it, one for each CPU. The local runqueues
emulate standard scheduling for the FIFO/RR classes, as the rt.c code is reused
on these local runqueues without excessive modifications. The cgroup's
reservation is set up through the cgroup's virtual files. The dl_servers are
started only when there are active tasks, and invoke the RT classes' scheduler
when they are deemed runnable.

Example usage (cgroups v2):
  # create the cgroup
  mkdir /sys/fs/cgroup/g0

  # request a reservation of 10ms of runtime every 100ms
  echo 100000 > /sys/fs/cgroup/g0/cpu.rt_period_us
  echo 10000 > /sys/fs/cgroup/g0/cpu.rt_runtime_us

  # move a process into the cgroup
  echo $PID > /sys/fs/cgroup/g0/cgroup.procs

  # if it is not already an RT process,
  # set its scheduling policy to FIFO (-f) or RR (-r)
  chrt [-f|-r] -p $PRIORITY $PID
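
The steps above could be wrapped in a small script. What follows is only a
minimal sketch, not part of the patchset: the helper names hcbs_bandwidth_pct
and hcbs_setup_group are hypothetical, and the runtime <= period check is a
sanity check added for this sketch, not a statement about what the kernel
accepts. Only the control file names (cpu.rt_period_us, cpu.rt_runtime_us) and
the write order are taken from the example above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the patchset.

# Print the runtime/period ratio of a reservation as an integer percentage
# (rounded down). Both arguments are in microseconds.
hcbs_bandwidth_pct() {
    runtime_us=$1
    period_us=$2
    echo $(( $runtime_us * 100 / $period_us ))
}

# Create the cgroup $2 under the cgroup root $1 and request a reservation
# of $4 us of runtime every $3 us, writing the period first as in the
# example above.
hcbs_setup_group() {
    root=$1
    name=$2
    period_us=$3
    runtime_us=$4

    # Sanity check local to this sketch: reject runtime > period.
    if [ "$runtime_us" -gt "$period_us" ]; then
        echo "runtime must not exceed period" >&2
        return 1
    fi

    mkdir -p "$root/$name" || return 1
    echo "$period_us" > "$root/$name/cpu.rt_period_us" || return 1
    echo "$runtime_us" > "$root/$name/cpu.rt_runtime_us" || return 1
}
```

On a kernel with these patches applied, and as root, the reservation from the
example would then be requested with
"hcbs_setup_group /sys/fs/cgroup g0 100000 10000", while
"hcbs_bandwidth_pct 10000 100000" prints the corresponding percentage.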


Testing:

The HCBS mechanism has been evaluated on several synthetic tests with RT groups
containing RT tasks with different priorities, and the groups' bandwidths are
limited as expected.

The tests can be found at https://github.com/Yurand2000/HCBS-rust-initrd . They
are written in C and Rust and should run on any distro without issues, as they
are statically compiled. These executables test both functional features and
timing guarantees of the HCBS scheduler, but most are tailored to work with
future patches which introduce task migration between CPUs.

The Makefile will compile the test suite to a fully functional initramfs, ready
to use in QEMU, but of course the test executables can also be loaded in a
fully fledged distro to run tests there. Further comments on the test suite will
follow in a future RFC.


Additional comments:

1) As pointed out by Peter Zijlstra at OSPM25, we removed support for cgroups v1
(patch 8), but the mechanism should also work with cgroups v1 if the control
files (i.e. cpu.rt_period_us and cpu.rt_runtime_us) are re-enabled in the legacy
controller.

2) In response to
https://lwn.net/ml/all/20250310170442.504716-1-mkoutny@suse.com/, the new
RT_GROUP_SCHED mechanism solves the issues discussed there:
- RT group scheduling is available for cgroup v2.
- RT tasks can be created in the root control group without having to touch
  the hierarchy or its reservations at all.


Future work:

As mentioned at OSPM25, we already have (partially) complete patches
for task migration among CPUs and for assigning a different runtime to each
individual CPU. This RFC also aims to validate the current foundations
before completing and submitting the patchset.

Future patches:
 - HCBS with task migration.
 - HCBS with different runtimes per CPU.
 - Capacity-aware bandwidth reservation.
 - Enable/disable dl_servers when a CPU goes online/offline.

Have a nice day,
Yuri

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Yuri Andriaccio (1):
  sched/rt: Remove support for cgroups-v1

luca abeni (8):
  sched/deadline: Do not access dl_se->rq directly
  sched/deadline: Make a distinction between dl_rq and my_q
  sched/rt: Pass an rt_rq instead of an rq where needed
  sched/rt: Move some inline functions from rt.c to sched.h
  sched/deadline: Hierarchical scheduling with DL on top of RT
  sched/rt: Remove unused code
  sched/core: Cgroup v2 support
  sched/deadline: Allow deeper hierarchies of RT cgroups

 include/linux/sched.h    |   10 +-
 kernel/sched/autogroup.c |    4 +-
 kernel/sched/core.c      |   44 +-
 kernel/sched/deadline.c  |  256 ++++++--
 kernel/sched/debug.c     |    6 -
 kernel/sched/fair.c      |    6 +-
 kernel/sched/rt.c        | 1266 +++++++++++---------------------------
 kernel/sched/sched.h     |  133 ++--
 kernel/sched/syscalls.c  |    8 +-
 9 files changed, 672 insertions(+), 1061 deletions(-)

-- 
2.49.0

