Date:   Fri, 28 Jun 2019 16:49:03 -0400
From:   Rik van Riel <riel@...riel.com>
To:     linux-kernel@...r.kernel.org
Cc:     kernel-team@...com, pjt@...gle.com, dietmar.eggemann@....com,
        peterz@...radead.org, mingo@...hat.com, morten.rasmussen@....com,
        tglx@...utronix.de, mgorman@...hsingularity.net,
        vincent.guittot@...aro.org
Subject: [PATCH RFC v2 0/10] sched,fair: flatten CPU controller runqueues

The current implementation of the CPU controller uses hierarchical
runqueues, where on wakeup a task is enqueued on its group's runqueue,
the group is enqueued on the runqueue of the group above it, etc.

This adds a fairly large amount of overhead for workloads that
do a lot of wakeups per second, especially given that the default systemd
hierarchy is 2 or 3 levels deep.
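
As a rough illustration of where that cost comes from (a toy model, not
code from this series), the sketch below counts runqueue insertions for a
wakeup in a nested hierarchy. The Group class, enqueue_task and the group
names are hypothetical, and the model ignores any per-level accounting
done by the real scheduler.

```python
class Group:
    """A hypothetical cgroup with its own runqueue (names are illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.runqueue = []         # tasks or child groups queued on this group
        self.on_parent_rq = False  # is this group already queued on its parent?


def enqueue_task(task, group):
    """Enqueue a task hierarchically and return the number of insertions."""
    group.runqueue.append(task)
    ops = 1
    # Walk up the tree: every group that is not yet queued must be placed
    # on the runqueue of the group above it.
    while group.parent is not None and not group.on_parent_rq:
        group.parent.runqueue.append(group)
        group.on_parent_rq = True
        ops += 1
        group = group.parent
    return ops


root = Group("root")
system_slice = Group("system.slice", root)
service = Group("web.service", system_slice)

print(enqueue_task("worker-1", service))  # 3: task, service, slice
print(enqueue_task("worker-2", service))  # 1: groups are already queued
```

In this model, a wakeup into an idle group two levels below the root costs
three insertions instead of one.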

This patch series is an attempt at reducing that overhead, by placing
all the tasks on the same runqueue, and scaling the task priority by
the priority of the group, which is calculated periodically.
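
A minimal sketch of the idea, under stated assumptions: each task's weight
is multiplied by the share of every group on its path to the root, and
virtual runtime advances in inverse proportion to that scaled weight. The
function names and the group_fractions parameter are hypothetical; in the
series the group priority is recalculated periodically, which this sketch
represents as precomputed fractions.

```python
NICE_0_WEIGHT = 1024  # weight of a nice-0 task in CFS


def hierarchical_weight(task_weight, group_fractions):
    """Scale a task's weight by the share of each group above it.

    group_fractions is a hypothetical list with one value in (0, 1] per
    level: that group's share of its parent's total weight. These values
    would be refreshed periodically rather than on every wakeup.
    """
    weight = float(task_weight)
    for fraction in group_fractions:
        weight *= fraction
    return weight


def vruntime_delta(delta_exec_ns, weight):
    """Virtual runtime charged for delta_exec_ns of real runtime."""
    return delta_exec_ns * NICE_0_WEIGHT / weight


# Task A sits two levels deep, in groups that each get half of their parent.
w_a = hierarchical_weight(1024, [0.5, 0.5])
# Task B sits in a group that owns all of its parent's weight.
w_b = hierarchical_weight(1024, [1.0])

print(w_a, vruntime_delta(1_000_000, w_a))  # 256.0 4000000.0
print(w_b, vruntime_delta(1_000_000, w_b))  # 1024.0 1000000.0
```

Both tasks can then share one runqueue: task A accrues vruntime four times
as fast as task B, so it receives a quarter of the CPU time that B does.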

This patch series still has a number of TODO items:
- Clean up the code, and fix compilation without CONFIG_FAIR_GROUP_SCHED.
- Remove some more now unused code.
- Figure out a performance regression with our web server workload.
  I have fixed the schbench issue, now I need to find the less obvious
  stuff causing an increased number of involuntary preemptions...
- Reimplement CONFIG_CFS_BANDWIDTH.

Plan for the CONFIG_CFS_BANDWIDTH reimplementation:
- When a cgroup gets throttled, mark the cgroup and its children
  as throttled.
- When pick_next_entity finds a task that is on a throttled cgroup,
  stash it on the cgroup runqueue (which is not used for runnable
  tasks any more). Leave the vruntime unchanged, and adjust that
  runqueue's vruntime to be that of the left-most task.
- When a cgroup gets unthrottled, and has tasks on it, place it on
  a vruntime ordered heap separate from the main runqueue.
- Have pick_next_task_fair grab one task off that heap each time it is
  called, as long as the min vruntime of that heap is lower than the
  vruntime of the CPU's cfs_rq (or the CPU has no other runnable tasks).
- Place that selected task on the CPU's cfs_rq, renormalizing its
  vruntime with the GENTLE_FAIR_SLEEPERS logic. That should help
  interleave the already runnable tasks with the recently unthrottled
  group, and prevent thundering herd issues.
- If the group gets throttled again before all of its tasks have had a
  chance to run, vruntime sorting ensures all the tasks in the throttled
  cgroup get a chance to run over time.
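
The plan above can be sketched as a toy model (not code from this series).
Everything here is an assumption made for illustration: the CpuModel class,
its method names, the flat cgroup names (child groups are not modeled), and
SLEEPER_CREDIT, which is only a crude stand-in for the GENTLE_FAIR_SLEEPERS
renormalization.

```python
import heapq

SLEEPER_CREDIT = 3  # hypothetical cap on how far behind min_vruntime a task lands


class CpuModel:
    def __init__(self):
        self.cfs_rq = []       # heap of (vruntime, task, cgroup): the flat runqueue
        self.min_vruntime = 0  # stand-in for the vruntime of the CPU's cfs_rq
        self.throttled = set() # names of throttled cgroups
        self.group_rq = {}     # cgroup -> heap of stashed (vruntime, task)
        self.unthrottled = []  # heap of (left-most vruntime, cgroup)

    def enqueue(self, vruntime, task, cgroup):
        heapq.heappush(self.cfs_rq, (vruntime, task, cgroup))

    def throttle(self, cgroup):
        self.throttled.add(cgroup)
        # A group throttled again leaves the heap; its tasks stay stashed.
        self.unthrottled = [e for e in self.unthrottled if e[1] != cgroup]
        heapq.heapify(self.unthrottled)

    def unthrottle(self, cgroup):
        if cgroup not in self.throttled:
            return
        self.throttled.discard(cgroup)
        stash = self.group_rq.get(cgroup)
        if stash:
            # Key the group by the vruntime of its left-most stashed task.
            heapq.heappush(self.unthrottled, (stash[0][0], cgroup))

    def _pull_unthrottled(self):
        """Move at most one task from the unthrottled heap to the flat runqueue."""
        if not self.unthrottled:
            return
        if self.cfs_rq and self.unthrottled[0][0] >= self.min_vruntime:
            return
        _, cgroup = heapq.heappop(self.unthrottled)
        stash = self.group_rq[cgroup]
        vruntime, task = heapq.heappop(stash)
        if stash:
            heapq.heappush(self.unthrottled, (stash[0][0], cgroup))
        # Renormalize: do not let the task lag too far behind the runqueue.
        vruntime = max(vruntime, self.min_vruntime - SLEEPER_CREDIT)
        self.enqueue(vruntime, task, cgroup)

    def pick_next(self):
        self._pull_unthrottled()
        while self.cfs_rq:
            vruntime, task, cgroup = heapq.heappop(self.cfs_rq)
            if cgroup in self.throttled:
                # Stash on the cgroup runqueue, leaving the vruntime unchanged.
                heapq.heappush(self.group_rq.setdefault(cgroup, []), (vruntime, task))
                continue
            self.min_vruntime = max(self.min_vruntime, vruntime)
            return task
        return None


cpu = CpuModel()
cpu.enqueue(10, "a1", "A")
cpu.enqueue(11, "b1", "B")
cpu.throttle("A")
print(cpu.pick_next())  # b1 (a1 is stashed on A's runqueue)
cpu.enqueue(30, "b1", "B")
cpu.unthrottle("A")
print(cpu.pick_next())  # a1 (pulled back from the unthrottled heap)
```

Because only one task is pulled per pick, a large unthrottled group is fed
back onto the runqueue gradually rather than all at once.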

Changes from v1:
- use task_se_h_weight instead of task_se_h_load in calc_delta_fair
  and sched_slice; this seems to improve performance a little, but
  I still have some remaining regression to chase with our web server
  workload
- implement a number of the changes suggested by Dietmar Eggemann
  (still holding out for a better name for group_cfs_rq_of_parent)

This series applies on top of 5.2-rc6.

 include/linux/sched.h |    6 
 kernel/sched/core.c   |    2 
 kernel/sched/debug.c  |   15 -
 kernel/sched/fair.c   |  746 +++++++++++++++++++++-----------------------------
 kernel/sched/pelt.c   |   53 +--
 kernel/sched/pelt.h   |    2 
 kernel/sched/sched.h  |   10 
 7 files changed, 346 insertions(+), 488 deletions(-)

