Message-Id: <cover.1583332764.git.vpillai@digitalocean.com>
Date: Wed, 4 Mar 2020 16:59:50 +0000
From: vpillai <vpillai@...italocean.com>
To: Nishanth Aravamudan <naravamudan@...italocean.com>,
Julien Desfossez <jdesfossez@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org
Cc: vpillai <vpillai@...italocean.com>, linux-kernel@...r.kernel.org,
fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>, aubrey.li@...ux.intel.com,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Joel Fernandes <joelaf@...gle.com>, joel@...lfernandes.org
Subject: [RFC PATCH 00/13] Core scheduling v5

Fifth iteration of the Core-Scheduling feature.

Core scheduling is a feature that only allows trusted tasks to run
concurrently on cpus sharing compute resources (e.g. hyperthreads on
a core). The goal is to mitigate core-level side-channel attacks
without having to disable SMT (which has a significant impact on
performance in some situations). So far, the feature mitigates
user-space to user-space attacks, but not user-space to kernel
attacks, which arise when one of the hardware threads enters the
kernel (syscall, interrupt, etc).
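
As a rough illustration (a sketch, not code quoted from the patches):
the series tags tasks with a cookie, and two tasks are considered safe
to share a core only when their cookies match. Assuming the per-task
core_cookie field this series introduces:

static inline bool cookies_match(struct task_struct *a,
                                 struct task_struct *b)
{
        /* Untagged tasks all carry cookie 0, so they match each other. */
        return a->core_cookie == b->core_cookie;
}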

By default, the feature doesn't change any of the current scheduler
behavior. The user decides which tasks can run simultaneously on the
same core (for now by having them in the same tagged cgroup). When
a tag is enabled in a cgroup and a task from that cgroup is running
on a hardware thread, the scheduler ensures that only idle or trusted
tasks run on the other sibling(s). Besides security concerns, this
feature can also be beneficial for RT and performance-sensitive
applications where we want to control how tasks make use of SMT
dynamically.
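
A minimal sketch of that selection rule, reusing the cookies_match()
helper above; pick_highest_task_on_core() and set_next_task_on() are
made-up names, and the real logic lives in the core-wide
pick_next_task() added by this series:

max = pick_highest_task_on_core(core);           /* hypothetical */
for_each_cpu(cpu, smt_mask) {
        struct task_struct *p = pick_task(cpu);  /* per-sibling pick */

        /* Force-idle any sibling whose pick is incompatible with max. */
        if (p && !cookies_match(p, max))
                p = idle_task(cpu);
        set_next_task_on(cpu, p);                 /* hypothetical */
}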

This version focuses on performance and stability. A couple of
crashes related to task tagging and the cpu hotplug path were fixed.
This version also improves performance considerably by making task
migration and load balancing coresched aware.
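
To illustrate what "coresched aware" means here (again a sketch, not
the code from Aubrey Li's migration patch; the helper name is made up):

static bool coresched_can_migrate(struct task_struct *p, int dst_cpu)
{
        int cpu;

        /* Skip a destination core running tasks with a different cookie. */
        for_each_cpu(cpu, cpu_smt_mask(dst_cpu)) {
                struct task_struct *curr = cpu_rq(cpu)->curr;

                if (!is_idle_task(curr) &&
                    curr->core_cookie != p->core_cookie)
                        return false;
        }
        return true;
}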

In terms of performance, the major difference since the last iteration
is that now even IO-heavy and mixed-resource workloads are less
impacted by core-scheduling than by disabling SMT. Both host-level and
VM-level benchmarks were performed. Details in:

https://lkml.org/lkml/2020/2/12/1194
https://lkml.org/lkml/2019/11/1/269

v5 is rebased on top of 5.5.5 (449718782a46):
https://github.com/digitalocean/linux-coresched/tree/coresched/v5-v5.5.y

Changes in v5
-------------
- Fixes for cgroup/process tagging during corner cases like cgroup
  destroy, task moving across cgroups, etc.
  - Tim Chen
- Coresched aware task migrations
  - Aubrey Li
- Other minor stability fixes.

Changes in v4
-------------
- Implement a core-wide min_vruntime for vruntime comparison of tasks
  across cpus in a core (see the sketch after this list).
  - Aaron Lu
- Fixes a typo bug in setting the forced_idle cpu.
  - Aaron Lu
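
Since cfs_rq->min_vruntime is per runqueue, vruntimes of tasks on
different siblings are not directly comparable. A sketch of the idea
behind the core-wide comparison (core_min_vruntime is a hypothetical
field; see Aaron Lu's two patches in this series for the real code):

static inline u64 task_core_vruntime(struct task_struct *p)
{
        struct cfs_rq *cfs_rq = task_cfs_rq(p);

        /* Measure against a core-wide baseline instead of a per-rq one. */
        return p->se.vruntime - cfs_rq->min_vruntime +
               rq_of(cfs_rq)->core_min_vruntime;
}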

Changes in v3
-------------
- Fixes the issue of a sibling picking up an incompatible task
  - Aaron Lu
  - Vineeth Pillai
  - Julien Desfossez
- Fixes the issue of starving threads due to forced idle
  - Peter Zijlstra
- Fixes the refcounting issue when deleting a cgroup with tag
  - Julien Desfossez
- Fixes a crash during cpu offline/online with coresched enabled
  - Vineeth Pillai
- Fixes a comparison logic issue in sched_core_find
  - Aaron Lu

Changes in v2
-------------
- Fixes for a couple of NULL pointer dereference crashes
  - Subhra Mazumdar
  - Tim Chen
- Improves the priority comparison logic for processes on different cpus
  - Peter Zijlstra
  - Aaron Lu
- Fixes a hard lockup in rq locking
  - Vineeth Pillai
  - Julien Desfossez
- Fixes a performance issue seen on IO-heavy workloads
  - Vineeth Pillai
  - Julien Desfossez
- Fix for 32bit build
  - Aubrey Li

ISSUES
------
- Aaron (Intel) found an issue with load balancing when the tasks have
  different weights (nice or cgroup shares). Task weight is not
  considered in coresched aware load balancing and causes those
  higher-weight tasks to starve.
- Joel (ChromeOS) found an issue where an RT task may be preempted by a
  lower-class task.
- Joel (ChromeOS) found a deadlock and crash on a PREEMPT kernel in the
  coresched idle balance logic.

TODO
----
- Work on merging the patches that are ready to be merged
- Decide on the API for exposing the feature to userland
- Experiment with adding synchronization points in VMEXIT to mitigate
  the VM-to-host-kernel leaking
- Investigate the source of the overhead even when no tasks are tagged:
  https://lkml.org/lkml/2019/10/29/242

---

Aaron Lu (2):
  sched/fair: wrapper for cfs_rq->min_vruntime
  sched/fair: core wide vruntime comparison

Aubrey Li (1):
  sched: migration changes for core scheduling

Peter Zijlstra (9):
  sched: Wrap rq::lock access
  sched: Introduce sched_class::pick_task()
  sched: Core-wide rq->lock
  sched/fair: Add a few assertions
  sched: Basic tracking of matching tasks
  sched: Add core wide task selection and scheduling.
  sched: Trivial forced-newidle balancer
  sched: cgroup tagging interface for core scheduling
  sched: Debug bits...

Tim Chen (1):
  sched: Update core scheduler queue when taking cpu online/offline

 include/linux/sched.h    |    9 +-
 kernel/Kconfig.preempt   |    6 +
 kernel/sched/core.c      | 1037 +++++++++++++++++++++++++++++++++++++-
 kernel/sched/cpuacct.c   |   12 +-
 kernel/sched/deadline.c  |   69 ++-
 kernel/sched/debug.c     |    4 +-
 kernel/sched/fair.c      |  387 +++++++++++---
 kernel/sched/idle.c      |   11 +-
 kernel/sched/pelt.h      |    2 +-
 kernel/sched/rt.c        |   65 ++-
 kernel/sched/sched.h     |  248 +++++++--
 kernel/sched/stop_task.c |   13 +-
 kernel/sched/topology.c  |    4 +-
 13 files changed, 1672 insertions(+), 195 deletions(-)

--
2.17.1