lists.openwall.net mailing list archive
Date:   Mon, 15 Apr 2019 12:59:37 -0400
From:   Julien Desfossez <jdesfossez@...italocean.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Aaron Lu <aaron.lu@...ux.alibaba.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Aubrey Li <aubrey.intel@...il.com>
Subject: Re: [RFC][PATCH 13/16] sched: Add core wide task selection and
 scheduling.

On 10-Apr-2019 10:06:30 AM, Peter Zijlstra wrote:
> while you're all having fun playing with this, I've not yet had answers
> to the important questions of how L1TF complete we want to be and if all
> this crud actually matters one way or the other.
> 
> Also, I still don't see this stuff working for high context switch rate
> workloads, and that is exactly what some people were aiming for..

We have been running scaling tests on highly loaded systems (with all
the fixes and suggestions applied) and here are the results.

On a system with 2x6 cores (12 hardware threads per NUMA node), with one
12-vcpus-32gb VM per NUMA node running a CPU-intensive workload
(linpack):
- Baseline: 864 gflops
- Core scheduling: 864 gflops
- nosmt (switch to 6 hardware threads per node): 298 gflops (-65%)

In this test, the VMs are basically alone on their own NUMA node, so
they are only competing with themselves. For the next test, we moved
the 2 VMs to the same node:
- Baseline: 340 gflops, about 586k context switches/sec
- Core scheduling: 322 gflops (-5%), about 575k context switches/sec
- nosmt: 146 gflops (-57%), about 284k context switches/sec
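The degradation percentages above can be recomputed from the raw gflops
numbers; a minimal sketch (truncating toward zero, which matches the
figures quoted):

```python
def degradation(baseline, result):
    """Percent change of `result` vs. `baseline`, truncated toward zero."""
    return int((result - baseline) / baseline * 100)

# Test 1: one 12-vcpus VM per NUMA node
nosmt_t1 = degradation(864, 298)   # -65

# Test 2: both VMs on the same NUMA node
coresched_t2 = degradation(340, 322)   # -5
nosmt_t2 = degradation(340, 146)       # -57
```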

In terms of isolation, CPU-intensive VMs share their core with a
"foreign process" (not tagged or tagged with a different tag) less than
2% of the time (sum of the time spent with a lot of different
processes). For reference, this could add up to 60% without core
scheduling and smt on. We are working on identifying the various cases
where there is unwanted co-scheduling so we can address those.

With a more heterogeneous benchmark (MySQL benchmark with a remote
client, 1 12-vcpus MySQL VM on each NUMA node), we don’t measure any
performance degradation when there are more hardware threads available
than vcpus (same with nosmt). But when we add noise VMs (sleep(15);
collect metrics; send them over a VPN; repeat) at an overcommit ratio
of 3 vcpus to 1 hardware thread, core scheduling shows up to 25%
performance degradation, whereas nosmt shows a 15% impact.

So the performance impact varies depending on the type of workload, but
since the CPU-intensive workloads are the ones most impacted when we
disable SMT, this is very encouraging and makes the effort worthwhile.

Thanks,

Julien
