Message-ID: <20190425143644.GA13531@sinkpad>
Date:   Thu, 25 Apr 2019 10:36:44 -0400
From:   Julien Desfossez <jdesfossez@...italocean.com>
To:     Vineeth Remanan Pillai <vpillai@...italocean.com>
Cc:     Nishanth Aravamudan <naravamudan@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, subhra.mazumdar@...cle.com,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2

On 23-Apr-2019 04:18:05 PM, Vineeth Remanan Pillai wrote:
> Second iteration of the core-scheduling feature.
> 
> This version fixes apparent bugs and performance issues in v1. This
> doesn't fully address the issue of core sharing between processes
> with different tags. Core sharing still happens 1% to 5% of the time
> based on the nature of workload and timing of the runnable processes.
> 
> Changes in v2
> -------------
> - rebased on mainline commit: 6d906f99817951e2257d577656899da02bb33105

Here are our benchmark results.

Environment setup:
------------------
Skylake server, 2 NUMA nodes, 72 CPUs in total with HT on
Workload in KVM virtual machines, one cpu cgroup per VM (including qemu
and vhost threads)


Case 1: MySQL TPC-C
-------------------
1 12-vcpus-32gb MySQL server per NUMA node (clients on another physical
machine)
96 semi-idle 1-vcpu-512mb VMs per NUMA node (sending metrics over a VPN
every 15 seconds)
--> 3 vcpus per physical CPU
Average of 10 5-minute runs.
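
The 3:1 figure above can be rechecked from the setup numbers. This is a
minimal sketch in Python, assuming the 72 CPUs are split evenly across the
two NUMA nodes (36 hardware threads each, as stated in Case 2) and reading
"physical CPU" as "hardware thread", since that is the reading the
arithmetic supports. The function name is only illustrative.

```python
def vcpus_per_hw_thread(vm_groups, hw_threads):
    """Return the vcpu-to-hardware-thread ratio for one NUMA node.

    vm_groups is a list of (number_of_vms, vcpus_per_vm) pairs.
    """
    total_vcpus = sum(count * vcpus for count, vcpus in vm_groups)
    return total_vcpus / hw_threads


# Assumption: 72 CPUs / 2 NUMA nodes = 36 hardware threads per node.
HW_THREADS_PER_NODE = 72 // 2

# Case 1, per node: 1 MySQL VM with 12 vcpus + 96 semi-idle VMs with 1 vcpu.
case1_ratio = vcpus_per_hw_thread([(1, 12), (96, 1)], HW_THREADS_PER_NODE)
print(f"Case 1: {case1_ratio:.1f} vcpus per hardware thread")
```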

- baseline:
  - avg tps: 1878
  - stdev tps: 47
- nosmt:
  - avg tps: 959 (-49% from baseline)
  - stdev tps: 35
- core scheduling:
  - avg tps: 1406 (-25% from baseline)
  - stdev tps: 48
  - Co-scheduling stats (5 minutes sample):
    - 48.9% VM threads
    - 49.6% idle
    - 1.3% foreign threads

So in v2, this very noisy test case benefits from core scheduling
relative to nosmt (the baseline is also better than in v1, so we probably
benefit from other changes in the kernel).
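
For reference, the "from baseline" percentages in Case 1 can be reproduced
from the reported averages. This is a small sketch in Python that uses
only the rounded tps values quoted above; the helper name is hypothetical.

```python
def pct_change(value, baseline):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Reported Case 1 averages (transactions per second).
BASELINE_TPS = 1878
results = {"nosmt": 959, "core scheduling": 1406}

changes = {name: pct_change(tps, BASELINE_TPS) for name, tps in results.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}% from baseline")
```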


Case 2: linpack with enough room
--------------------------------
2 12-vcpus-32gb linpack VMs both pinned on the same NUMA node (36
hardware threads with SMT on).
100k context switches/sec.
Average of 5 15-minute runs.

- baseline:
  - avg gflops: 403
  - stdev: 20
- nosmt:
  - avg gflops: 355 (-12% from baseline)
  - stdev: 28
- core scheduling:
  - avg gflops: 364 (-9% from baseline)
  - stdev: 59
  - Co-scheduling stats (5 minutes sample):
    - 39.3% VM threads
    - 59.3% idle
    - 0.07% foreign threads

No real difference between nosmt and core scheduling when there is
enough room to run a cpu-intensive workload even with SMT off.
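
One rough way to see why the two results are indistinguishable is to
compare the gap between the averages with the run-to-run spread. The
sketch below (Python) uses only the numbers reported above. It is a
sanity check under the assumption that the stdev reflects run-to-run
noise, not a proper significance test.

```python
# Reported Case 2 results (gflops).
nosmt = {"avg": 355, "stdev": 28}
core_sched = {"avg": 364, "stdev": 59}

# Gap between the two averages versus the smaller of the two stdevs.
gap = abs(core_sched["avg"] - nosmt["avg"])
smaller_stdev = min(nosmt["stdev"], core_sched["stdev"])
within_noise = gap < smaller_stdev

print(f"gap = {gap} gflops, smaller stdev = {smaller_stdev} gflops")
print("within run-to-run noise" if within_noise else "larger than noise")
```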


Case 3: full node linpack
-------------------------
3 12-vcpus-32gb linpack VMs all pinned on the same NUMA node (36
hardware threads with SMT on).
155k context switches/sec
Average of 5 15-minute runs.

- baseline:
  - avg gflops: 270
  - stdev: 5
- nosmt (switching to 2:1 ratio of vcpu to hardware threads):
  - avg gflops: 209 (-22.46% from baseline)
  - stdev: 6.2
- core scheduling:
  - avg gflops: 269 (-0.11% from baseline)
  - stdev: 5.7
  - Co-scheduling stats (5 minutes sample):
    - 93.7% VM threads
    - 6.3% idle
    - 0.04% foreign threads

Here core scheduling is a major improvement in terms of performance
compared to nosmt.

Julien