Date:   Thu, 31 Oct 2019 19:42:53 +0800
From:   "Li, Aubrey" <aubrey.li@...ux.intel.com>
To:     Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>, mingo@...nel.org,
        tglx@...utronix.de, pjt@...gle.com, torvalds@...ux-foundation.org
Cc:     linux-kernel@...r.kernel.org, Dario Faggioli <dfaggioli@...e.com>,
        fweisbec@...il.com, keescook@...omium.org, kerrnel@...gle.com,
        Phil Auld <pauld@...hat.com>, Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4

On 2019/10/31 2:33, Vineeth Remanan Pillai wrote:
> Fourth iteration of the Core-Scheduling feature.
> 
> This version was aiming mostly at addressing the vruntime comparison
> issues with v3. The main issue seen in v3 was the starvation of
> interactive tasks when competing with cpu intensive tasks. This issue
> is mitigated to a large extent.
> 
> We have tested and verified that incompatible processes are not
> selected during schedule. In terms of performance, the impact
> depends on the workload:
> - on CPU intensive applications that use all the logical CPUs with
>   SMT enabled, enabling core scheduling performs better than nosmt.
> - on mixed workloads with considerable io compared to cpu usage,
>   nosmt seems to perform better than core scheduling.
> 
> v4 is rebased on top of 5.3.5(dc073f193b70):
> https://github.com/digitalocean/linux-coresched/tree/coresched/v4-v5.3.5

Thanks for posting v4. I have refreshed the data on my side. Since we have
been playing with Aaron's core vruntime patch for a while, there are no
surprises in the results.

Thanks,
-Aubrey

Environment setup
--------------------------
Skylake 8170 server, 2 NUMA nodes, 52 cores, 104 CPUs (HT on)

Case 1:
-------
cgroup1 workload, sysbench CPU mode (non AVX workload)
cgroup2 workload, gemmbench (AVX512 workload)

sysbench throughput result:
.--------------------------------------------------------------------------------------------------------------------------------------.
|NA/AVX	vanilla-SMT	[std% / sem%]	  cpu% |coresched-SMT	[std% / sem%]	  +/-	  cpu% |  no-SMT [std% / sem%]	 +/-	  cpu% |
|--------------------------------------------------------------------------------------------------------------------------------------|
|  1/1	     1269.1	[ 0.1%/ 0.0%]	  1.9% |       1272.4	[ 0.1%/ 0.0%]	  0.3%	  1.9% |  1272.0 [ 0.1%/ 0.0%]   0.2%	  3.9% |
|  2/2	     2466.9	[ 0.6%/ 0.1%]	  3.9% |       2534.2	[ 0.6%/ 0.1%]	  2.7%	  3.8% |  2511.9 [ 0.2%/ 0.0%]   1.8%	  7.7% |
|  4/4	     4725.2	[ 0.3%/ 0.0%]	  7.7% |       4806.3	[ 0.2%/ 0.0%]	  1.7%	  7.7% |  4786.7 [ 0.9%/ 0.1%]   1.3%	 14.6% |
|  8/8	     9353.4	[ 0.1%/ 0.0%]	 14.6% |       9357.4	[ 0.1%/ 0.0%]	  0.0%	 14.6% |  9352.3 [ 0.1%/ 0.0%]  -0.0%	 30.0% |
| 16/16	    17543.1	[ 1.0%/ 0.1%]	 30.1% |      18120.7	[ 0.2%/ 0.0%]	  3.3%	 30.1% | 17864.8 [ 1.2%/ 0.1%]   1.8%	 60.1% |
| 32/32	    26968.8	[ 3.9%/ 0.4%]	 60.1% |      29448.9	[ 3.5%/ 0.3%]	  9.2%	 59.9% | 25308.1 [10.7%/ 0.9%]  -6.2%	 97.7% |
| 48/48	    30466.2	[10.4%/ 1.0%]	 89.3% |      38624.4	[ 4.2%/ 0.4%]	 26.8%	 89.1% | 26891.2 [14.8%/ 1.0%] -11.7%	 99.5% |
| 64/64	    37909.3	[11.1%/ 1.1%]	 97.7% |      41671.7	[ 8.7%/ 0.9%]	  9.9%	 97.6% | 25898.3 [16.2%/ 1.0%] -31.7%	100.0% |
|128/128    39479.4	[24.6%/ 2.5%]	100.0% |      42119.6	[ 6.3%/ 0.6%]	  6.7%	 99.5% | 26830.1 [16.5%/ 1.1%] -32.0%	100.0% |
|256/256    42602.1	[16.4%/ 1.6%]	100.0% |      40041.3	[ 7.0%/ 0.7%]	 -6.0%	 99.7% | 27634.7 [15.4%/ 1.1%] -35.1%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'

Case 2:
-------
cgroup1 workload, sysbench MySQL (non AVX workload)
cgroup2 workload, gemmbench (AVX512 workload)

sysbench throughput result:
.--------------------------------------------------------------------------------------------------------------------------------------.
|NA/AVX	vanilla-SMT	[std% / sem%]	  cpu% |coresched-SMT	[std% / sem%]	  +/-	  cpu% |  no-SMT [std% / sem%]	 +/-	  cpu% |
|--------------------------------------------------------------------------------------------------------------------------------------|
|  1/1	     1018.2	[ 1.0%/ 0.1%]	  1.9% |        915.8	[ 0.9%/ 0.1%]	-10.1%	  1.9% |   994.0 [ 1.4%/ 0.2%]  -2.4%	  3.9% |
|  2/2	     1941.2	[ 0.7%/ 0.1%]	  3.9% |       1746.0	[ 0.5%/ 0.1%]	-10.1%	  3.9% |  1946.2 [ 0.8%/ 0.1%]   0.3%	  7.8% |
|  4/4	     3763.9	[ 0.5%/ 0.0%]	  7.8% |       3426.0	[ 1.5%/ 0.2%]	 -9.0%	  7.8% |  3745.1 [ 1.1%/ 0.1%]  -0.5%	 15.6% |
|  8/8	     7375.5	[ 1.3%/ 0.1%]	 15.5% |       6647.1	[ 1.1%/ 0.1%]	 -9.9%	 16.1% |  7368.4 [ 0.8%/ 0.1%]  -0.1%	 31.1% |
| 16/16	    12990.3	[ 0.6%/ 0.1%]	 31.1% |      10903.7	[ 1.9%/ 0.2%]	-16.1%	 32.0% | 12082.6 [ 4.7%/ 0.5%]  -7.0%	 62.9% |
| 32/32	    18238.1	[ 6.1%/ 0.6%]	 62.1% |      16580.8	[ 3.0%/ 0.3%]	 -9.1%	 62.8% | 21193.6 [ 4.9%/ 0.6%]  16.2%	 97.8% |
| 48/48	    21708.6	[ 8.3%/ 0.8%]	 90.3% |      17064.1	[ 9.5%/ 0.9%]	-21.4%	 90.4% | 18531.4 [16.6%/ 1.8%] -14.6%	 99.5% |
| 64/64	    18636.9	[13.1%/ 1.3%]	 97.9% |      12376.1	[20.9%/ 2.1%]	-33.6%	 96.8% | 20025.8 [14.9%/ 2.4%]   7.5%	100.0% |
|128/128    16204.2	[16.8%/ 1.7%]	 99.4% |       3776.1	[88.7%/ 8.9%]	-76.7%	 97.6% | 20263.5 [12.7%/ 6.8%]  25.1%	100.0% |
|256/256    16730.5	[17.9%/ 1.8%]	 98.9% |       1499.7	[210.3%/21.0%]	-91.0%	 98.4% | 17633.1 [ 7.5%/ 8.9%]   5.4%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'

For this case, we care about sysbench MySQL latency (ms, lower is better):
.--------------------------------------------------------------------------------------------------------------------------------------.
|NA/AVX	vanilla-SMT	[std% / sem%]	  cpu% |coresched-SMT	[std% / sem%]	  +/-	  cpu% |  no-SMT [std% / sem%]	 +/-	  cpu% |
|--------------------------------------------------------------------------------------------------------------------------------------|
|  1/1	        1.1	[ 3.7%/ 0.4%]	  1.9% |          1.1	[ 1.0%/ 0.1%]	 -8.9%	  1.9% |     1.1 [ 4.1%/ 0.4%]  -2.0%	  3.9% |
|  2/2	        1.1	[ 0.7%/ 0.1%]	  3.9% |          1.2	[ 0.8%/ 0.1%]	-10.8%	  3.9% |     1.1 [ 0.8%/ 0.1%]   0.2%	  7.8% |
|  4/4	        1.1	[ 0.7%/ 0.1%]	  7.8% |          1.3	[ 3.8%/ 0.4%]	-11.8%	  7.8% |     1.2 [ 2.2%/ 0.2%]  -1.1%	 15.6% |
|  8/8	        1.2	[ 2.2%/ 0.2%]	 15.5% |          1.3	[ 3.0%/ 0.3%]	-11.7%	 16.1% |     1.2 [ 1.8%/ 0.2%]   0.4%	 31.1% |
| 16/16	        1.4	[ 1.5%/ 0.1%]	 31.1% |          2.0	[ 8.2%/ 0.8%]	-45.8%	 32.0% |     1.9 [18.2%/ 1.7%] -33.2%	 62.9% |
| 32/32	        2.4	[ 6.6%/ 0.7%]	 62.1% |          2.6	[ 3.1%/ 0.3%]	 -6.2%	 62.8% |     2.2 [23.5%/ 2.0%]   8.5%	 97.8% |
| 48/48	        2.7	[ 5.3%/ 0.5%]	 90.3% |          3.4	[ 3.5%/ 0.4%]	-26.1%	 90.4% |     6.2 [19.3%/ 3.5%] -128.0%	 99.5% |
| 64/64	        5.9	[13.0%/ 1.3%]	 97.9% |          8.3	[ 9.8%/ 1.0%]	-40.1%	 96.8% |     7.4 [16.6%/ 1.5%] -25.1%	100.0% |
|128/128       17.4	[46.8%/ 4.7%]	 99.4% |        248.0	[146.9%/14.7%]	-1327.8% 97.6% |    11.0 [10.5%/ 0.0%]  36.7%	100.0% |
|256/256       33.5	[67.1%/ 6.7%]	 98.9% |       1279.5	[245.6%/24.6%]	-3716.6% 98.4% |    21.5 [21.5%/ 0.0%]  36.0%	100.0% |
'--------------------------------------------------------------------------------------------------------------------------------------'

Note:
-----
64/64:		64 sysbench threads (in one cgroup) and 64 gemmbench threads (in the other cgroup) run simultaneously.
vanilla-SMT:	baseline with HT on
coresched-SMT:	core scheduling enabled
no-SMT:		HT off through /sys/devices/system/cpu/smt/control
std%:		standard deviation
sem%:		standard error of the mean
+/-:		improvement/regression against baseline
cpu%:		derived from vmstat.idle and vmstat.iowait
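
For anyone who wants to reproduce the derived columns, here is a minimal
sketch of how std%, sem%, +/- and cpu% could be computed. It is not the
script used for the tables above. The function names are hypothetical, the
use of the sample (n-1) standard deviation is an assumption, and so is
reading cpu% as 100 - idle - iowait. For latency, the sign of +/- is flipped
so that a negative value still means a regression, which matches the signs
in the latency table.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, std%, sem%) for a list of per-run measurements.

    Assumption: std is the sample standard deviation (n-1 denominator),
    and both std and sem are reported relative to the mean.
    """
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    sem = std / math.sqrt(len(samples))
    return mean, 100.0 * std / mean, 100.0 * sem / mean


def change_pct(baseline, value, higher_is_better=True):
    """Relative change against the baseline, in percent.

    Positive means improvement. For latency pass higher_is_better=False
    so that a higher value shows up as a negative number.
    """
    delta = (value - baseline) / baseline * 100.0
    return delta if higher_is_better else -delta


def cpu_busy_pct(idle, iowait):
    """Assumed reading of cpu%: everything that is neither idle nor iowait."""
    return 100.0 - idle - iowait


if __name__ == "__main__":
    # Case 1, 2/2 row: vanilla-SMT 2466.9 vs coresched-SMT 2534.2 (throughput)
    print(f"{change_pct(2466.9, 2534.2):.1f}%")
    # Case 2 latency, 128/128 row: vanilla-SMT 17.4 ms vs no-SMT 11.0 ms
    print(f"{change_pct(17.4, 11.0, higher_is_better=False):.1f}%")
```

The printed figures are computed from the rounded values shown in the
tables, so they can differ in the last digit from the +/- column, which was
presumably computed from unrounded means.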
