linux-kernel - [RFC]vcpu scattering make impove VM performance, it worth to realize?


Message-ID: <EC9759BC1E3E98429B5DE9A03DF86D8B59D7774D@DGGEML503-MBX.china.huawei.com>
Date:   Wed, 25 Oct 2017 09:29:30 +0000
From:   Zhuangyanying <ann.zhuangyanying@...wei.com>
To:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Liuxiaojian (alex)" <liuxiaojian6@...wei.com>
CC:     Xiexiangyou <xiexiangyou@...wei.com>,
        "Gonglei (Arei)" <arei.gonglei@...wei.com>,
        Zhanghailiang <zhang.zhanghailiang@...wei.com>,
        "zhouchengming (A)" <zhouchengming1@...wei.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>
Subject: [RFC]vcpu scattering make impove VM performance, it worth to
 realize?

Recently we tested VM (guest) performance and CPU QoS with the vCPUs
overcommitted (more vCPUs than the host has pCPUs).
We found that placing the vCPUs of the same VM on different pCPUs (scattering)
improves VM performance and also makes CPU QoS more effective.
Digging a little deeper, we found that under overcommit the vCPUs of the same VM may
end up on the same pCPU, because CFS cannot distinguish a vCPU thread from a normal thread.
This can cause several problems.
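
To make the idea concrete, here is a minimal sketch of a scattered placement computed in userspace. It is not the scheduler change being asked about: the function name `scatter_placement`, the VM names, and the assumption that each VM has 30 vCPUs are illustrative only.

```python
from collections import Counter


def scatter_placement(vm_vcpus, n_pcpus):
    """Assign each vCPU a pCPU so that one VM's vCPUs land on distinct pCPUs.

    vm_vcpus: dict mapping a VM name to its vCPU count (hypothetical input).
    Returns {vm_name: [pCPU for vCPU 0, pCPU for vCPU 1, ...]}.
    """
    placement = {}
    next_pcpu = 0  # continue where the previous VM stopped, to keep load even
    for vm, count in vm_vcpus.items():
        if count > n_pcpus:
            raise ValueError(f"{vm}: {count} vCPUs cannot all be on distinct pCPUs")
        placement[vm] = [(next_pcpu + i) % n_pcpus for i in range(count)]
        next_pcpu = (next_pcpu + count) % n_pcpus
    return placement


# A 40-pCPU board with 4 VMs, assuming 30 vCPUs per VM (an assumption).
placement = scatter_placement({"vm0": 30, "vm1": 30, "vm2": 30, "vm3": 30}, 40)

# Number of vCPUs that end up on each pCPU.
load = Counter(p for pcpus in placement.values() for p in pcpus)
```

Such a mapping could be applied by pinning each vCPU thread with `os.sched_setaffinity(tid, {pcpu})`, but hard pinning takes those threads out of load balancing, which is exactly what a scheduler-level policy would need to avoid.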

1) Intel MKL linpack
The board (40U), 4 VMs (30U, loaded with linpack in single mode). Results in Gflops:
                    vm0          vm1          vm2          vm3
scattering on:      3.46755e+01  3.25317e+01  3.31906e+01  3.80649e+01
scattering off:     3.19633e+01  3.15148e+01  3.23613e+01  3.56430e+01
scattering on:      3.33773e+01  3.30371e+01  3.55295e+01  3.78067e+01
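
For a quick read of these numbers, the following sketch computes the per-VM gain of the mean of the two runs with scattering over the single run without it. The helper name `gain_percent` is illustrative, and with only one baseline run the result is indicative at best.

```python
# Gflops per VM (vm0..vm3), copied from the table above.
scattering_on = [
    [34.6755, 32.5317, 33.1906, 38.0649],
    [33.3773, 33.0371, 35.5295, 37.8067],
]
scattering_off = [31.9633, 31.5148, 32.3613, 35.6430]


def gain_percent(on_runs, off_run):
    """Per-VM gain of the mean scattering-on result over the scattering-off run."""
    gains = []
    for vm, off in enumerate(off_run):
        mean_on = sum(run[vm] for run in on_runs) / len(on_runs)
        gains.append(100.0 * (mean_on - off) / off)
    return gains


gains = gain_percent(scattering_on, scattering_off)
for vm, gain in enumerate(gains):
    print(f"vm{vm}: {gain:+.2f}%")
```

On these figures the gain comes out at roughly 4% to 6.5% per VM.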

Test environment: linux-4.13.9 (host) + qemu-2.10.0 + CentOS 7.2 (guest), Intel(R) Xeon(R) CPU E5-2658 v2 @ 2.40GHz

2) When the CPU share and the CPU limit conflict, the board's computing capacity cannot be fully exploited
The board (40U), 3 VMs (40U, loaded with 40 instances of yes inside), shares are 1024:2048:4096. Total CPU usage of the 3 VMs is close to 4000%.
The VM whose CPU share is 4096 is entitled to more than 20U by its share, while its upper limit is set to 8U through quota.
We find that the total CPU usage of the 3 VMs is:
default, before scattering: less than 3900%
after scattering: up to 4000%
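
The arithmetic behind this scenario can be sketched as follows. This is an idealized work-conserving model, not how CFS computes anything: capacity is split by share weight, and whatever a quota-capped VM cannot use is re-split among the others. The function name `ideal_allocation` and the VM names are hypothetical.

```python
def ideal_allocation(total, shares, caps):
    """Split `total` CPUs by share weight, honoring per-VM caps.

    shares: {vm: weight}; caps: {vm: max CPUs} for capped VMs only.
    Capacity a capped VM cannot use is re-split among the remaining VMs.
    """
    alloc = {}
    active = dict(shares)
    remaining = float(total)
    while active:
        weight = sum(active.values())
        fair = {vm: remaining * s / weight for vm, s in active.items()}
        capped = [vm for vm in active if vm in caps and fair[vm] > caps[vm]]
        if not capped:
            alloc.update(fair)  # nobody hits a cap: the split is final
            break
        for vm in capped:
            alloc[vm] = caps[vm]     # capped VM gets exactly its quota
            remaining -= caps[vm]    # the rest goes back into the pool
            del active[vm]
    return alloc


shares = {"vm_1024": 1024, "vm_2048": 2048, "vm_4096": 4096}

# By share alone, the 4096 VM is entitled to 40 * 4096 / 7168 CPUs (over 20U).
by_share = ideal_allocation(40, shares, {})

# With the 8U quota, the other two VMs should absorb the 32U left over.
alloc = ideal_allocation(40, shares, {"vm_4096": 8})
```

In this model the three allocations still sum to 40U, that is 4000%, so the shortfall observed before scattering is capacity the other two VMs could have used but did not get.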

3) Smaller fluctuation of per-VM CPU usage under equal shares
The board (40U), 3 VMs (40U, loaded with 40 instances of yes inside), shares are 1024:1024:1024.
The maximum fluctuation range we observed in VM CPU usage is:
default, before scattering: 10%
after scattering: 3%

4) In scenario 2), we additionally start a VM (20U, CentOS 6.5) and run the test: ./hackbench 200 process 4000
The elapsed time is as follows:
default, before scattering: 115s
after scattering: 95s

Would it be worth offering a scheduling strategy that supports vCPU scattering without affecting load balancing?
Would this lead to any undesirable consequences?

Regards,
Zhuangyanying