Message-ID: <598D0BB0.2040901@huawei.com>
Date: Fri, 11 Aug 2017 09:43:12 +0800
From: "Longpeng (Mike)" <longpeng2@...wei.com>
To: Eric Farman <farman@...ux.vnet.ibm.com>
CC: Cornelia Huck <cohuck@...hat.com>, <pbonzini@...hat.com>,
<rkrcmar@...hat.com>, <agraf@...e.com>, <borntraeger@...ibm.com>,
<christoffer.dall@...aro.org>, <marc.zyngier@....com>,
<james.hogan@...tec.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <weidong.huang@...wei.com>,
<arei.gonglei@...wei.com>, <wangxinxin.wang@...wei.com>,
<longpeng.mike@...il.com>, <david@...hat.com>
Subject: Re: [PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin
On 2017/8/10 21:18, Eric Farman wrote:
>
>
> On 08/08/2017 04:14 AM, Longpeng (Mike) wrote:
>>
>>
>> On 2017/8/8 15:41, Cornelia Huck wrote:
>>
>>> On Tue, 8 Aug 2017 12:05:31 +0800
>>> "Longpeng(Mike)" <longpeng2@...wei.com> wrote:
>>>
>>>> This is a simple optimization for kvm_vcpu_on_spin, the
>>>> main idea is described in patch-1's commit msg.
>>>
>>> I think this generally looks good now.
>>>
>>>>
>>>> I did some tests based on the RFC version; the results show
>>>> that it can improve performance slightly.
>>>
>>> Did you re-run tests on this version?
>>
>>
>> Hi Cornelia,
>>
>> I didn't re-run the tests on V2. But the major difference between RFC and V2
>> is that V2 only caches the result for X86 (s390/arm don't need it) and V2 saves
>> an expensive operation (440-1400 cycles on my test machine) for X86/VMX.
>>
>> So I think V2's performance is at least the same as RFC or even slightly
>> better. :)
>>
>>>
>>> I would also like to see some s390 numbers; unfortunately I only have a
>>> z/VM environment and any performance numbers would be nearly useless
>>> there. Maybe somebody within IBM with a better setup can run a quick
>>> test?
>
> Won't swear I didn't screw something up, but here's some quick numbers. Host was
> 4.12.0 with and without this series, running QEMU 2.10.0-rc0. Created 4 guests,
> each with 4 CPU (unpinned) and 4GB RAM. VM1 did full kernel compiles with
> kernbench, which took averages of 5 runs of different job sizes (I threw away
> the "-j 1" numbers). VM2-VM4 ran cpu burners on 2 of their 4 cpus.
>
> Numbers from VM1 kernbench output, and the delta between runs:
>
> load -j 3             before     after     delta
> Elapsed Time         183.178    182.58    -0.598
> User Time             534.19    531.52     -2.67
> System Time           32.538     33.37     0.832
> Percent CPU            308.8       309       0.2
> Context Switches     98484.6     99001     516.4
> Sleeps                227347    228752      1405
>
> load -j 16            before     after     delta
> Elapsed Time         153.352    147.59    -5.762
> User Time            545.829    533.41   -12.419
> System Time           34.289     34.85     0.561
> Percent CPU            347.6       348       0.4
> Context Switches      160518    159120     -1398
> Sleeps                240740    240536      -204
>
Thanks Eric!
The `Elapsed Time` is smaller with this series in both runs, which is
consistent with the numbers in my cover letter.
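
To put the `Elapsed Time` deltas in relative terms, here is a minimal sketch
that converts the before/after values from Eric's kernbench tables into a
percentage change. The helper name `pct_change` is only illustrative and is not
part of kernbench or of this series; the input values are copied from the
tables above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# Elapsed Time values taken from the quoted kernbench output.
elapsed = {
    "-j 3": (183.178, 182.58),
    "-j 16": (153.352, 147.59),
}

for load, (before, after) in elapsed.items():
    print(f"load {load}: {pct_change(before, after):.2f}%")
```

With these inputs the change is about -0.33% for `-j 3` and about -3.76% for
`-j 16`. Each figure is an average of 5 runs on a single setup, so this is
only a rough indication.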
>
> - Eric
>
>
> .
>
--
Regards,
Longpeng(Mike)