Message-ID: <DM6PR03MB41405C84C56FCEE7110263CECD34A@DM6PR03MB4140.namprd03.prod.outlook.com>
Date: Fri, 14 Jul 2023 09:29:55 +0800
From: Wang Jianchao <jianchwa@...look.com>
To: Xiaoyao Li <xiaoyao.li@...el.com>,
Zhi Wang <zhi.wang.linux@...il.com>
Cc: seanjc@...gle.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
hpa@...or.com, kvm@...r.kernel.org, arkinjob@...look.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/3] KVM: x86: introduce pv feature lazy tscdeadline
On 2023.07.13 21:32, Xiaoyao Li wrote:
> On 7/13/2023 10:50 AM, Wang Jianchao wrote:
>>
>>
>> On 2023.07.13 02:14, Zhi Wang wrote:
>>> On Fri, 7 Jul 2023 14:17:58 +0800
>>> Wang Jianchao <jianchwa@...look.com> wrote:
>>>
>>>> Hi
>>>>
>>>> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
>>>> Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
>>>> and the host side handles it. However, many of these vm-exits are
>>>> unnecessary because the timer is often overwritten before it expires.
>>>>
>>>> v : write to msr of tsc deadline
>>>> | : timer armed by tsc deadline
>>>>
>>>> v v v v v | | | | |
>>>> ---------------------------------------> Time
>>>>
>>>> The timer armed by an msr write is overwritten before it expires, so the
>>>> vm-exit caused by that write is wasted. Lazy tscdeadline works as follows:
>>>>
>>>> v v v v v | |
>>>> ---------------------------------------> Time
>>>> '- arm -'
>>>>
>>>
>>> Interesting patch.
>>>
>>> I am a little confused by the chart above. The MSR writes, which are said
>>> to cause VM exits, are not reduced in the lazy tscdeadline chart; only the
>>> number of arms goes down. Yet the benefit of lazy tscdeadline is said to
>>> come from "fewer vm exits". Maybe it is better to improve the chart a
>>> little to help people grasp the idea easily?
>>
>> Thanks so much for your comment, and sorry for my poor chart.
>>
>> Let me try to rework the chart.
>>
>> Before this patch, every time the guest starts or modifies an hrtimer, it needs to write the tsc deadline msr;
>> a vm-exit occurs and the host arms a hv or sw timer for it.
>>
>>
>> w: write msr
>> x: vm-exit
>> t: hv or sw timer
>>
>>
>> Guest
>> w
>> ---------------------------------------> Time
>> Host x t
>>
>> However, in workloads that set up timers frequently, the tscdeadline msr is usually overwritten
>> many times before the timer expires, and every time we modify the tscdeadline, a vm-exit occurs.
>>
>>
>> 1. write to msr with t0
>>
>> Guest
>> w0
>> ----------------------------------------> Time
>> Host x0 t0
>>
>> 2. write to msr with t1
>> Guest
>> w1
>> ------------------------------------------> Time
>> Host x1 t0->t1
>>
>>
>> 3. write to msr with t2
>> Guest
>> w2
>> ------------------------------------------> Time
>> Host x2 t1->t2
>>
>> 4. write to msr with t3
>> Guest
>> w3
>> ------------------------------------------> Time
>> Host x3 t2->t3
>>
>>
>>
>> What this patch wants to do is eliminate the vm-exits x1, x2 and x3, as follows.
>>
>>
>> First, like other pv features, we have two fields shared between guest and host:
>> - armed, the tscdeadline value that has a timer on the host side, only updated by the __host__ side
>> - pending, the next tscdeadline value, only updated by the __guest__ side
>>
>>
>> 1. write to msr with t0
>>
>> armed : t0
>> pending : t0
>> Guest
>> w0
>> ----------------------------------------> Time
>> Host x0 t0
>>
>> a vm-exit occurs and the host arms a timer for t0
>
> What's the initial value of @armed and @pending?
Both of them are zero.
@armed is only updated by the host.
@pending is only updated by the guest.
The guest side checks @armed; if it is zero, it jumps to wrmsrl.
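
To make the guest-side check concrete, here is a rough sketch rather than the code from the patches. The struct layout and the helper name lazy_tscdeadline_need_wrmsr() are made up for illustration, and it assumes @pending is recorded on every write (as in step 1 of the chart, where pending becomes t0 even though the write goes through wrmsrl). It also ignores the memory ordering and races that real guest/host shared memory has to handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the area shared between guest and host. */
struct lazy_tscdeadline {
	uint64_t armed;   /* deadline the host has a timer for; written by host only */
	uint64_t pending; /* latest deadline the guest wants; written by guest only */
};

/*
 * Guest side: record the new deadline in @pending and report whether the
 * caller still has to do the real wrmsrl(MSR_IA32_TSC_DEADLINE, tsc),
 * which is the path that causes a vm-exit.
 */
static bool lazy_tscdeadline_need_wrmsr(struct lazy_tscdeadline *s, uint64_t tsc)
{
	uint64_t armed = s->armed;

	s->pending = tsc;

	/* Nothing armed on the host yet (initial state): must write the msr. */
	if (armed == 0)
		return true;

	/* New deadline is earlier than the armed one: host must re-arm. */
	if (tsc < armed)
		return true;

	/* The armed timer fires first and the host picks up @pending then. */
	return false;
}
```

With both fields starting at zero, the first write always takes the wrmsrl path; later writes with a deadline at or after @armed only touch @pending.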
>
>> 2. write to msr with t1
>>
>> armed : t0
>> pending : t1
>>
>> Guest
>> w1
>> ------------------------------------------> Time
>> Host t0
>>
>> t0, the tsc deadline value that is already armed, is smaller than t1, so there is no need to write
>> to the msr; the guest just updates pending
>
> if t1 < t0, then it triggers the vm exit, right?
Yes. If the new tsc deadline value (t1 here) is smaller than @armed, the guest jumps to wrmsrl.
> And in this case, I think @armed will be updated to t1. What about pending? will it get updated to t1 or not?
Yes, the guest jumps to wrmsrl and causes a vm-exit; the host side then updates @armed and re-arms the timer.
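
A rough sketch of the host side, again not the code from the patches. The struct host_timer stand-in and both function names are hypothetical. Clearing @armed to zero when nothing newer is pending is my assumption here, chosen so that the guest's "@armed is zero" check sends the next write through wrmsrl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the area shared between guest and host. */
struct lazy_tscdeadline {
	uint64_t armed;   /* written by host only */
	uint64_t pending; /* written by guest only */
};

/* Stand-in for the hv or sw timer: the deadline it is armed for, 0 = idle. */
struct host_timer {
	uint64_t deadline;
};

/* vm-exit path for a write to the tsc deadline msr: (re-)arm and publish. */
static void host_handle_wrmsr(struct lazy_tscdeadline *s,
			      struct host_timer *t, uint64_t tsc)
{
	t->deadline = tsc;
	s->armed = tsc;
}

/*
 * Timer expiry path: check @pending and re-arm if the guest moved the
 * deadline later in the meantime. Returns true when a new timer was armed.
 */
static bool host_timer_expired(struct lazy_tscdeadline *s, struct host_timer *t)
{
	uint64_t fired = t->deadline;
	uint64_t pending = s->pending;

	if (pending > fired) {
		t->deadline = pending;
		s->armed = pending;
		return true;
	}

	/* Assumption: nothing newer is pending, so go back to the idle state. */
	t->deadline = 0;
	s->armed = 0;
	return false;
}
```

In the t0..t3 example, only the first write reaches host_handle_wrmsr(); when t0 fires, host_timer_expired() sees pending == t3 and arms t3 without any further vm-exit from the guest.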
Thanks
Jianchao
>
>>
>> 3. write to msr with t2
>>
>> armed : t0
>> pending : t2
>> Guest
>> w2
>> ------------------------------------------> Time
>> Host t0
>> Similar to step 2, just update the pending field with t2, no vm-exit
>>
>>
>> 4. write to msr with t3
>>
>> armed : t0
>> pending : t3
>>
>> Guest
>> w3
>> ------------------------------------------> Time
>> Host t0
>> Similar to step 2, just update the pending field with t3, no vm-exit
>>
>>
>> 5. t0 expires, arm t3
>>
>> armed : t3
>> pending : t3
>>
>>
>> Guest
>> ------------------------------------------> Time
>> Host t0 ------> t3
>>
>> When t0 fires, the host checks the pending field and re-arms a timer based on it.
>>
>>
>> That is the core idea of this patch ;)
>>
>>
>> Thanks
>> Jianchao
>>
>>>
>>>> The 1st timer is responsible for arming the next timer. When the armed
>>>> timer expires, the host checks pending and arms a new timer.
>>>>
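>>>> In the netperf test with TCP_RR on loopback, lazy_tscdeadline reduces
>>>> vm-exits significantly: total vm-exits drop from 12617503 to 5815737, and
>>>> tsc-deadline msr writes drop from 7512945 to 185629.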
>>>>
>>>>                              Close               Open
>>>> --------------------------------------------------------
>>>> VM-Exit
>>>> sum                       12617503            5815737
>>>> intr                0%       37023     0%       33002
>>>> cpuid               0%           1     0%           0
>>>> halt               19%     2503932    47%     2780683
>>>> msr-write          79%    10046340    51%     2966824
>>>> pause               0%          90     0%          84
>>>> ept-violation       0%         584     0%         336
>>>> ept-misconfig       0%           0     0%           2
>>>> preemption-timer    0%       29518     0%       34800
>>>> -------------------------------------------------------
>>>> MSR-Write
>>>> sum                       10046455            2966864
>>>> apic-icr           25%     2533498    93%     2781235
>>>> tsc-deadline       74%     7512945     6%      185629
>>>>
>>>> This patchset is based on and tested with 6.4.0, and includes 3 patches:
>>>>
>>>> The 1st one adds the necessary data structures for this feature.
>>>> The 2nd one adds the specific msr operations between guest and host.
>>>> The 3rd one is the one that makes this feature work.
>>>>
>>>> Any comment is welcome.
>>>>
>>>> Thanks
>>>> Jianchao
>>>>
>>>> Wang Jianchao (3)
>>>> KVM: x86: add msr register and data structure for lazy tscdeadline
>>>> KVM: x86: exchange info about lazy_tscdeadline with msr
>>>> KVM: X86: add lazy tscdeadline support to reduce vm-exit of msr-write
>>>>
>>>>
>>>> arch/x86/include/asm/kvm_host.h | 10 ++++++++
>>>> arch/x86/include/uapi/asm/kvm_para.h | 9 +++++++
>>>> arch/x86/kernel/apic/apic.c | 47 ++++++++++++++++++++++++++++++++++-
>>>> arch/x86/kernel/kvm.c | 13 ++++++++++
>>>> arch/x86/kvm/cpuid.c | 1 +
>>>> arch/x86/kvm/lapic.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>> arch/x86/kvm/lapic.h | 4 +++
>>>> arch/x86/kvm/x86.c | 26 ++++++++++++++++++++
>>>> 8 files changed, 229 insertions(+), 9 deletions(-)
>>>
>