Date:   Thu, 13 Jul 2023 21:32:11 +0800
From:   Xiaoyao Li <xiaoyao.li@...el.com>
To:     Wang Jianchao <jianchwa@...look.com>,
        Zhi Wang <zhi.wang.linux@...il.com>
Cc:     seanjc@...gle.com, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
        hpa@...or.com, kvm@...r.kernel.org, arkinjob@...look.com,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/3] KVM: x86: introduce pv feature lazy tscdeadline

On 7/13/2023 10:50 AM, Wang Jianchao wrote:
> 
> 
> On 2023.07.13 02:14, Zhi Wang wrote:
>> On Fri,  7 Jul 2023 14:17:58 +0800
>> Wang Jianchao <jianchwa@...look.com> wrote:
>>
>>> Hi
>>>
>>> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
>>> Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
>>> and the host handles it. However, many of these vm-exits are unnecessary
>>> because the timer is often overwritten before it expires.
>>>
>>> v : write to msr of tsc deadline
>>> | : timer armed by tsc deadline
>>>
>>>           v v v v v        | | | | |
>>> --------------------------------------->  Time
>>>
>>> The timer armed by each msr write is overwritten before it expires, so
>>> the vm-exit it caused is wasted. Lazy tscdeadline works as follows:
>>>
>>>           v v v v v        |       |
>>> --------------------------------------->  Time
>>>                            '- arm -'
>>>
>>
>> Interesting patch.
>>
>> I am a little confused by the chart above. The MSR writes, which are
>> said to cause the VM exits, are not reduced in the lazy tscdeadline
>> chart; only the number of arms goes down. Yet the benefit of lazy
>> tscdeadline is said to come from "fewer VM exits". Maybe the chart
>> could be improved a little to help people grasp the idea more easily?
> 
> Thanks so much for your comment, and sorry for my poor chart.
> 
> Let me try to rework the chart.
> 
> Before this patch, every time the guest starts or modifies an hrtimer, it has to write the tsc deadline msr;
> a vm-exit occurs and the host arms a hv or sw timer for it.
> 
> 
> w: write msr
> x: vm-exit
> t: hv or sw timer
> 
> 
> Guest
>           w
> --------------------------------------->  Time
> Host     x              t
>   
> 
> However, in workloads that set up timers frequently, the tscdeadline msr is usually overwritten
> many times before the timer expires, and every time we modify the tscdeadline, a vm-exit occurs.
> 
> 
> 1. write to msr with t0
> 
> Guest
>           w0
> ---------------------------------------->  Time
> Host     x0             t0
> 
>   
> 2. write to msr with t1
> Guest
>               w1
> ------------------------------------------>  Time
> Host         x1          t0->t1
> 
> 
> 3. write to msr with t2
> Guest
>                  w2
> ------------------------------------------>  Time
> Host            x2          t1->t2
>   
> 
> 4. write to msr with t3
> Guest
>                      w3
> ------------------------------------------>  Time
> Host                x3           t2->t3
> 
> 
> 
> What this patch wants to do is eliminate the vm-exits x1, x2 and x3, as follows.
> 
> 
> First, as with other pv features, we have two fields shared between guest and host:
>   - armed, the tscdeadline value that currently has a timer on the host side, only updated by the __host__ side
>   - pending, the next tscdeadline value, only updated by the __guest__ side
> 
> 
> 1. write to msr with t0
> 
>               armed   : t0
>               pending : t0
> Guest
>           w0
> ---------------------------------------->  Time
> Host     x0             t0
> 
> A vm-exit occurs and the host arms a timer for t0.

What's the initial value of @armed and @pending?

>   
> 2. write to msr with t1
> 
>               armed   : t0
>               pending : t1
> 
> Guest
>               w1
> ------------------------------------------>  Time
> Host                     t0
> 
> The tsc deadline that has already been armed, namely t0, is smaller than t1, so there is no need
> to write to the msr; the guest just updates pending.

If t1 < t0, then it triggers the vm exit, right?
And in this case, I think @armed will be updated to t1. What about
@pending? Will it get updated to t1 or not?
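
To make the cases concrete, here is a minimal C sketch of one possible reading of the guest-side path. It is not taken from the patches: the struct layout, the names `guest_set_deadline` and `host_arm`, the use of 0 as "nothing armed", and the choice to always update `pending` are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the fields shared between guest and host. */
struct lazy_tscdeadline {
    uint64_t armed;   /* deadline the host has a timer for; host-written */
    uint64_t pending; /* latest deadline the guest asked for; guest-written */
};

/*
 * Guest side: request a new deadline.
 * Returns true when the guest still has to write the tsc deadline msr
 * (and therefore takes a vm-exit), false when updating 'pending' suffices.
 */
static bool guest_set_deadline(struct lazy_tscdeadline *s, uint64_t deadline)
{
    /* Assumption: pending is updated on every request, including t1 < t0. */
    s->pending = deadline;

    /* Assumption: armed == 0 means the host has no timer armed yet. */
    if (s->armed != 0 && deadline >= s->armed)
        return false; /* the armed timer fires first and will pick up pending */

    return true; /* nothing armed, or the new deadline is earlier: write the msr */
}

/* Host side of the msr write: arm a timer and publish its deadline. */
static void host_arm(struct lazy_tscdeadline *s, uint64_t deadline)
{
    s->armed = deadline;
}
```

Under these assumptions, a first request for t0 returns true and the host sets armed to t0; a later request for t1 >= t0 returns false and only changes pending; a request for t1 < t0 returns true again, which matches the reading in the question above.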

> 
> 3. write to msr with t2
> 
>               armed   : t0
>               pending : t2
>   
> Guest
>                  w2
> ------------------------------------------>  Time
> Host                      t0
>   
> Similar to step 2: just update the pending field with t2, no vm-exit.
> 
> 
> 4.  write to msr with t3
> 
>               armed   : t0
>               pending : t3
> 
> Guest
>                      w3
> ------------------------------------------>  Time
> Host                       t0
> 
> Similar to step 2: just update the pending field with t3, no vm-exit.
> 
> 
> 5.  t0 expires, arm t3
> 
>               armed   : t3
>               pending : t3
> 
> 
> Guest
>                              
> ------------------------------------------>  Time
> Host                       t0  ------> t3
> 
> When t0 fires, the host checks the pending field and re-arms a timer based on it.
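
The expiry step can be sketched in C as below. This is only an illustration under stated assumptions, not the code from the series: the struct, the name `host_timer_expired`, and the convention that `armed` is reset to 0 when nothing newer is pending are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the fields shared between guest and host. */
struct lazy_tscdeadline {
    uint64_t armed;   /* deadline the host has a timer for; host-written */
    uint64_t pending; /* latest deadline the guest asked for; guest-written */
};

/*
 * Host side: called when the timer armed for s->armed fires.
 * Returns true if a new timer was armed for the pending deadline.
 */
static bool host_timer_expired(struct lazy_tscdeadline *s)
{
    if (s->pending > s->armed) {
        /* The guest moved its deadline later without a vm-exit: re-arm. */
        s->armed = s->pending;
        return true;
    }

    /* Assumption: nothing newer is pending, so mark "no timer armed". */
    s->armed = 0;
    return false;
}
```

With armed = t0 and pending = t3 as in step 5, the call sets armed to t3 and returns true; a second expiry with no further guest writes returns false.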
> 
> 
> That is the core idea of this patch ;)
> 
> 
> Thanks
> Jianchao
> 
>>
>>> The 1st timer is responsible for arming the next timer. When the armed
>>> timer expires, it will check pending and arm a new timer.
>>>
>>> In the netperf test with TCP_RR on loopback, lazy_tscdeadline
>>> clearly reduces vm-exits.
>>>
>>>                           Disabled            Enabled
>>> --------------------------------------------------------
>>> VM-Exit
>>>               sum         12617503            5815737
>>>              intr      0% 37023            0% 33002
>>>             cpuid      0% 1                0% 0
>>>              halt     19% 2503932         47% 2780683
>>>         msr-write     79% 10046340        51% 2966824
>>>             pause      0% 90               0% 84
>>>     ept-violation      0% 584              0% 336
>>>     ept-misconfig      0% 0                0% 2
>>> preemption-timer      0% 29518            0% 34800
>>> -------------------------------------------------------
>>> MSR-Write
>>>              sum          10046455            2966864
>>>          apic-icr     25% 2533498         93% 2781235
>>>      tsc-deadline     74% 7512945          6% 185629
>>>
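
As a quick check on the figures in the table, the relative reduction can be computed with a small helper. The function name `reduction_pct` is hypothetical; the inputs in the note below are the counts from the table.

```c
#include <stdint.h>

/* Percentage by which 'after' is lower than 'before' (expects before > 0). */
static double reduction_pct(uint64_t before, uint64_t after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

For the tsc-deadline writes, `reduction_pct(7512945, 185629)` comes to roughly 97.5, and for the total vm-exit count, `reduction_pct(12617503, 5815737)` comes to roughly 53.9.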
>>> This patchset is made and tested on 6.4.0 and includes 3 patches:
>>>
>>> The 1st one adds the necessary data structures for this feature.
>>> The 2nd one adds the specific msr operations between guest and host.
>>> The 3rd one is the one that makes this feature work.
>>>
>>> Any comment is welcome.
>>>
>>> Thanks
>>> Jianchao
>>>
>>> Wang Jianchao (3)
>>> 	KVM: x86: add msr register and data structure for lazy tscdeadline
>>> 	KVM: x86: exchange info about lazy_tscdeadline with msr
>>> 	KVM: X86: add lazy tscdeadline support to reduce vm-exit of msr-write
>>>
>>>
>>>   arch/x86/include/asm/kvm_host.h      |  10 ++++++++
>>>   arch/x86/include/uapi/asm/kvm_para.h |   9 +++++++
>>>   arch/x86/kernel/apic/apic.c          |  47 ++++++++++++++++++++++++++++++++++-
>>>   arch/x86/kernel/kvm.c                |  13 ++++++++++
>>>   arch/x86/kvm/cpuid.c                 |   1 +
>>>   arch/x86/kvm/lapic.c                 | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>   arch/x86/kvm/lapic.h                 |   4 +++
>>>   arch/x86/kvm/x86.c                   |  26 ++++++++++++++++++++
>>>   8 files changed, 229 insertions(+), 9 deletions(-)
>>
