linux-kernel - Re: [PATCH v2] KVM: halt-polling: poll if emulated lapic timer will fire soon

Message-ID: <CANRm+CzWVsUYZ55+w+iFrW+tGuTtOB5vL3iZqwQ41BTYk4RJOw@mail.gmail.com>
Date:	Fri, 20 May 2016 13:53:18 +0800
From:	Wanpeng Li <kernellwp@...il.com>
To:	Yang Zhang <yang.zhang.wz@...il.com>
Cc:	David Matlack <dmatlack@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	kvm list <kvm@...r.kernel.org>,
	Wanpeng Li <wanpeng.li@...mail.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Radim Krčmář <rkrcmar@...hat.com>,
	Christian Borntraeger <borntraeger@...ibm.com>
Subject: Re: [PATCH v2] KVM: halt-polling: poll if emulated lapic timer will
 fire soon

2016-05-20 10:04 GMT+08:00 Yang Zhang <yang.zhang.wz@...il.com>:
> On 2016/5/20 2:36, David Matlack wrote:
>>
>> On Thu, May 19, 2016 at 11:01 AM, David Matlack <dmatlack@...gle.com> wrote:
>>>
>>> On Thu, May 19, 2016 at 6:27 AM, Wanpeng Li <kernellwp@...il.com> wrote:
>>>>
>>>> From: Wanpeng Li <wanpeng.li@...mail.com>
>>>>
>>>> If an emulated lapic timer will fire soon (within 10us, the base of
>>>> dynamic halt-polling; the lower end of the poll time for the
>>>> message-passing latency workload TCP_RR is < 10us), we can treat it as
>>>> a short halt and poll while waiting for it to fire. The fire callback
>>>> apic_timer_fn() sets KVM_REQ_PENDING_TIMER, and this flag is checked
>>>> during the busy poll. This avoids the context-switch overhead and the
>>>> latency of waking up the vCPU.
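The mechanism described above (a timer callback raises a request flag that the halt-poll loop checks) can be modeled in user space with a thread and an atomic flag. This is only an illustrative sketch under stated assumptions, not KVM code: pending_timer stands in for KVM_REQ_PENDING_TIMER, timer_fn() stands in for apic_timer_fn(), and poll_for_timer() is a hypothetical helper. Build with -pthread.

```c
/* User-space model of halt-polling for an imminent timer (not KVM code). */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static atomic_bool pending_timer; /* hypothetical stand-in for KVM_REQ_PENDING_TIMER */

static uint64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stand-in for apic_timer_fn(): wait until expiry, then raise the flag. */
static void *timer_fn(void *arg)
{
	uint64_t delay_ns = *(const uint64_t *)arg;
	struct timespec ts = {
		.tv_sec = (time_t)(delay_ns / 1000000000ull),
		.tv_nsec = (long)(delay_ns % 1000000000ull),
	};
	nanosleep(&ts, NULL);
	atomic_store(&pending_timer, true);
	return NULL;
}

/* Busy-poll for up to poll_ns. Returns true if the timer fired inside the
 * window, i.e. the vCPU could resume without being scheduled out. */
static bool poll_for_timer(uint64_t timer_delay_ns, uint64_t poll_ns)
{
	pthread_t t;
	uint64_t stop;
	bool fired = false;

	atomic_store(&pending_timer, false);
	if (pthread_create(&t, NULL, timer_fn, &timer_delay_ns) != 0)
		return false;
	stop = now_ns() + poll_ns;
	do {
		if (atomic_load(&pending_timer)) {
			fired = true;
			break;
		}
	} while (now_ns() < stop);
	pthread_join(t, NULL); /* timer_delay_ns stays valid until here */
	return fired;
}
```

A true result corresponds to the case the patch targets; on false, a real vCPU would go on to block and pay the wakeup latency.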
>>>
>>>
>>> If I understand correctly, your patch aims to reduce the latency of
>>> (APIC Timer expires) -> (Guest resumes execution) using halt-polling.
>>> Let me know if I'm misunderstanding.
>>>
>>> In general, I don't think it makes sense to poll for timer interrupts.
>>> We know when the timer interrupt is going to arrive. If we care about
>>> the latency of delivering that interrupt to the guest, we should
>>> program the hrtimer to wake us up slightly early, and then deliver the
>>> virtual timer interrupt right on time (I think KVM's TSC Deadline
>>> Timer emulation already does this).
>>
>>
>> (It looks like the way to enable this feature is to set the module
>> parameter lapic_timer_advance_ns and make sure your guest is using the
>> TSC Deadline timer instead of the APIC Timer.)
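The "wake up slightly early, then deliver on time" scheme can be sketched in user space: sleep until deadline minus the advance, then spin for the remaining gap. This is a hedged model, not KVM's implementation; mono_ns(), sleep_until_ns() and deliver_with_advance() are hypothetical names, and advance_ns merely stands in for the lapic_timer_advance_ns module parameter. The model assumes the vCPU thread runs as soon as the host timer fires; it does not cover the case where the thread cannot be scheduled promptly.

```c
/* User-space model of advance timer expiration (not KVM code). */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

static uint64_t mono_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleep until an absolute CLOCK_MONOTONIC time, retrying if interrupted. */
static void sleep_until_ns(uint64_t abs_ns)
{
	struct timespec ts = {
		.tv_sec = (time_t)(abs_ns / 1000000000ull),
		.tv_nsec = (long)(abs_ns % 1000000000ull),
	};
	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
		;
}

/* Fire the host timer advance_ns before the guest deadline, then busy-wait
 * the rest so the virtual interrupt is injected on time. Returns the time
 * at which the interrupt would be delivered (never before the deadline). */
static uint64_t deliver_with_advance(uint64_t deadline_ns, uint64_t advance_ns)
{
	uint64_t wake = deadline_ns > advance_ns ? deadline_ns - advance_ns : 0;
	uint64_t now;

	sleep_until_ns(wake);
	while ((now = mono_ns()) < deadline_ns)
		; /* short spin covering the advance window */
	return now;
}
```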
>
>
> This feature is slightly different from the current advance-expiration approach.
> Advance expiration relies on the VCPU running (it polls before vmentry).
> But in some cases the timer interrupt may be blocked by another thread (i.e.,
> the IF bit is clear) and the VCPU cannot be scheduled to run immediately. So even
> if the timer is programmed early, the VCPU may still see the latency. Polling is
> different: it ensures the VCPU notices the timer expiration before it is
> scheduled out.

Great explanation, Yang! I'd like to include this statement in my
patch description.

>
>>
>>> I'm curious to know if this scheme
>>> would give the same performance improvement to iperf as your patch.
>>>
>>> We discussed this a bit on the mailing list before
>>> (https://lkml.org/lkml/2016/3/29/680). I'd like to see halt-polling
>>> and timer interrupts go in the opposite direction: if the next timer
>>> event (from any timer) is less than vcpu->halt_poll_ns, don't poll at
>>> all.
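The policy proposed here inverts the patch's condition. A minimal sketch, assuming the caller already knows the time until the next timer event; should_poll() and its parameters are hypothetical, and how the kernel would find the next event across all timers is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate for the proposal above: poll only if the next
 * timer event (from any timer) is at least halt_poll_ns away.
 * next_timer_ns is the time until that event; UINT64_MAX means none. */
static bool should_poll(uint64_t halt_poll_ns, uint64_t next_timer_ns)
{
	if (halt_poll_ns == 0)
		return false;                 /* adaptive polling is off */
	return next_timer_ns >= halt_poll_ns; /* timer too close: just block */
}
```

By contrast, the patch polls precisely when the lapic timer is due within 10us, even if halt_poll_ns is 0.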
>>>
>>>>
>>>> iperf TCP gets a ~6% bandwidth improvement.
>>>
>>>
>>> Can you explain why your patch results in this bandwidth improvement?
>
>
> That seems reasonable. I have seen the same kind of improvement with a
> context-switch benchmark: latency dropped from ~2600ns to ~2300ns with a
> similar mechanism (the same idea, but a different implementation).

Good to know. ;-)

Regards,
Wanpeng Li

>
>>>
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@...hat.com>
>>>> Cc: Radim Krčmář <rkrcmar@...hat.com>
>>>> Cc: David Matlack <dmatlack@...gle.com>
>>>> Cc: Christian Borntraeger <borntraeger@...ibm.com>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@...mail.com>
>>>> ---
>>>> v1 -> v2:
>>>>  * add return statement to non-x86 archs
>>>>  * capture never expire case for x86 (hrtimer is not started)
>>>>
>>>>  arch/arm/include/asm/kvm_host.h     |  4 ++++
>>>>  arch/arm64/include/asm/kvm_host.h   |  4 ++++
>>>>  arch/mips/include/asm/kvm_host.h    |  4 ++++
>>>>  arch/powerpc/include/asm/kvm_host.h |  4 ++++
>>>>  arch/s390/include/asm/kvm_host.h    |  4 ++++
>>>>  arch/x86/kvm/lapic.c                | 11 +++++++++++
>>>>  arch/x86/kvm/lapic.h                |  1 +
>>>>  arch/x86/kvm/x86.c                  |  5 +++++
>>>>  include/linux/kvm_host.h            |  1 +
>>>>  virt/kvm/kvm_main.c                 | 14 ++++++++++----
>>>>  10 files changed, 48 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index 4cd8732..a5fd858 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -284,6 +284,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return -1ULL;
>>>> +}
>>>>
>>>>  static inline void kvm_arm_init_debug(void) {}
>>>>  static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index d49399d..94e227a 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -359,6 +359,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return -1ULL;
>>>> +}
>>>>
>>>>  void kvm_arm_init_debug(void);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
>>>> index 9a37a10..456bc42 100644
>>>> --- a/arch/mips/include/asm/kvm_host.h
>>>> +++ b/arch/mips/include/asm/kvm_host.h
>>>> @@ -813,6 +813,10 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return -1ULL;
>>>> +}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>
>>>>  #endif /* __MIPS_KVM_HOST_H__ */
>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>> index ec35af3..5986c79 100644
>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> @@ -729,5 +729,9 @@ static inline void kvm_arch_exit(void) {}
>>>>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return -1ULL;
>>>> +}
>>>>
>>>>  #endif /* __POWERPC_KVM_HOST_H__ */
>>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>>> index 37b9017..bdb01a1 100644
>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>> @@ -696,6 +696,10 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>>>                 struct kvm_memory_slot *slot) {}
>>>>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return -1ULL;
>>>> +}
>>>>
>>>>  void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
>>>>
>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>> index bbb5b28..cfeeac3 100644
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -256,6 +256,17 @@ static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
>>>>         return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
>>>>  }
>>>>
>>>> +u64 apic_get_timer_expire(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       struct kvm_lapic *apic = vcpu->arch.apic;
>>>> +       struct hrtimer *timer = &apic->lapic_timer.timer;
>>>> +
>>>> +       if (!hrtimer_active(timer))
>>>> +               return -1ULL;
>>>> +       else
>>>> +               return ktime_to_ns(hrtimer_get_remaining(timer));
>>>> +}
>>>> +
>>>>  static inline int apic_lvt_nmi_mode(u32 lvt_val)
>>>>  {
>>>>         return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>> index 891c6da..ee4da6c 100644
>>>> --- a/arch/x86/kvm/lapic.h
>>>> +++ b/arch/x86/kvm/lapic.h
>>>> @@ -212,4 +212,5 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
>>>>                         struct kvm_vcpu **dest_vcpu);
>>>>  int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
>>>>                         const unsigned long *bitmap, u32 bitmap_size);
>>>> +u64 apic_get_timer_expire(struct kvm_vcpu *vcpu);
>>>>  #endif
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index a8c7ca3..9b5ad99 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7623,6 +7623,11 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
>>>>  struct static_key kvm_no_apic_vcpu __read_mostly;
>>>>  EXPORT_SYMBOL_GPL(kvm_no_apic_vcpu);
>>>>
>>>> +u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +       return apic_get_timer_expire(vcpu);
>>>> +}
>>>> +
>>>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>>>  {
>>>>         struct page *page;
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index b1fa8f1..14d6c23 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -663,6 +663,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>>>>  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
>>>>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>>>>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>>>> +u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu);
>>>>
>>>>  void kvm_flush_remote_tlbs(struct kvm *kvm);
>>>>  void kvm_reload_remote_mmus(struct kvm *kvm);
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index dd4ac9d..e4bb30b 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -78,6 +78,9 @@ module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
>>>>  static unsigned int halt_poll_ns_shrink;
>>>>  module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
>>>>
>>>> +/* lower-end of message passing workload latency TCP_RR's poll time < 10us */
>>>> +static unsigned int halt_poll_ns_base = 10000;
>>>> +
>>>>  /*
>>>>   * Ordering of locks:
>>>>   *
>>>> @@ -1966,7 +1969,7 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
>>>>         grow = READ_ONCE(halt_poll_ns_grow);
>>>>         /* 10us base */
>>>>         if (val == 0 && grow)
>>>> -               val = 10000;
>>>> +               val = halt_poll_ns_base;
>>>>         else
>>>>                 val *= grow;
>>>>
>>>> @@ -2014,12 +2017,15 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>>>>         ktime_t start, cur;
>>>>         DECLARE_SWAITQUEUE(wait);
>>>>         bool waited = false;
>>>> -       u64 block_ns;
>>>> +       u64 block_ns, delta, remaining;
>>>>
>>>> +       remaining = kvm_arch_timer_remaining(vcpu);
>>>>         start = cur = ktime_get();
>>>> -       if (vcpu->halt_poll_ns) {
>>>> -               ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>>>> +       if (vcpu->halt_poll_ns || remaining < halt_poll_ns_base) {
>>>> +               ktime_t stop;
>>>>
>>>> +               delta = vcpu->halt_poll_ns ? vcpu->halt_poll_ns : remaining;
>>>> +               stop = ktime_add_ns(ktime_get(), delta);
>>>>                 ++vcpu->stat.halt_attempted_poll;
>>>>                 do {
>>>>                         /*
>>>> --
>>>> 1.9.1
>>>>
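The poll-window decision the patch adds to kvm_vcpu_block() can be isolated as a pure function. This is a simplified model, not the kernel code: poll_window() is a hypothetical name, `remaining` is what kvm_arch_timer_remaining() would return, and UINT64_MAX plays the role of the -1ULL returned on non-x86 architectures and when the lapic hrtimer is not active.

```c
#include <stdbool.h>
#include <stdint.h>

static const uint64_t halt_poll_ns_base = 10000; /* 10us, as in the patch */

/* Returns true if the patched kvm_vcpu_block() would enter its polling
 * branch, and stores the poll duration (delta) in *delta_ns. */
static bool poll_window(uint64_t halt_poll_ns, uint64_t remaining,
			uint64_t *delta_ns)
{
	if (!halt_poll_ns && remaining >= halt_poll_ns_base)
		return false; /* no adaptive polling and no imminent timer */
	*delta_ns = halt_poll_ns ? halt_poll_ns : remaining;
	return true;
}
```

One consequence visible in the model: when no timer is armed, remaining is the maximum u64 value, so the new condition never triggers and behavior falls back to plain adaptive halt-polling.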
>
>
> --
> best regards
> yang



-- 
Regards,
Wanpeng Li