Message-ID: <0e290a9d-9d26-d24a-ba01-9fda4826a5ac@redhat.com>
Date: Wed, 16 Oct 2019 19:01:25 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCH 12/14] KVM: retpolines: x86: eliminate retpoline from
vmx.c exit handlers

On 16/10/19 18:50, Andrea Arcangeli wrote:
>> It still doesn't add up. 0.3ms / 5 is 1/15000th of a second; 43us is
>> 1/25000th of a second. Do you have multiple vCPUs perhaps?
>
> Why would I run any test on UP guests? Rather than spending time doing
> the math on my results, it's probably quicker if you run it yourself:

I don't know, but if you don't say how many vCPUs you have, I cannot do
the math and review the patch.

>> The number of vmexits doesn't count (for HLT). What counts is how long
>> they take to be serviced, and as long as it's 1us or more the
>> optimization is pointless.
>
> Please note the single_task_running() check which immediately breaks
> the kvm_vcpu_check_block() loop if there's even a single other task
> that can be scheduled in the runqueue of the host CPU.
>
> What happens when the host is not idle is shown below:
>
>        w/o optimization          with optimization
>        ----------------------    -------------------------
> 0us    vmexit                    vmexit
> 500ns  retpoline                 call vmexit handler directly
> 600ns  retpoline                 kvm_vcpu_check_block()
> 700ns  retpoline                 schedule()
> 800ns  kvm_vcpu_check_block()
> 900ns  schedule()
> ...
>
> Disclaimer: the numbers on the left are arbitrary and I just cut and
> pasted them from yours, no idea how far off they are.

Yes, of course. But the idea is the same: yes, because of the retpoline
you run the guest for perhaps 300ns more before schedule()ing, but does
that really matter? 300ns * 20000 times/second is a 0.6% performance
impact, and 300ns is already very generous. I am not sure it would be
measurable at all.
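
As a rough sanity check of the estimate above, the arithmetic can be
sketched in a few lines of Python. Both inputs are the illustrative
figures from this message (300ns of extra delay per HLT vmexit, 20000
vmexits per second), not measurements, and the helper name
overhead_fraction is made up for the example.

```python
# Back-of-the-envelope estimate of the retpoline cost on the HLT path.
# Both inputs are the figures quoted in this message, not measurements.
EXTRA_NS_PER_EXIT = 300      # extra guest run time before schedule(), in ns
EXITS_PER_SECOND = 20_000    # HLT vmexits per second


def overhead_fraction(extra_ns: float, exits_per_sec: float) -> float:
    """Fraction of each second spent on the extra per-exit delay."""
    ns_per_second = 1_000_000_000
    return extra_ns * exits_per_sec / ns_per_second


frac = overhead_fraction(EXTRA_NS_PER_EXIT, EXITS_PER_SECOND)
print(f"{frac * 100:.2f}%")  # prints 0.60%
```

That is 6ms of added delay per second of wall-clock time, under the
assumption that every one of those vmexits pays the full 300ns.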
Paolo

> To be clear, I would find it very reasonable to be asked to prove
> the benefit of the HLT optimization with benchmarks specific to that
> single one-liner, but until then, the idea that we can drop the
> retpoline optimization from the HLT vmexit by just thinking about it,
> still doesn't make sense to me, because by thinking about it I come to
> the opposite conclusion.
>
> The lack of single_task_running() in the guest driver is also why the
> guest cpuidle haltpoll risks wasting some CPU with host overcommit or
> with the host loaded at full capacity and why we may not assume it to
> be universally enabled.
>
> Thanks,
> Andrea
>