linux-kernel - Re: [PATCH 1/5] KVM: x86: avoid atomic operations on APICv vmentry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161016060320-mutt-send-email-mst@kernel.org>
Date:   Sun, 16 Oct 2016 06:21:41 +0300
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        rkrcmar@...hat.com, yang.zhang.wz@...il.com, feng.wu@...el.com
Subject: Re: [PATCH 1/5] KVM: x86: avoid atomic operations on APICv vmentry

On Fri, Oct 14, 2016 at 08:21:27PM +0200, Paolo Bonzini wrote:
> On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
> posted interrupts turn out to be slower than interrupt injection via
> KVM_REQ_EVENT.
> 
> This patch optimizes a bit the IRR update, avoiding expensive atomic
> operations in the common case where PI.ON=0 at vmentry or the PIR vector
> is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
> measured by kvm-unit-tests' inl_from_qemu test (20 runs):
> 
>               | enable_apicv=1  |  enable_apicv=0
>               | mean     stdev  |  mean     stdev
>     ----------|-----------------|------------------
>     before    | 5826     32.65  |  5765     47.09
>     after     | 5809     43.42  |  5777     77.02
> 
> Of course, any change in the right column is just placebo effect. :)
> The savings are bigger if interrupts are frequent.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> ---
>  arch/x86/kvm/lapic.c | 6 ++++--
>  arch/x86/kvm/vmx.c   | 9 ++++++++-
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 23b99f305382..63a442aefc12 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
>  	u32 i, pir_val;
>  
>  	for (i = 0; i <= 7; i++) {
> -		pir_val = xchg(&pir[i], 0);
> -		if (pir_val)
> +		pir_val = READ_ONCE(pir[i]);
> +		if (pir_val) {
> +			pir_val = xchg(&pir[i], 0);
>  			*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
> +		}
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);

gcc doesn't seem to unroll this loop and it's
probably worth unrolling it

The following seems to do the trick for me on upstream - I didn't
benchmark it though. Is there a kvm unit test for interrupts?

--->

kvm: unroll the loop in __kvm_apic_update_irr.

This is hot data path in interrupt-rich workloads, worth unrolling.

Signed-off-by: Michael S. Tsirkin <mst@...hat.com>


diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b62c852..0c3462c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -337,7 +337,8 @@ static u8 count_vectors(void *bitmap)
 	return count;
 }
 
-void __kvm_apic_update_irr(u32 *pir, void *regs)
+void __attribute__((optimize("unroll-loops")))
+__kvm_apic_update_irr(u32 *pir, void *regs)
 {
 	u32 i, pir_val;