Message-ID: <ca26fba1-c2bb-40a1-bb5e-92811c4a6fc6@linux.microsoft.com>
Date: Wed, 2 Jul 2025 18:11:43 +0200
From: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>,
 Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
 <pbonzini@...hat.com>, kvm@...r.kernel.org
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, linux-kernel@...r.kernel.org,
 alanjiang@...rosoft.com, chinang.ma@...rosoft.com,
 andrea.pellegrini@...rosoft.com, Kevin Tian <kevin.tian@...el.com>,
 "K. Y. Srinivasan" <kys@...rosoft.com>,
 Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>, linux-hyperv@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] KVM: VMX: Use Hyper-V EPT flush for local TLB
 flushes

On 27/06/2025 10:31, Vitaly Kuznetsov wrote:
> Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com> writes:
> 
>> Use Hyper-V's HvCallFlushGuestPhysicalAddressSpace for local TLB flushes.
>> This makes any KVM_REQ_TLB_FLUSH_CURRENT (such as on root alloc) visible to
>> all CPUs which means we no longer need to do a KVM_REQ_TLB_FLUSH on CPU
>> migration.
>>
>> The goal is to avoid invept-global in KVM_REQ_TLB_FLUSH. Hyper-V uses a
>> shadow page table for the nested hypervisor (KVM) and has to invalidate all
>> EPT roots when invept-global is issued. This has a performance impact on
>> all nested VMs.  KVM issues KVM_REQ_TLB_FLUSH on CPU migration, and under
>> load the performance hit causes vCPUs to use up more of their slice of CPU
>> time, leading to more CPU migrations. This has a snowball effect and causes
>> CPU usage spikes.
>>
>> By issuing the hypercall we are now guaranteed that any root modification
>> that requires a local TLB flush becomes visible to all CPUs. The same
>> hypercall is already used in kvm_arch_flush_remote_tlbs and
>> kvm_arch_flush_remote_tlbs_range.  The KVM expectation is that roots are
>> flushed locally on alloc and we achieve consistency on migration by
>> flushing all roots - the new behavior of achieving consistency on alloc on
>> Hyper-V is a superset of the expected guarantees. This makes the
>> KVM_REQ_TLB_FLUSH on CPU migration no longer necessary on Hyper-V.
> 
> Sounds reasonable overall, my only concern (not sure if valid or not) is
> that using the hypercall for local flushes is going to be more expensive
> than invept-context we do today and thus while the performance is
> improved for the scenario when vCPUs are migrating a lot, we will take a
> hit in other cases.
> 

Discussion below - I think the impact should be limited, and I will try to quantify it.

>>
>> Coincidentally - we now match the behavior of SVM on Hyper-V.
>>
>> Signed-off-by: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/kvm/vmx/vmx.c          | 20 +++++++++++++++++---
>>  arch/x86/kvm/vmx/vmx_onhyperv.h |  6 ++++++
>>  arch/x86/kvm/x86.c              |  3 +++
>>  4 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index b4a391929cdb..d3acab19f425 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1077,6 +1077,7 @@ struct kvm_vcpu_arch {
>>  
>>  #if IS_ENABLED(CONFIG_HYPERV)
>>  	hpa_t hv_root_tdp;
>> +	bool hv_vmx_use_flush_guest_mapping;
>>  #endif
>>  };
>>  
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 4953846cb30d..f537e0df56fc 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -1485,8 +1485,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
>>  		/*
>>  		 * Flush all EPTP/VPID contexts, the new pCPU may have stale
>>  		 * TLB entries from its previous association with the vCPU.
>> +		 * Unless we are running on Hyper-V where we promotes local TLB
> 
> s,promotes,promote, or, as Sean doesn't like pronouns, 
> 
> "... where local TLB flushes are promoted ..."
> 

Will do.

>> +		 * flushes to be visible across all CPUs so no need to do again
>> +		 * on migration.
>>  		 */
>> -		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>> +		if (!vmx_hv_use_flush_guest_mapping(vcpu))
>> +			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>>  
>>  		/*
>>  		 * Linux uses per-cpu TSS and GDT, so set these when switching
>> @@ -3243,11 +3247,21 @@ void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
>>  	if (!VALID_PAGE(root_hpa))
>>  		return;
>>  
>> -	if (enable_ept)
>> +	if (enable_ept) {
>> +		/*
>> +		 * hyperv_flush_guest_mapping() has the semantics of
>> +		 * invept-single across all pCPUs. This makes root
>> +		 * modifications consistent across pCPUs, so an invept-global
>> +		 * on migration is no longer required.
>> +		 */
>> +		if (vmx_hv_use_flush_guest_mapping(vcpu))
>> +			return (void)WARN_ON_ONCE(hyperv_flush_guest_mapping(root_hpa));
>> +
> 
> HvCallFlushGuestPhysicalAddressSpace sounds like a heavy operation as it
> affects all processors. Is there any visible perfomance impact of this
> change when there are no migrations (e.g. with vCPU pinning)? Or do we
> believe that Hyper-V actually handles invept-context the exact same way?
> 
I'm going to have to do some more investigation to answer that - do you have a
workload in mind that is sensitive to TLB flushes and that I could compare
this on?

In terms of cost, Hyper-V needs to invalidate the VM's shadow page table for the
root and then do the TLB flush. The first part is CPU intensive but is the same
in both cases (hypercall and invept-single). The TLB flush part will require a
bit more work for the hypercall, as it needs to happen on all cores, and the TLB
will then be empty for that root on every core.
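
For context, the hypercall side boils down to something like the below (a
paraphrase from memory of arch/x86/hyperv/nested.c, not the exact upstream
code, with error handling trimmed): a single
HvCallFlushGuestPhysicalAddressSpace whose only input is the address space
(the EPT root) - there is no CPU mask, the root partition flushes it on all
cores on our behalf.

int hyperv_flush_guest_mapping(u64 as)
{
	struct hv_guest_mapping_flush *flush;
	unsigned long irq_flags;
	u64 status;

	local_irq_save(irq_flags);

	/* Pre-mapped per-cpu hypercall input page. */
	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
	if (unlikely(!flush)) {
		local_irq_restore(irq_flags);
		return -EFAULT;
	}

	flush->address_space = as;	/* the EPT root to flush */
	flush->flags = 0;		/* whole address space, no VA list */

	status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
				 flush, NULL);
	local_irq_restore(irq_flags);

	return hv_result_success(status) ? 0 : -EFAULT;
}

So the call itself is as cheap as invept from our side; the extra cost is on
the Hyper-V side, which now has to act on all cores instead of just the
current one.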

My assumption is that these local TLB flushes are rather rare, as they only
happen when:
- a new root is allocated
- we need to switch to a special root

So not very frequent after VM boot (with or without pinning). And the effect of
the TLB being empty for that root on other CPUs should be neutral, as users of
the root would have performed the same local flush at a later point in time
anyway (when switching to it).
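
To make the "empty earlier vs. empty later" point concrete, a toy user-space
model (purely illustrative, not kernel code; the pCPU count and function names
are made up for the example):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_PCPUS 4

/* Per-pCPU "does the TLB still hold entries for this root" state. */
static bool tlb_has_entries[NR_PCPUS];

/* Proposed: the alloc-time hypercall empties the root's TLB on every pCPU. */
static void alloc_root_hyperv(void)
{
	for (int cpu = 0; cpu < NR_PCPUS; cpu++)
		tlb_has_entries[cpu] = false;
}

/* Baseline: alloc only flushes the allocating pCPU... */
static void alloc_root_baseline(int cpu)
{
	tlb_has_entries[cpu] = false;
}

/* ...and every other pCPU flushes when it first switches to the root. */
static void first_use(int cpu)
{
	tlb_has_entries[cpu] = false;
}

int main(void)
{
	bool baseline, proposed;
	int cpu;

	/* Baseline: alloc on pCPU 0, later first use on pCPU 3. */
	for (cpu = 0; cpu < NR_PCPUS; cpu++)
		tlb_has_entries[cpu] = true;
	alloc_root_baseline(0);
	first_use(3);
	baseline = tlb_has_entries[3];

	/* Proposed: alloc on pCPU 0 goes through the hypercall instead. */
	for (cpu = 0; cpu < NR_PCPUS; cpu++)
		tlb_has_entries[cpu] = true;
	alloc_root_hyperv();
	proposed = tlb_has_entries[3];

	/* By the time pCPU 3 runs on the root, the TLB is empty either way. */
	assert(baseline == proposed);
	printf("pCPU 3 sees the same (empty) TLB state in both models\n");
	return 0;
}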

All the other MMU updates use kvm_flush_remote_tlbs*, which already go through
the hypercall.
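
For reference, that existing routing looks roughly like the below (a simplified
sketch from memory of arch/x86/kvm/kvm_onhyperv.c, not the exact upstream code;
the real version also handles vCPUs with differing roots and the ranged
variant):

static int hv_flush_remote_tlbs(struct kvm *kvm)
{
	/*
	 * When every vCPU shares the same TDP root, a single
	 * HvCallFlushGuestPhysicalAddressSpace covers the whole VM.
	 */
	if (VALID_PAGE(kvm->arch.hv_root_tdp))
		return hyperv_flush_guest_mapping(kvm->arch.hv_root_tdp);

	/*
	 * Simplified: the real code walks the vCPUs and flushes each
	 * vCPU's hv_root_tdp here rather than giving up.
	 */
	return -EOPNOTSUPP;
}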

Jeremi

