Message-ID: <aKT_0Zuj2MBRrh6p@google.com>
Date: Tue, 19 Aug 2025 15:50:57 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>,
linux-kernel@...r.kernel.org, alanjiang@...rosoft.com,
chinang.ma@...rosoft.com, andrea.pellegrini@...rosoft.com,
Kevin Tian <kevin.tian@...el.com>, "K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, linux-hyperv@...r.kernel.org,
Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] KVM: VMX: Use Hyper-V EPT flush for local TLB flushes
On Fri, Aug 15, 2025, Jeremi Piotrowski wrote:
> On Tue, Aug 05, 2025 at 04:42:46PM -0700, Sean Christopherson wrote:
> I started working on extending patch 5, wanted to post it here to make sure I'm
> on the right track.
>
> It works in testing so far and shows promising performance - it gets rid of all
> the pathological cases I saw before.
Nice :-)
> I haven't checked whether I broke SVM yet, and I need to figure out a way to
> always keep the cpumask "offstack" so that we don't blow up every struct
> kvm_mmu_page instance with an inline cpumask - it needs to stay optional.
Doh, I meant to include an idea or two for this in my earlier response.  The
best I can come up with is the kvm_mmu_memory_cache approach described below.
> I also came across kvm_mmu_is_dummy_root(), that check is included in
> root_to_sp(). Can you think of any other checks that we might need to handle?
Don't think so?
> @@ -3827,6 +3829,9 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
> 	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
> 	++sp->root_count;
>
> +	if (level >= PT64_ROOT_4LEVEL)
Was this my code?  If so, we should move this into the VMX code, because the fact
that PAE roots can be ignored is really a detail of nested EPT, not the overall
scheme.
> +		kvm_x86_call(alloc_root_cpu_mask)(sp);
Ah shoot. Allocating here won't work, because mmu_lock is held and allocating
might sleep. I don't want to force an atomic allocation, because that can dip
into pools that KVM really shouldn't use.
The "standard" way KVM deals with this is to utilize a kvm_mmu_memory_cache.  If
we do that and add e.g. kvm_vcpu_arch.mmu_roots_flushed_cache, then we can
trivially do the allocation in mmu_topup_memory_caches().  That would eliminate
the error handling in vmx_alloc_root_cpu_mask(), and might make it slightly less
awful to deal with the "offstack" cpumask.
Hmm, and then instead of calling into VMX to do the allocation, maybe just have
a flag to communicate that vendor code wants per-root flush tracking? I haven't
thought hard about SVM, but I wouldn't be surprised if SVM ends up wanting the
same functionality after we switch to per-vCPU ASIDs.
> +
> 	return __pa(sp->spt);
> }
...
> @@ -3307,22 +3309,34 @@ void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
> 	vpid_sync_context(vmx_get_current_vpid(vcpu));
> }
>
> -static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa)
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root)
> {
This should be conditioned on enable_ept.
> +	WARN_ON_ONCE(!zalloc_cpumask_var(&root->cpu_flushed_mask,
> +					 GFP_KERNEL_ACCOUNT));
> +}
> +
> +static void __vmx_flush_ept_on_pcpu_migration(hpa_t root_hpa, int cpu)
> +{
> + struct kvm_mmu_page *root;
> +
> 	if (!VALID_PAGE(root_hpa))
> 		return;
>
> +	root = root_to_sp(root_hpa);
> +	if (!root || cpumask_test_and_set_cpu(cpu, root->cpu_flushed_mask))
Hmm, this should flush if "root" is NULL, because the aforementioned "special"
roots don't have a shadow page.
But unless I'm missing an edge case (of an edge case), this particular code can
WARN_ON_ONCE() since EPT should never need to use any of the special roots. We
might need to filter out dummy roots somewhere to avoid false positives, but that
should be easy enough.
For the mask, it's probably worth splitting test_and_set into separate operations,
as the common case will likely be that the root has already been used on this
pCPU.  The test_and_set version will generate a LOCK BTS instruction, and so for
the common case where the bit is already set, KVM will generate an atomic access,
which can cause noise/bottlenecks.
E.g.

	if (WARN_ON_ONCE(!root))
		goto flush;

	if (cpumask_test_cpu(cpu, root->cpu_flushed_mask))
		return;

	cpumask_set_cpu(cpu, root->cpu_flushed_mask);

flush:
	vmx_flush_tlb_ept_root(root_hpa);
> +		return;
> +
> 	vmx_flush_tlb_ept_root(root_hpa);
> }
>
> -static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu)
> +static void vmx_flush_ept_on_pcpu_migration(struct kvm_mmu *mmu, int cpu)
> {
> 	int i;
>
> -	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa);
> +	__vmx_flush_ept_on_pcpu_migration(mmu->root.hpa, cpu);
>
> 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> -		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa);
> +		__vmx_flush_ept_on_pcpu_migration(mmu->prev_roots[i].hpa, cpu);
> }
>
> void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index b4596f651232..4406d53e6ebe 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -84,6 +84,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
> void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
> void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
> void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
> +void vmx_alloc_root_cpu_mask(struct kvm_mmu_page *root);
> void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
> u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu);
> void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall);
> --
> 2.39.5
>