Message-ID: <YQqzU26ok3MXCzs8@google.com>
Date:   Wed, 4 Aug 2021 15:33:39 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Lai Jiangshan <jiangshanlai+lkml@...il.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 01/37] KVM: VMX: Flush all EPTP/VPID contexts on
 remote TLB flush

On Wed, Aug 04, 2021, Lai Jiangshan wrote:
> The optimization I considered yesterday is "ept_sync_global() vs.
> ept_sync_context(this_vcpu's)" in the case where the VM is using EPT and
> doesn't allow nested VMs.  (I failed to express this clearly yesterday.)
> 
> In this case, the vCPU uses only a single root_hpa,

This is not strictly guaranteed.  kvm_mmu_page_role tracks efer.NX, cr0.wp, and
cr4.SMEP/SMAP (if cr0.wp=0), which means that KVM will create a different root
if the guest toggles any of those bits.  I'm pretty sure that can be changed and
will look into doing so in the near future[*], but even that wouldn't guarantee
a single root.

SMM is also incorporated in the page role and will result in different roots
for SMM vs. non-SMM.  This is mandatory because SMM has its own memslot view.

A CPUID.MAXPHYADDR change can also change the role, but in this case zapping all
roots will always be the correct/desired behavior.

[*] https://lkml.kernel.org/r/YQGj8gj7fpWDdLg5@google.com
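
For illustration, a heavily trimmed sketch of the idea, not the real
kvm_mmu_page_role layout from kvm_host.h: roots are keyed by the role bits,
so flipping any tracked bit selects (or creates) a different root.

/* Sketch only; assumes <linux/types.h>.  The real union has many more fields. */
union sketch_page_role {
	u32 word;
	struct {
		unsigned level:4;
		unsigned efer_nx:1;
		unsigned cr0_wp:1;
		unsigned smep_andnot_wp:1;
		unsigned smap_andnot_wp:1;
		unsigned smm:1;
	};
};

/* A cached root is reusable only if every role bit matches. */
static bool sketch_role_matches(union sketch_page_role a,
				union sketch_page_role b)
{
	return a.word == b.word;
}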

> and I think an EPT sync of the single context is enough for both cases you listed below.
> 
> When the context is flushed, the TLB is clean for the vCPU to run.
> 
> If kvm changes the mmu->root_hpa, it is kvm's responsibility to request
> another flush, which is already implemented.

KVM needs to flush when it allocates a new root, largely because it has no way
of knowing if some other entity previously created a CR3/EPTP at that HPA, but
KVM isn't strictly required to flush when switching to a previous/cached root.

Currently this is a moot point because kvm_post_set_cr0(), kvm_post_set_cr4(),
set_efer(), and kvm_smm_changed() all do kvm_mmu_reset_context() instead of
attempting a fast PGD switch, but I am hoping to change this as well, at least
for the non-SMM cases.
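
Roughly, the split looks like this (a hand-written sketch, not the actual
kvm_mmu_new_pgd()/fast PGD switch code; the helper name and the flag are made
up for illustration):

/*
 * Sketch only; assumes KVM-internal types from <linux/kvm_host.h>.
 * The helper name and the "newly_allocated" flag are hypothetical.
 */
static void sketch_switch_root(struct kvm_vcpu *vcpu, hpa_t root,
			       bool newly_allocated)
{
	vcpu->arch.mmu->root_hpa = root;

	/*
	 * A freshly allocated root might alias a CR3/EPTP some other entity
	 * used before, so the new context must be flushed.
	 */
	if (newly_allocated)
		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

	/*
	 * Reusing a previously cached root does not strictly require a
	 * flush; its TLB entries are still consistent with that root.
	 */
}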

> In other words, KVM_REQ_TLB_FLUSH == KVM_REQ_TLB_FLUSH_CURRENT in this case.
> And before this patch, kvm flushed only the single context rather than doing a global flush.
> 
> >
> > Use #1 is remote flushes from the MMU, which don't strictly require a global flush,
> > but KVM would need to propagate more information (mmu_role?) in order for responding
> > vCPUs to determine what contexts need to be flushed.  And practically speaking,
> > for MMU flushes there's no meaningful difference when using TDP without nested
> > guests as the common case will be that each vCPU has a single active EPTP and
> > that EPTP will be affected by the MMU changes, i.e. needs to be flushed.
> 
> I don't see when we need "to determine what contexts" since the vcpu is
> using only one context in this case, which is the assumption in my mind;
> could you please correct me if I'm wrong.

As it exists today, I believe you're correct that KVM will only ever have a
single reachable TDP root, but only because of overzealous kvm_mmu_reset_context()
usage.  The SMM case in particular could be optimized to not zap all roots (whether
or not it's worth optimizing is another question).

All that said, the easiest way to query the number of reachable roots would be to
check the previous/cached root.
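
Something along these lines, sketched from memory rather than taken from the
series: the cached roots live in mmu->prev_roots[], so counting the
VALID_PAGE() entries plus the active root gives the number of reachable roots.

/* Sketch only; assumes KVM-internal definitions from mmu.h/kvm_host.h. */
static int sketch_nr_reachable_roots(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	int i, nr = VALID_PAGE(mmu->root_hpa) ? 1 : 0;

	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
		if (VALID_PAGE(mmu->prev_roots[i].hpa))
			nr++;

	return nr;
}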

But, even if we can guarantee there's exactly one reachable root, I would be
surprised if doing INVEPT.context instead of INVEPT.global actually provided any
meaningful performance benefit.  Using INVEPT.context is safe if and only if there
are no other TLB entries for this vCPU, and KVM must invalidate on pCPU migration,
so there can't be collateral damage in that sense.

That leaves the latency of INVEPT as the only possible performance delta, and that
will be uarch specific.  It's entirely possible INVEPT.global is slower, but again
I would be surprised if it is so much slower than INVEPT.context that it actually
impacts guest performance given that its use is limited to slow paths.
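
For reference, the two flavors boil down to which INVEPT extent gets used;
this is paraphrased from VMX's helpers (details approximate, the sketch_
prefixed names are mine):

/* Paraphrase of the vmx_ops.h helpers; not verbatim kernel code. */
static inline void sketch_ept_sync_global(void)
{
	__invept(VMX_EPT_EXTENT_GLOBAL, 0, 0);
}

static inline void sketch_ept_sync_context(u64 eptp)
{
	/* Single-context INVEPT is optional; fall back to global if absent. */
	if (cpu_has_vmx_invept_context())
		__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
	else
		sketch_ept_sync_global();
}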
