linux-kernel - Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <ZDhs+AnytF030DYe@google.com>
Date:   Thu, 13 Apr 2023 13:58:32 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     David Matlack <dmatlack@...gle.com>
Cc:     Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Tianyu Lan <ltykernel@...il.com>,
        Michael Kelley <mikelley@...rosoft.com>
Subject: Re: [PATCH] KVM: SVM: Disable TDP MMU when running on Hyper-V

On Thu, Apr 13, 2023, David Matlack wrote:
> On Thu, Apr 13, 2023 at 12:10 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Thu, Apr 13, 2023, Sean Christopherson wrote:
> > > Aha!  Idea.  There are _at most_ 4 possible roots the TDP MMU can encounter.
> > > 4-level non-SMM, 4-level SMM, 5-level non-SMM, and 5-level SMM.  I.e. not keeping
> > > inactive roots on a per-VM basis is just monumentally stupid.
> >
> > One correction: there are 6 possible roots:
> >
> >   1. 4-level !SMM !guest_mode (i.e. not nested)
> >   2. 4-level SMM !guest_mode
> >   3. 5-level !SMM !guest_mode
> >   4. 5-level SMM !guest_mode
> >   5. 4-level !SMM guest_mode
> >   6. 5-level !SMM guest_mode
> >
> > I forgot that KVM still uses the TDP MMU when running L2 if L1 doesn't enable
> > EPT/TDP, i.e. if L1 is using shadow paging for L2.  But that really doesn't change
> > anything as each vCPU can already track 4 roots, i.e. userspace can saturate all
> > 6 roots anyways.  And in practice, no sane VMM will create a VM with both 4-level
> > and 5-level roots (KVM keys off of guest.MAXPHYADDR for the TDP root level).
> 
> Why do we create a new root for guest_mode=1 if L1 disables EPT/NPT?

Because "private", a.k.a. KVM-internal, memslots are visible to L1 but not L2.
Which for TDP means the APIC-access page.  From commit 3a2936dedd20:

    kvm: mmu: Don't expose private memslots to L2
    
    These private pages have special purposes in the virtualization of L1,
    but not in the virtualization of L2. In particular, L1's APIC access
    page should never be entered into L2's page tables, because this
    causes a great deal of confusion when the APIC virtualization hardware
    is being used to accelerate L2's accesses to its own APIC.

FWIW, I _think_ KVM could actually let L2 access the APIC-access page when L1 is
running without any APIC virtualization, i.e. when L1 is passing its APIC through
to L2.  E.g. something like the below, but I ain't touching that with a 10 foot pole
unless someone explicitly asks for it :-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 039fb16560a0..8aa12f5f2c30 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4370,10 +4370,13 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
        if (!kvm_is_visible_memslot(slot)) {
                /* Don't expose private memslots to L2. */
                if (is_guest_mode(vcpu)) {
-                       fault->slot = NULL;
-                       fault->pfn = KVM_PFN_NOSLOT;
-                       fault->map_writable = false;
-                       return RET_PF_CONTINUE;
+                       if (!slot || slot->id != APIC_ACCESS_PAGE_PRIVATE_MEMSLOT ||
+                           nested_cpu_has_virtual_apic(vcpu)) {
+                               fault->slot = NULL;
+                               fault->pfn = KVM_PFN_NOSLOT;
+                               fault->map_writable = false;
+                               return RET_PF_CONTINUE;
+                           }
                }
                /*
                 * If the APIC access page exists but is disabled, go directly