[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJhGHyBYaASEkB7FqGQ3FThgNDkOzvHmC5KFw9y0SEu3we8s3A@mail.gmail.com>
Date: Tue, 17 May 2022 09:02:09 +0800
From: Lai Jiangshan <jiangshanlai@...il.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Lai Jiangshan <jiangshan.ljs@...group.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Avi Kivity <avi@...hat.com>,
"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)"
<kvm@...r.kernel.org>
Subject: Re: [PATCH] kvm: x86/svm/nested: Cache PDPTEs for nested NPT in PAE
paging mode
On Tue, May 17, 2022 at 4:45 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Fri, Apr 15, 2022, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@...group.com>
> >
> > When NPT enabled L1 is PAE paging, vcpu->arch.mmu->get_pdptrs() which
> > is nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from memroy
> > unconditionally for each call.
> >
> > The guest PAE root page is not write-protected.
> >
> > The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get different
> > values every time or it is different from the return value of
> > mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> >
> > And it will cause FNAME(fetch) installs the spte in a wrong sp
> > or links a sp to a wrong parent since FNAME(gpte_changed) can't
> > check these kind of changes.
> >
> > Cache the PDPTEs and the problem is resolved. The guest is responsible
> > to info the host if its PAE root page is updated which will cause
> > nested vmexit and the host updates the cache when next nested run.
>
> Hmm, no, the guest is responsible for invalidating translations that can be
> cached in the TLB, but the guest is not responsible for a full reload of PDPTEs.
> Per the APM, the PDPTEs can be cached like regular PTEs:
>
> Under SVM, however, when the processor is in guest mode with PAE enabled, the
> guest PDPT entries are not cached or validated at this point, but instead are
> loaded and checked on demand in the normal course of address translation, just
> like page directory and page table entries. Any reserved bit violations ared
> etected at the point of use, and result in a page-fault (#PF) exception rather
> than a general-protection (#GP) exception.
>
> So if L1 modifies a PDPTE from !PRESENT (or RESERVED) to PRESENT (and valid), then
> any active L2 vCPUs should recognize the new PDPTE without a nested VM-Exit because
> the old entry can't have been cached in the TLB.
In this case, it is still !PRESENT in the shadow page, and it will cause
a vmexit when L2 tries to use the translation. I can't see anything wrong
in TLB or vTLB(shadow pages).
But I think some code is needed to reload the cached PDPTEs
when guest_mmu->get_pdptrs() returns !PRESENT and reload mmu if
the cache is changed. (and add code to avoid infinite loop)
The patch fails to reload mmu if the cache is changed which
leaves the problem described in the changelog partial resolved
only.
Maybe we need to add mmu parameter back in load_pdptrs() for it.
>
> In practice, snapshotting at nested VMRUN would likely work, but architecturally
> it's wrong and could cause problems if L1+L2 are engange in paravirt shenanigans,
> e.g. async #PF comes to mind.
>
> I believe the correct way to fix this is to write-protect nNPT PDPTEs like all other
> shadow pages, which shouldn't be too awful to do as part of your series to route
> PDPTEs through kvm_mmu_get_page().
In the one-off special shadow page (will be renamed to one-off local
shadow page)
patchsets, PAE PDPTEs is not write-protected. Wirte-protecting it causing nasty
code.
Powered by blists - more mailing lists