[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7df87b7f0b2e029b483d08611e70291aab4e4d0b.camel@redhat.com>
Date: Mon, 21 Jun 2021 01:25:18 +0300
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
open list <linux-kernel@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
Wanpeng Li <wanpengli@...cent.com>,
Ingo Molnar <mingo@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>, Joerg Roedel <joro@...tes.org>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
Jim Mattson <jmattson@...gle.com>,
Jonathan Corbet <corbet@....net>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v3 8/8] KVM: x86: avoid loading PDPTRs after migration
when possible
On Fri, 2021-06-18 at 20:53 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 11260e83518f..eadfc9caf500 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -815,6 +815,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
> >
> > memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> > kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > + vcpu->arch.pdptrs_restored_oob = false;
> > +
> > out:
> >
> > return ret;
> > @@ -10113,6 +10115,7 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
> >
> > kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > mmu_reset_needed = 1;
> > + vcpu->arch.pdptrs_restored_oob = true;
>
> Setting pdptrs_restored_oob[*] here and _only_ clearing it on successful
> load_pdptrs() is not robust. Potential problems once the flag is set:
Hi Sean Christopherson!
Thanks for the review!
I also thought about the exact same thing when I submitted the last version of
this patches (prior version didn't clear the flag at all but I noticed that
while doing self review of my patches).
>
> 1. Userspace calls KVM_SET_SREGS{,2} without valid PDPTRs. Flag is now stale.
True. It isn't that big issue though since the only way to this is to also disable PAE mode
in CR4 during the call or before it, thus PDPTRs becomes irrelevant.
Once PAE is enabled again, PDPTRs will be loaded again, resetting this flag.
Something to note is that we also don't clear available/dirty status of VCPU_EXREG_PDPTR
in this case.
> 2. kvm_check_nested_events() VM-Exits to L1 before the flag is processed.
> Flag is now stale.
Also true. However this means that we enter L1 now, and once we are ready to enter
L2 again, we will load PDPTRS from guest memory as thankfully VM entries in PAE mode
do load PDPTRS from guest memory (both on Intel and AMD).
>
> (2) might not be problematic in practice since the "normal" load_pdptrs()
> should reset the flag on the next VM-Enter, but it's really, really hard to tell.
> E.g. what if an SMI causes an exit and _that_ non-VM-Enter reload of L2 state
> is the first to trip the flag? The bool is essentially an extension of
> KVM_REQ_GET_NESTED_STATE_PAGES, I think it makes sense to clear the flag whenever
> KVM_REQ_GET_NESTED_STATE_PAGES is cleared.
Could you expalain a bit better about SMM case? When SMM entry is done, at least Intel
spec is silent on if PDPTRs are preserved in SMRAM (and it doesn't have any place allocated
for them).
We currently don't preserve PDTPRS on SMM entry and we reload them via CR3/CR0 write
when we exit SMM.
Ah I see now, on VMX the non VM-Enter code path is also used for returns from SMM,
and it sets the KVM_REQ_GET_NESTED_STATE_PAGES, so this is a real issue.
If SMM code enabled PAE, and then KVM_SET_SREGS2 was used, and followed by
RSM, we indeed have a risk of not loading the PDPTRs.
I think that this is best fixed by resetting this flag in vmx_leave_smm,
since RSM loads PDPTRS from memory always otherwise.
I also note that we don't do KVM_REQ_GET_NESTED_STATE_PAGES on SVM,
on return from SMM at all,
thus on SVM I think I broke the resume from SMM to a guest if the guest is PAE.
Oh well....
I will now extend the testing I usually do to SMM and prepare a patch to fix this.
>
> Another thing that's not obvious is the required ordering between KVM_SET_SREGS2
> and KVM_SET_NESTED_STATE. AFAICT it's not documented, but that may be PEBKAC on
> my end. E.g. what happens if walk_mmu == &root_mmu (L1 active in targte KVM)
> when SET_SREGS2 is called, and _then_ KVM_SET_NESTED_STATE is called?
Isn't that exactly the current ordering (and reason why I had to do this patch series)
First the KVM_SET_SREGS is called indeed prior to KVM_SET_NESTED_STATE and it can
potentially load wrong PDPTRS, and then KVM_SET_NESTED_STATE is called which used
to 'fix' this by reloading them always.
>
> [*] pdptrs_from_userspace in Paolo's tree.
>
I think that strictly speaking this flag should be cleared when PAE mode is disabled,
and together with clearing of availablity of VCPU_EXREG_PDPTR.
I don't agree that this flag is an extension of the KVM_REQ_GET_NESTED_STATE_PAGES.
I think this flag is more like an extra property of VCPU_EXREG_PDPTR.
In addition to being dirty/available, this "register" can be loaded from memory
or restored from migration stream.
Thanks again for the review,
Best regards,
Maxim Levitsky
Powered by blists - more mailing lists