[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <593cdae9-c49d-6977-24d5-191f723188d7@redhat.com>
Date: Thu, 21 Oct 2021 19:43:29 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Lai Jiangshan <jiangshanlai@...il.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: Lai Jiangshan <laijs@...ux.alibaba.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 2/4] KVM: X86: Cache CR3 in prev_roots when PCID is
disabled
On 19/10/21 13:01, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
>
> The commit 21823fbda5522 ("KVM: x86: Invalidate all PGDs for the
> current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific
> PCID and in the case of PCID is disabled, it includes all PGDs in the
> prev_roots and the commit made prev_roots totally unused in this case.
>
> Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1
> before the said commit:
> (CR4.PCIDE=0, CR3=cr3_a, the page for the guest
> kernel is global, cr3_b is cached in prev_roots)
>
> modify the user part of cr3_b
> the shadow root of cr3_b is unsync in kvm
> INVPCID single context
> the guest expects the TLB is clean for PCID=0
> change CR4.PCIDE 0 -> 1
> switch to cr3_b with PCID=0,NOFLUSH=1
> No sync in kvm, cr3_b is still unsync in kvm
> return to the user part (of cr3_b)
> the user accesses to wrong page
>
> It is a very unlikely case, but it shows that virtualizing guest TLB in
> prev_roots is not safe in this case and the said commit did fix the
> problem.
>
> But the said commit also disabled caching CR3 in prev_roots when PCID
> is disabled and NOT all CPUs have PCID, especially the PCID support
> for AMD CPUs is kind of recent. To restore the original optimization,
> we have to enable caching CR3 without re-introducing problems.
>
> Actually, in short, the said commit just ensures prev_roots not part of
> the virtualized TLB. So this change caches CR3 in prev_roots, and
> ensures prev_roots not part of the virtualized TLB by always flushing
> the virtualized TLB when CR3 is switched from prev_roots to current
> (it is already the current behavior) and by freeing prev_roots when
> CR4.PCIDE is changed 0 -> 1.
>
> Anyway:
> PCID enabled: vTLB includes root_hpa, prev_roots and hardware TLB.
> PCID disabled: vTLB includes root_hpa and hardware TLB, no prev_roots.
>
> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
> arch/x86/kvm/x86.c | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 06169ed08db0..13df3ca88e09 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1022,10 +1022,29 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>
> void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
> {
> + /*
> + * If any role bit is changed, the MMU needs to be reset.
> + *
> + * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the guest
> + * TLB per SDM, but the virtualized TLB doesn't include prev_roots when
> + * CR4.PCIDE is 0, so the prev_roots has to be freed to avoid later
> + * resuing without explicit flushing.
> + * If CR4.PCIDE is changed 1 -> 0, there is required to flush the guest
> + * TLB and KVM_REQ_MMU_RELOAD is fit for the both cases. Although
> + * KVM_REQ_MMU_RELOAD is slow, changing CR4.PCIDE is a rare case.
* If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
* If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
* according to the SDM; however, stale prev_roots could be reused
* reused incorrectly by MOV to CR3 with NOFLUSH=1, so we free them
* all. KVM_REQ_MMU_RELOAD is fit for the both cases; it
* is slow, but changing CR4.PCIDE is a rare case.
> + * If CR4.PGE is changed, there is required to just flush the guest TLB.
> + *
> + * Note: reseting MMU covers KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_RELOAD
> + * covers KVM_REQ_TLB_FLUSH_GUEST, so "else if" is used here and the
> + * check for later cases are skipped if the check for the preceding
> + * case is matched.
* Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
* KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
* the usage of "else if".
> + */
> if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> kvm_mmu_reset_context(vcpu);
> - else if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
> - (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> + else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
> + kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> + else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
> kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
> @@ -1093,6 +1112,15 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
> kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> }
>
> + /*
> + * If PCID is disabled, there is no need to free prev_roots even the
> + * PCIDs for them are also 0. The prev_roots are just not included
> + * in the "clean" virtualized TLB and a resync will happen anyway
> + * before switching to any other CR3.
> + */
/*
* If PCID is disabled, there is no need to free prev_roots even if the
* PCIDs for them are also 0, because all moves to CR3 flush the TLB
* with PCID=0.
*/
> + if (!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
> + return;
> +
> for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
> roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>
Can you confirm the above comments are accurate?
Paolo
Powered by blists - more mailing lists