lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <593cdae9-c49d-6977-24d5-191f723188d7@redhat.com>
Date:   Thu, 21 Oct 2021 19:43:29 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Lai Jiangshan <jiangshanlai@...il.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     Lai Jiangshan <laijs@...ux.alibaba.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 2/4] KVM: X86: Cache CR3 in prev_roots when PCID is
 disabled

On 19/10/21 13:01, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@...ux.alibaba.com>
> 
> The commit 21823fbda5522 ("KVM: x86: Invalidate all PGDs for the
> current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific
> PCID and in the case of PCID is disabled, it includes all PGDs in the
> prev_roots and the commit made prev_roots totally unused in this case.
> 
> Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1
> before the said commit:
> 	(CR4.PCIDE=0, CR3=cr3_a, the page for the guest
> 	 kernel is global, cr3_b is cached in prev_roots)
> 
> 	modify the user part of cr3_b
> 		the shadow root of cr3_b is unsync in kvm
> 	INVPCID single context
> 		the guest expects the TLB is clean for PCID=0
> 	change CR4.PCIDE 0 -> 1
> 	switch to cr3_b with PCID=0,NOFLUSH=1
> 		No sync in kvm, cr3_b is still unsync in kvm
> 	return to the user part (of cr3_b)
> 		the user accesses to wrong page
> 
> It is a very unlikely case, but it shows that virtualizing guest TLB in
> prev_roots is not safe in this case and the said commit did fix the
> problem.
> 
> But the said commit also disabled caching CR3 in prev_roots when PCID
> is disabled and NOT all CPUs have PCID, especially the PCID support
> for AMD CPUs is kind of recent.  To restore the original optimization,
> we have to enable caching CR3 without re-introducing problems.
> 
> Actually, in short, the said commit just ensures prev_roots not part of
> the virtualized TLB.  So this change caches CR3 in prev_roots, and
> ensures prev_roots not part of the virtualized TLB by always flushing
> the virtualized TLB when CR3 is switched from prev_roots to current
> (it is already the current behavior) and by freeing prev_roots when
> CR4.PCIDE is changed 0 -> 1.
> 
> Anyway:
> PCID enabled: vTLB includes root_hpa, prev_roots and hardware TLB.
> PCID disabled: vTLB includes root_hpa and hardware TLB, no prev_roots.
> 
> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
>   arch/x86/kvm/x86.c | 32 ++++++++++++++++++++++++++++++--
>   1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 06169ed08db0..13df3ca88e09 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1022,10 +1022,29 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>   
>   void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
>   {
> +	/*
> +	 * If any role bit is changed, the MMU needs to be reset.
> +	 *
> +	 * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the guest
> +	 * TLB per SDM, but the virtualized TLB doesn't include prev_roots when
> +	 * CR4.PCIDE is 0, so the prev_roots has to be freed to avoid later
> +	 * resuing without explicit flushing.
> +	 * If CR4.PCIDE is changed 1 -> 0, there is required to flush the guest
> +	 * TLB and KVM_REQ_MMU_RELOAD is fit for the both cases.  Although
> +	 * KVM_REQ_MMU_RELOAD is slow, changing CR4.PCIDE is a rare case.

          * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
          * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
          * according to the SDM; however, stale prev_roots could be reused
          * reused incorrectly by MOV to CR3 with NOFLUSH=1, so we free them
          * all.  KVM_REQ_MMU_RELOAD is fit for the both cases; it
          * is slow, but changing CR4.PCIDE is a rare case.

> +	 * If CR4.PGE is changed, there is required to just flush the guest TLB.
> +	 *
> +	 * Note: reseting MMU covers KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_RELOAD
> +	 * covers KVM_REQ_TLB_FLUSH_GUEST, so "else if" is used here and the
> +	 * check for later cases are skipped if the check for the preceding
> +	 * case is matched.

          * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
          * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
          * the usage of "else if".

> +	 */
>   	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
>   		kvm_mmu_reset_context(vcpu);
> -	else if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
> -		 (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> +	else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
> +		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> +	else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
>   		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>   }
>   EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
> @@ -1093,6 +1112,15 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
>   		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>   	}
>   
> +	/*
> +	 * If PCID is disabled, there is no need to free prev_roots even the
> +	 * PCIDs for them are also 0.  The prev_roots are just not included
> +	 * in the "clean" virtualized TLB and a resync will happen anyway
> +	 * before switching to any other CR3.
> +	 */

         /*
          * If PCID is disabled, there is no need to free prev_roots even if the
          * PCIDs for them are also 0, because all moves to CR3 flush the TLB
          * with PCID=0.
          */

> +	if (!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
> +		return;
> +
>   	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>   		if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid)
>   			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> 


Can you confirm the above comments are accurate?

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ