Open Source and information security mailing list archives
 
Message-ID: <1c47677bfde89d05264bf874286a899ff644986b.camel@intel.com>
Date:   Wed, 10 May 2023 11:59:50 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>
CC:     "robert.hu@...ux.intel.com" <robert.hu@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "Gao, Chao" <chao.gao@...el.com>
Subject: Re: [PATCH v8 3/6] KVM: x86: Virtualize CR3.LAM_{U48,U57}

On Wed, 2023-05-10 at 14:06 +0800, Binbin Wu wrote:
> From: Robert Hoo <robert.hu@...ux.intel.com>
> 
> Add support to allow guests to set the two new CR3 non-address control bits
> that enable the new Intel CPU feature Linear Address Masking (LAM) on user
> pointers.
> 
> LAM modifies the checking applied to 64-bit linear addresses so that software
> can use the untranslated address bits for metadata; the metadata bits are
> masked off before the address is used to access memory. LAM uses two new CR3
> non-address bits, LAM_U48 (bit 62) and LAM_U57 (bit 61), to configure LAM for
> user pointers. LAM also changes VM entry to allow both bits to be set in
> VMCS's HOST_CR3 and GUEST_CR3 for virtualization.
> 
> When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
> either of the two LAM control bits. However, when EPT is off, the actual CR3
> used by the guest is generated from the shadow MMU root, which is different
> from the CR3 that is *set* by the guest, and KVM needs to manually apply any
> active control bits to VMCS's GUEST_CR3 based on the cached CR3 *seen* by the
> guest.
> 
> KVM manually checks the guest's CR3 to make sure it points to a valid guest
> physical address (i.e. to support a smaller MAXPHYSADDR in the guest). Extend
> this check to allow the two LAM control bits to be set. To make the check
> generic, introduce a new field 'cr3_ctrl_bits' in the vcpu to record all
> feature control bits that the guest is allowed to set. After the check, the
> non-address bits of the guest CR3 are stripped off to extract the guest
> physical address.
> 
> In the nested case, for a guest which supports LAM, both VMCS12's HOST_CR3
> and GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when L0
> enters L1 to emulate a VMEXIT from L2 to L1, or when L0 enters L2 directly.
> KVM also manually checks that VMCS12's HOST_CR3 and GUEST_CR3 are valid
> physical addresses. Extend that check to allow the new LAM control bits too.
> 
> Note, LAM doesn't have a global control bit to turn it on/off completely; its
> availability purely depends on the hardware's CPUID. That means, when EPT is
> on, even if KVM doesn't expose LAM to the guest, the guest can still set the
> LAM control bits in CR3 without causing a problem. This is an unfortunate
> virtualization hole. KVM could choose to intercept CR3 in this case and
> inject a fault, but that would hurt performance when running a normal VM
> without LAM support, which is undesirable. Simply let the guest perform such
> an illegal action; the worst case is that the guest is killed when KVM
> eventually finds out about the illegal behaviour, and the guest has itself to
> blame.
> 
> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in the SVM nested
> code, to provide a clear distinction between CR3 and GPA checks.
> 
> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Robert Hoo <robert.hu@...ux.intel.com>
> Co-developed-by: Binbin Wu <binbin.wu@...ux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@...ux.intel.com>
> Tested-by: Xuelian Guo <xuelian.guo@...el.com>
> 

[...]

> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	hpa_t root;
>  
>  	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> -	root_gfn = root_pgd >> PAGE_SHIFT;
> +	/*
> +	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> +	 * additional control bits (e.g. LAM control bits). To be generic,
> +	 * unconditionally strip non-address bits when computing the GFN since
> +	 * the guest PGD has already been checked for validity.
> +	 */
> +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;

It looks better to mask off non-address bits based on the guest's invalid
physical address bits, e.g. based on vcpu->arch.reserved_gpa_bits.  But looking
at the old discussion, it appears Sean suggested this approach:

https://lore.kernel.org/all/ZAuPPv8PUDX2RBQa@google.com/

So anyway:

Reviewed-by: Kai Huang <kai.huang@...el.com>




