linux-kernel - Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 9 Sep 2021 21:41:52 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Xiaoyao Li <xiaoyao.li@...el.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves

On Fri, Aug 27, 2021, Xiaoyao Li wrote:
> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
> number of PT ADDR ranges.
> 
> KVM needs to check that guest CPUID values set by userspace doesn't
> enable any bit which is not supported by bare metal. Otherwise,
> 1. it will trigger vm-entry failure if hardware unsupported bit is
>    exposed to guest and set by guest.
> 2. it triggers #GP when context switch PT MSRs if exposing more
>    RTIT_ADDR* MSRs than hardware capacity.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@...el.com>
> ---
> There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
> MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
> cause any vm-entry failure, but guest will parse the PT packet with
> wrong format.
> 
> I also check it to be same as host to ensure the virtualization correctness.
> 
> Changes in v2:
> - Call out that if configuring more PT ADDR MSRs than hardware, it can
>   cause #GP when context switch.
> ---
>  arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 739be5da3bca..0c8e06a24156 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>  static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>  {
>  	struct kvm_cpuid_entry2 *best;
> +	u32 eax, ebx, ecx, edx;
>  
>  	/*
>  	 * The existing code assumes virtual address is 48-bit or 57-bit in the
> @@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>  			return -EINVAL;
>  	}
>  
> +	/*
> +	 * CPUID 0xD leaves tell Intel PT capabilities, which decides

CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT.  This series needs tests...

> +	 * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
> +	 *
> +	 * pt_desc.ctl_bitmask decides the legal value for guest
> +	 * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
> +	 * otherwise it will trigger vm-entry failure if guest sets native
> +	 * unsupported bits in MSR_IA32_RTIT_CTL.
> +	 */
> +	best = cpuid_entry2_find(entries, nent, 0xD, 0);
> +	if (best) {
> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> +		if (best->ebx & ~ebx || best->ecx & ~ecx)
> +			return -EINVAL;
> +	}
> +	best = cpuid_entry2_find(entries, nent, 0xD, 1);
> +	if (best) {
> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> +		if (((best->eax & 0x7) > (eax & 0x7)) ||

Ugh, looking at the rest of the code, even this isn't sufficient because
pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
with >4 entries will lead to buffer overflows.

One option would be to bump that to the theoretical max of 15, which doesn't seem
too horrible, especially if pt_desc as a whole is allocated on-demand, which it
probably should be since it isn't exactly tiny (nor ubiquitous)

A different option would be to let userspace define whatever it wants for guest
CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).

Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
guest state (even if this is a VM-Fail condition, it's not a danger to KVM).

> +		    ((best->eax & ~eax) >> 16) ||
> +		    (best->ebx & ~ebx))
> +			return -EINVAL;
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.27.0
>