linux-kernel - Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a7988439-5a4c-3d5a-ea4a-0fad181ad733@intel.com>
Date:   Fri, 10 Sep 2021 09:59:22 +0800
From:   Xiaoyao Li <xiaoyao.li@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves

On 9/10/2021 5:41 AM, Sean Christopherson wrote:
> On Fri, Aug 27, 2021, Xiaoyao Li wrote:
>> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
>> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
>> number of PT ADDR ranges.
>>
>> KVM needs to check that guest CPUID values set by userspace doesn't
>> enable any bit which is not supported by bare metal. Otherwise,
>> 1. it will trigger vm-entry failure if hardware unsupported bit is
>>     exposed to guest and set by guest.
>> 2. it triggers #GP when context switch PT MSRs if exposing more
>>     RTIT_ADDR* MSRs than hardware capacity.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@...el.com>
>> ---
>> There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
>> MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
>> cause any vm-entry failure, but guest will parse the PT packet with
>> wrong format.
>>
>> I also check it to be same as host to ensure the virtualization correctness.
>>
>> Changes in v2:
>> - Call out that if configuring more PT ADDR MSRs than hardware, it can
>>    cause #GP when context switch.
>> ---
>>   arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 739be5da3bca..0c8e06a24156 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>>   static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>>   {
>>   	struct kvm_cpuid_entry2 *best;
>> +	u32 eax, ebx, ecx, edx;
>>   
>>   	/*
>>   	 * The existing code assumes virtual address is 48-bit or 57-bit in the
>> @@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>>   			return -EINVAL;
>>   	}
>>   
>> +	/*
>> +	 * CPUID 0xD leaves tell Intel PT capabilities, which decides
> 
> CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT.  This series needs tests...

My apologize.

>> +	 * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
>> +	 *
>> +	 * pt_desc.ctl_bitmask decides the legal value for guest
>> +	 * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
>> +	 * otherwise it will trigger vm-entry failure if guest sets native
>> +	 * unsupported bits in MSR_IA32_RTIT_CTL.
>> +	 */
>> +	best = cpuid_entry2_find(entries, nent, 0xD, 0);
>> +	if (best) {
>> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>> +		if (best->ebx & ~ebx || best->ecx & ~ecx)
>> +			return -EINVAL;
>> +	}
>> +	best = cpuid_entry2_find(entries, nent, 0xD, 1);
>> +	if (best) {
>> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>> +		if (((best->eax & 0x7) > (eax & 0x7)) ||
> 
> Ugh, looking at the rest of the code, even this isn't sufficient because
> pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
> with >4 entries will lead to buffer overflows.

it's hardcoded to 4 because there is a note of "no processors support 
more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table 31-11

> One option would be to bump that to the theoretical max of 15, which doesn't seem
> too horrible, especially if pt_desc as a whole is allocated on-demand, which it
> probably should be since it isn't exactly tiny (nor ubiquitous)
> 
> A different option would be to let userspace define whatever it wants for guest
> CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
> 
> Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
> plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
> guest state (even if this is a VM-Fail condition, it's not a danger to KVM).

I'm fine to only safe guard the nr_addr_range if VM-Entry failure 
doesn't matter.

> 
>> +		    ((best->eax & ~eax) >> 16) ||
>> +		    (best->ebx & ~ebx))
>> +			return -EINVAL;
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>> -- 
>> 2.27.0
>>