linux-kernel - Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cd236026-0bc9-424c-8d49-6bdc9daf5743@intel.com>
Date: Thu, 12 Sep 2024 15:48:20 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Paolo Bonzini <pbonzini@...hat.com>,
 Rick Edgecombe <rick.p.edgecombe@...el.com>, seanjc@...gle.com,
 kvm@...r.kernel.org
Cc: kai.huang@...el.com, isaku.yamahata@...il.com,
 tony.lindgren@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from
 KVM_GET_SUPPORTED_CPUID

On 9/11/2024 1:52 AM, Paolo Bonzini wrote:
> On 8/13/24 00:48, Rick Edgecombe wrote:
>> Originally, the plan was to filter the directly configurable CPUID bits
>> exposed by KVM_TDX_CAPABILITIES, and the final configured bit values
>> provided by KVM_TDX_GET_CPUID. However, several issues were found with
>> this. Both the filtering done with KVM_TDX_CAPABILITIES and
>> KVM_TDX_GET_CPUID had the issue that the get_supported_cpuid() provided
>> default values instead of supported masks for multi-bit fields (i.e. 
>> those
>> encoding a multi-bit number).
>>
>> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
>> actually supported by KVM, but missing from get_supported_cpuid() for one
>> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
>> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in 
>> QEMU by
>> adjusting which features are expected. 

I'm not sure what issuee/problem can be worked around in QEMU.

QEMU doesn't expect these bit are reported by KVM as supported for TDX. 
QEMU just accepts the result reported by KVM.

The problem is, TDX module and the hardware allow these bits be 
configured for TD guest, but KVM doesn't allow. It leads to users cannot 
create a TD with these bits on.

QEMU cannot work around this problem.

>> Some of these are going to be 
>> added
>> to get_supported_cpuid(), and that is probably the right long term fix.
> 
> There are several cases here:
> 
> - MWAIT is hidden because it's hard to virtualize its C-state parameters
> 
> - HT is hidden because it depends on the topology, and cannot be added 
> blindly.
> 
> - TSC_DEADLINE_TIMER is queried with KVM_CHECK_EXTENSION for historical 
> reasons
> 
> There are basically two kinds of userspace:
> 
> - those that fetch KVM_GET_SUPPORED_CPUID and pass it blindly to 
> KVM_SET_CPUID2.  These mostly work, though they may miss a feature or 
> three (e.g. the TSC deadline timer).
> 
> - those that know each bit and make an informed decision on what to 
> enable; for those, KVM_GET_SUPPORTED_CPUID is just guidance.
> 
> Because of this, KVM_GET_SUPPORTED_CPUID doesn't return bits that are 
> one; it returns a mix of:
> 
> - maximum supported values (e.g. CPUID[7,0].EAX)
> 
> - values from the host (e.g. FMS or model name)
> 
> - supported features
> 
> It's an awfully defined API but it is easier to use than it sounds (some 
> of the quirks are being documented in 
> Documentation/virt/kvm/x86/errata.rst and 
> Documentation/virt/kvm/x86/api.rst).  The idea is that, if userspace 
> manages individual CPUID bits, it already knows what can be one anyway.
> 
> This is the kind of API that we need to present for TDX, even if the 
> details on how to get the supported CPUID are different.  Not because 
> it's a great API, but rather because it's a known API.

However there are differences for TDX. For legacy VMs, the result of 
KVM_GET_SUPPORTED_CPUID isn't used to filter the input of KVM_SET_CPUID2.

But for TDX, it needs to filter the input of KVM_TDX_VM_INIT.CPUID[] 
because TDX module only allows the bits that are reported as 
configurable to be set to 1.

> The difference between this and KVM_GET_SUPPORTED_CPUID are small, but 
> the main one is X86_FEATURE_HYPERVISOR (I am not sure whether to make it 
> different with respect to X86_FEATURE_TSC_DEADLINE_TIMER; leaning 
> towards no).
> 
> We may also need a second ioctl specifically to return the fixed-1 bits. 
>   Asking Xiaoyao for input with regard to what he'd like to have in QEMU.

With current designed API, QEMU can only know which bits are 
configurable before KVM_TDX_VM_INIT, i.e., which bits can be set to 1 or 
0 freely.

For other bits not reported as configurable, QEMU can know the exact 
value of them via KVM_TDX_GET_CPUID, after KVM_TDX_VM_INIT and before 
TD's running. With it, QEMU can validate the return value is matched 
with what QEMU wants to set that determined by users input. If not 
matched, QEMU can provide some warnings like what for legacy VMs:

   - TDX doesn't support requested feature: CPUID.01H.ECX.tsc-deadline 
[bit 24]
   - TDX forcibly sets features: CPUID.01H:ECX.hypervisor [bit 31]

If there are ioctls to report the fixed0 bits and fixed1 bits for TDX, 
QEMU can validate the user's configuration earlier.

>> +    entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 
>> 0x0, 0);
>> +    if (WARN_ON(!entry))
>> +        goto err;
>> +    /* Fixup of maximum basic leaf */
>> +    entry->eax |= 0x000000FF;
>> +
>> +    entry = kvm_find_cpuid_entry2((*cpuid)->entries, (*cpuid)->nent, 
>> 0x1, 0);
>> +    if (WARN_ON(!entry))
>> +        goto err;
>> +    /* Fixup of FMS */
>> +    entry->eax |= 0x0fff3fff;
>> +    /* Fixup of maximum logical processors per package */
>> +    entry->ebx |= 0x00ff0000;
>> +
> 
> I see now why you could blindly AND things in patch 24.
> 
> However, the right mode of operation is still to pick manually which 
> bits to AND.
> 
> Paolo
>