[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3f52c362-ac1b-4435-92c8-81ec97992b02@intel.com>
Date: Fri, 6 Dec 2024 11:37:30 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Adrian Hunter <adrian.hunter@...el.com>, Chao Gao <chao.gao@...el.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Huang, Kai"
<kai.huang@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Yang, Weijiang" <weijiang.yang@...el.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"dmatlack@...gle.com" <dmatlack@...gle.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 7/7] KVM: TDX: Add TSX_CTRL msr into uret_msrs list
On 12/6/2024 1:31 AM, Adrian Hunter wrote:
> On 4/12/24 17:33, Xiaoyao Li wrote:
>> On 12/4/2024 7:55 PM, Adrian Hunter wrote:
>>> On 4/12/24 13:13, Chao Gao wrote:
>>>> On Wed, Dec 04, 2024 at 08:57:23AM +0200, Adrian Hunter wrote:
>>>>> On 4/12/24 08:37, Chao Gao wrote:
>>>>>> On Wed, Dec 04, 2024 at 08:18:32AM +0200, Adrian Hunter wrote:
>>>>>>> On 4/12/24 03:25, Chao Gao wrote:
>>>>>>>>> +#define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM))
>>>>>>>>> +
>>>>>>>>> +static bool has_tsx(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + return entry->function == 7 && entry->index == 0 &&
>>>>>>>>> + (entry->ebx & TDX_FEATURE_TSX);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void clear_tsx(struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + entry->ebx &= ~TDX_FEATURE_TSX;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + return entry->function == 7 && entry->index == 0 &&
>>>>>>>>> + (entry->ecx & __feature_bit(X86_FEATURE_WAITPKG));
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + if (has_tsx(entry))
>>>>>>>>> + clear_tsx(entry);
>>>>>>>>> +
>>>>>>>>> + if (has_waitpkg(entry))
>>>>>>>>> + clear_waitpkg(entry);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>> +{
>>>>>>>>> + return has_tsx(entry) || has_waitpkg(entry);
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> No need to check TSX/WAITPKG explicitly because setup_tdparams_cpuids() already
>>>>>>>> ensures that unconfigurable bits are not set by userspace.
>>>>>>>
>>>>>>> Aren't they configurable?
>>>>>>
>>>>>> They are cleared from the configurable bitmap by tdx_clear_unsupported_cpuid(),
>>>>>> so they are not configurable from a userspace perspective. Did I miss anything?
>>>>>> KVM should check user inputs against its adjusted configurable bitmap, right?
>>>>>
>>>>> Maybe I misunderstand but we rely on the TDX module to reject
>>>>> invalid configuration. We don't check exactly what is configurable
>>>>> for the TDX Module.
>>>>
>>>> Ok, this is what I missed. I thought KVM validated user input and masked
>>>> out all unsupported features. sorry for this.
>>>>
>>>>>
>>>>> TSX and WAITPKG are not invalid for the TDX Module, but KVM
>>>>> must either support them by restoring their MSRs, or disallow
>>>>> them. This patch disallows them for now.
>>>>
>>>> Yes. I agree. what if a new feature (supported by a future TDX module) also
>>>> needs KVM to restore some MSRs? current KVM will allow it to be exposed (since
>>>> only TSX/WAITPKG are checked); then some MSRs may get corrupted. I may think
>>>> this is not a good design. Current KVM should work with future TDX modules.
>>>
>>> With respect to CPUID, I gather this kind of thing has been
>>> discussed, such as here:
>>>
>>> https://lore.kernel.org/all/ZhVsHVqaff7AKagu@google.com/
>>>
>>> and Rick and Xiaoyao are working on something.
>>>
>>> In general, I would expect a new TDX Module would advertise support for
>>> new features, but KVM would have to opt in to use them.
>>>
>>
>> There were discussion[1] on whether KVM to gatekeep the configurable/supported CPUIDs for TDX. I stand by Sean that KVM needs to do so.
>>
>> Regarding KVM opt in the new feature, KVM gatekeeps the CPUID bit that can be set by userspace is exactly the behavior of opt-in. i.e., for a given KVM, it only allows a CPUID set {S} to be configured by userspace, if new TDX module supports new feature X, it needs KVM to opt-in X by adding X to {S} so that X is allowed to be configured by userspace.
>>
>> Besides, I find current interface between KVM and userspace lacks the ability to tell userspace what bits are not supported by KVM. KVM_TDX_CAPABILITIES.cpuid doesn't work because it represents the configurable CPUIDs, not supported CPUIDs (I think we might rename it to configurable_cpuid to better reflect its meaning). So userspace has to hardcode that TSX and WAITPKG is not support itself.
>
> I don't follow why hardcoding would be necessary.
>
> If the leaf is represented in KVM_TDX_CAPABILITIES.cpuid, and
> the bits are 0 there, why would userspace try to set them to 1?
Userspace doesn't set the bit to 1 in kvm_tdx_init_vm.cpuid, doesn't
mean userspace wants the bit to be 0.
Note, KVM_TDX_CAPABILITIES.cpuid reports the configurable bits. The
value 0 of a bit in KVM_TDX_CAPABILITIES.cpuid means the bit is not
configurable, not means the bit is unsupported.
For kvm_tdx_init_vm.cpuid,
- if the corresponding bit is reported as 1 in
KVM_TDX_CAPABILITIES.cpuid, then a value 0 in kvm_tdx_init_vm.cpuid
means userspace wants to configure it as 0.
- if the corresponding bit is reported as 0 in
KVM_TDX_CAPABILITIES.cpuid, then userspace has to pass a value 0 in
kvm_tdx_init_vm.cpuid. But it doesn't mean the value of the bit will be 0.
e.g., X2APIC bit is 0 in KVM_TDX_CAPABILITIES.cpuid, and it's also 0 in
kvm_tdx_init_vm.cpuid, but TD guest sees a value of 1. In the view of
QEMU, it maintains the bit of X2APIC as 1, and QEMU filters X2APIC bit
when calling KVM_TDX_INIT_VM because X2APIC is not configurable.
So when it comes to TSX and WAITPKG, QEMU also needs an interface to be
informed that they are unsupported. Without the interface of fixed0 bits
reported by KVM, QEMU needs to hardcode itself like [1]. The problem of
hardcode is that it will conflict when future KVM allows them to be
configurable.
In the future, if we have interface from KVM to report the fixed0 and
fixed1 bit (on top of the proposal [2]), userspace can drop the
hardcoded one it maintains. At that time, KVM can ensure no conflict by
removing the bits from fixed0/1 array when allowing them to be
configurable.
[1]
https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@intel.com/
[2]
https://lore.kernel.org/all/43b26df1-4c27-41ff-a482-e258f872cc31@intel.com/
>>
>> [1] https://lore.kernel.org/all/ZuM12EFbOXmpHHVQ@google.com/
>>
>
Powered by blists - more mailing lists