lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0e9a17e-e94c-45f3-8869-8595c7aa58f0@intel.com>
Date: Tue, 10 Dec 2024 10:45:16 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Adrian Hunter <adrian.hunter@...el.com>, Chao Gao <chao.gao@...el.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
 "seanjc@...gle.com" <seanjc@...gle.com>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Huang, Kai"
 <kai.huang@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
 "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "Yang, Weijiang" <weijiang.yang@...el.com>,
 "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
 "dmatlack@...gle.com" <dmatlack@...gle.com>,
 "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
 "nik.borisov@...e.com" <nik.borisov@...e.com>,
 "Chatre, Reinette" <reinette.chatre@...el.com>,
 "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 7/7] KVM: TDX: Add TSX_CTRL msr into uret_msrs list

On 12/9/2024 3:08 PM, Adrian Hunter wrote:
> On 9/12/24 04:46, Xiaoyao Li wrote:
>> On 12/6/2024 10:40 PM, Adrian Hunter wrote:
>>> On 6/12/24 05:37, Xiaoyao Li wrote:
>>>> On 12/6/2024 1:31 AM, Adrian Hunter wrote:
>>>>> On 4/12/24 17:33, Xiaoyao Li wrote:
>>>>>> On 12/4/2024 7:55 PM, Adrian Hunter wrote:
>>>>>>> On 4/12/24 13:13, Chao Gao wrote:
>>>>>>>> On Wed, Dec 04, 2024 at 08:57:23AM +0200, Adrian Hunter wrote:
>>>>>>>>> On 4/12/24 08:37, Chao Gao wrote:
>>>>>>>>>> On Wed, Dec 04, 2024 at 08:18:32AM +0200, Adrian Hunter wrote:
>>>>>>>>>>> On 4/12/24 03:25, Chao Gao wrote:
>>>>>>>>>>>>> +#define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM))
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static bool has_tsx(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    return entry->function == 7 && entry->index == 0 &&
>>>>>>>>>>>>> +           (entry->ebx & TDX_FEATURE_TSX);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static void clear_tsx(struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    entry->ebx &= ~TDX_FEATURE_TSX;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    return entry->function == 7 && entry->index == 0 &&
>>>>>>>>>>>>> +           (entry->ecx & __feature_bit(X86_FEATURE_WAITPKG));
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    if (has_tsx(entry))
>>>>>>>>>>>>> +        clear_tsx(entry);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    if (has_waitpkg(entry))
>>>>>>>>>>>>> +        clear_waitpkg(entry);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +    return has_tsx(entry) || has_waitpkg(entry);
>>>>>>>>>>>>> +}
>>>>>>>>>>>>
>>>>>>>>>>>> No need to check TSX/WAITPKG explicitly because setup_tdparams_cpuids() already
>>>>>>>>>>>> ensures that unconfigurable bits are not set by userspace.
>>>>>>>>>>>
>>>>>>>>>>> Aren't they configurable?
>>>>>>>>>>
>>>>>>>>>> They are cleared from the configurable bitmap by tdx_clear_unsupported_cpuid(),
>>>>>>>>>> so they are not configurable from a userspace perspective. Did I miss anything?
>>>>>>>>>> KVM should check user inputs against its adjusted configurable bitmap, right?
>>>>>>>>>
>>>>>>>>> Maybe I misunderstand but we rely on the TDX module to reject
>>>>>>>>> invalid configuration.  We don't check exactly what is configurable
>>>>>>>>> for the TDX Module.
>>>>>>>>
>>>>>>>> Ok, this is what I missed. I thought KVM validated user input and masked
>>>>>>>> out all unsupported features. sorry for this.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> TSX and WAITPKG are not invalid for the TDX Module, but KVM
>>>>>>>>> must either support them by restoring their MSRs, or disallow
>>>>>>>>> them.  This patch disallows them for now.
>>>>>>>>
>>>>>>>> Yes. I agree. what if a new feature (supported by a future TDX module) also
>>>>>>>> needs KVM to restore some MSRs? current KVM will allow it to be exposed (since
>>>>>>>> only TSX/WAITPKG are checked); then some MSRs may get corrupted. I may think
>>>>>>>> this is not a good design. Current KVM should work with future TDX modules.
>>>>>>>
>>>>>>> With respect to CPUID, I gather this kind of thing has been
>>>>>>> discussed, such as here:
>>>>>>>
>>>>>>>        https://lore.kernel.org/all/ZhVsHVqaff7AKagu@google.com/
>>>>>>>
>>>>>>> and Rick and Xiaoyao are working on something.
>>>>>>>
>>>>>>> In general, I would expect a new TDX Module would advertise support for
>>>>>>> new features, but KVM would have to opt in to use them.
>>>>>>>
>>>>>>
>>>>>> There were discussion[1] on whether KVM to gatekeep the configurable/supported CPUIDs for TDX. I stand by Sean that KVM needs to do so.
>>>>>>
>>>>>> Regarding KVM opt in the new feature, KVM gatekeeps the CPUID bit that can be set by userspace is exactly the behavior of opt-in. i.e., for a given KVM, it only allows a CPUID set {S} to be configured by userspace, if new TDX module supports new feature X, it needs KVM to opt-in X by adding X to {S} so that X is allowed to be configured by userspace.
>>>>>>
>>>>>> Besides, I find current interface between KVM and userspace lacks the ability to tell userspace what bits are not supported by KVM. KVM_TDX_CAPABILITIES.cpuid doesn't work because it represents the configurable CPUIDs, not supported CPUIDs (I think we might rename it to configurable_cpuid to better reflect its meaning). So userspace has to hardcode that TSX and WAITPKG is not support itself.
>>>>>
>>>>> I don't follow why hardcoding would be necessary.
>>>>>
>>>>> If the leaf is represented in KVM_TDX_CAPABILITIES.cpuid, and
>>>>> the bits are 0 there, why would userspace try to set them to 1?
>>>>
>>>> Userspace doesn't set the bit to 1 in kvm_tdx_init_vm.cpuid, doesn't mean userspace wants the bit to be 0.
>>>>
>>>> Note, KVM_TDX_CAPABILITIES.cpuid reports the configurable bits. The value 0 of a bit in KVM_TDX_CAPABILITIES.cpuid means the bit is not configurable, not means the bit is unsupported.
>>>
>>> For features configurable by CPUID like TSX and WAITPKG,
>>> a value of 0 does mean unsupported, because the value
>>> has to be 1 to enable the feature.
>>>
>>>   From the TDX Module Base spec:
>>>
>>>      11.18. Transactional Synchronization Extensions (TSX)
>>>      The host VMM can enable TSX for a TD by configuring the following CPUID bits as enabled in the TD_PARAMS input to
>>>      TDH.MNG.INIT:
>>>      - CPUID(7,0).EBX[4] (HLE)
>>>      - CPUID(7,0).EBX[11] (RTM)
>>>      etc
>>>
>>>      11.19.4. WAITPKG: TPAUSE, UMONITOR and UMWAIT Instructions
>>>      The host VMM may allow guest TDs to use the TPAUSE, UMONITOR and UMWAIT instructions, if the CPU supports them,
>>>      by configuring the virtual value of CPUID(7,0).ECX[5] (WAITPKG) to 1 using the CPUID configuration table which is part
>>>      the TD_PARAMS input to TDH.MNG.INIT. Enabling CPUID(7,0).ECX[5] also enables TD access to IA32_UMWAIT_CONTROL
>>>      (MSR 0xE1).
>>>      If not allowed, then TD execution of TPAUSE, UMONITOR or UMWAIT results in a #UD, and access to
>>>      IA32_UMWAIT_CONTROL results in a #GP(0).
>>>
>>>> For kvm_tdx_init_vm.cpuid,
>>>>    - if the corresponding bit is reported as 1 in KVM_TDX_CAPABILITIES.cpuid, then a value 0 in kvm_tdx_init_vm.cpuid means userspace wants to configure it as 0.
>>>>    - if the corresponding bit is reported as 0 in KVM_TDX_CAPABILITIES.cpuid, then userspace has to pass a value 0 in kvm_tdx_init_vm.cpuid. But it doesn't mean the value of the bit will be 0.
>>>>
>>>> e.g., X2APIC bit is 0 in KVM_TDX_CAPABILITIES.cpuid, and it's also 0 in kvm_tdx_init_vm.cpuid, but TD guest sees a value of 1. In the view of QEMU, it maintains the bit of X2APIC as 1, and QEMU filters X2APIC bit when calling KVM_TDX_INIT_VM because X2APIC is not configurable.
>>>>
>>>> So when it comes to TSX and WAITPKG, QEMU also needs an interface to be informed that they are unsupported. Without the interface of fixed0 bits reported by KVM, QEMU needs to hardcode itself like [1]. The problem of hardcode is that it will conflict when future KVM allows them to be configurable.
>>>
>>> So TSX and WAITPKG support should be based on KVM_TDX_CAPABILITIES.cpuid, and not hardcoded.
>>
>> This requires Userspace to have the knowledge that "TSX and WAITPKG is supposed to be configurable and reported as 1 in KVM_TDX_CAPABILITIES.cpuid in normal case". And if userspace gets a value 0 of TSX and Waitpkg in KVM_TDX_CAPABILITIES.cpuid, it means either KVM doesn't support/allow it or the underlying TDX module doesn't.
>>
>> I think it's an implicit form of hardcode. From the perspective of userspace, I would like to avoid all the special handling.
> 
> It is consistent with what is done for TD attributes and XFAM.  Presumably feature names must be mapped to capability bits and configuration bits, and that information has to be represented somewhere.

Attrbutes and XFAM are different to cpuid either in KVM_TDX_CAPABILITIES 
or in struct kvm_tdx_init_vm.

KVM_TDX_CAPABILITIES reports the *supported* bits for Attributes and 
XFAM while it reports the configurable bits for CPUID.

The bits passed in struct kvm_tdx_init_vm for Attributes and XFAM are a 
whole of the final value that will be set for TD guest. While the bits 
passed in struct kvm_tdx_init_vm for cpuid are only the value of 
configurable bits, the final value that will be set for TD guest needs 
to be OR'ed with fixed1 bits.

 From the perspective of userspace,the information in 
KVM_TDX_CAPABILITIES.supported_attributes and 
KVM_TDX_CAPABILITIES.supported_xfam are enough, bits reported as 0 in 
them means they are not supported. While for cpuid, in general, the bits 
reported as 0 in KVM_TDX_CAPABILITIES.cpuid doesn't mean they are not 
supported.

Anyway, my point is, in general, value 0 of a bit in 
KVM_TDX_CAPABILITIES.cpuid doesn't mean the bit is not supported for TD 
guest. It's better to have KVM expose an interface to report the 
unsupported bits. Without it, userspace has to hardcode some information 
and maintain it itself.

QEMU already hardcodes the fixed0 and fixed1 bits itself[1] according to 
a specific version of TDX spec, while the tweak to TSX and WAITPKG in 
KVM will require more specific handling in QEMU. It's not the fault of 
the change to TSX/WAITPKG in KVM_TDX_CAPABILITIES.cpuid. It somehow 
reveals it's getting messier when KVM cannot report enough information 
to userspace.

[1] 
https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@intel.com/
>>
>>>>
>>>> In the future, if we have interface from KVM to report the fixed0 and fixed1 bit (on top of the proposal [2]), userspace can drop the hardcoded one it maintains. At that time, KVM can ensure no conflict by removing the bits from fixed0/1 array when allowing them to be configurable.
>>>>
>>>> [1] https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@intel.com/
>>>> [2] https://lore.kernel.org/all/43b26df1-4c27-41ff-a482-e258f872cc31@intel.com/
>>>>
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/ZuM12EFbOXmpHHVQ@google.com/
>>>>>>
>>>>>
>>>>
>>>
>>
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ