Message-ID: <50e09676-4dfc-473f-8b34-7f7a98ab5228@intel.com>
Date: Tue, 14 May 2024 14:01:12 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, Isaku Yamahata
<isaku.yamahata@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, Paolo Bonzini
<pbonzini@...hat.com>, Erdem Aktas <erdemaktas@...gle.com>, Sagi Shahar
<sagis@...gle.com>, Bo2 Chen <chen.bo@...el.com>, Hang Yuan
<hang.yuan@...el.com>, Tina Zhang <tina.zhang@...el.com>,
<isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend
specific
On 11/05/2024 2:04 am, Sean Christopherson wrote:
> On Thu, May 09, 2024, Isaku Yamahata wrote:
>> On Fri, May 10, 2024 at 11:19:44AM +1200, Kai Huang <kai.huang@...el.com> wrote:
>>> On 10/05/2024 10:52 am, Sean Christopherson wrote:
>>>> On Fri, May 10, 2024, Kai Huang wrote:
>>>>> On 10/05/2024 4:35 am, Sean Christopherson wrote:
>>>>>> KVM x86 limits KVM_MAX_VCPUS to 4096:
>>>>>>
>>>>>> config KVM_MAX_NR_VCPUS
>>>>>> 	int "Maximum number of vCPUs per KVM guest"
>>>>>> 	depends on KVM
>>>>>> 	range 1024 4096
>>>>>> 	default 4096 if MAXSMP
>>>>>> 	default 1024
>>>>>> 	help
>>>>>>
>>>>>> whereas the limitation from TDX is apparently simply due to TD_PARAMS taking
>>>>>> a 16-bit unsigned value:
>>>>>>
>>>>>> #define TDX_MAX_VCPUS (~(u16)0)
>>>>>>
>>>>>> i.e. it will likely be _years_ before TDX's limitation matters, if it ever does.
>>>>>> And _if_ it becomes a problem, we don't necessarily need to have a different
>>>>>> _runtime_ limit for TDX, e.g. TDX support could be conditioned on KVM_MAX_NR_VCPUS
>>>>>> being <= 64k.
>>>>>
>>>>> Actually, in later versions of the TDX module (starting from 1.5 AFAICT),
>>>>> the module has a metadata field to report the maximum number of vCPUs that
>>>>> it can support for all TDX guests.
>>>>
>>>> My quick glance at the 1.5 source shows that the limit is still effectively
>>>> 0xffff, so again, who cares?  Assert on 0xffff at compile time and on the
>>>> reported max at runtime, and simply refuse to use a TDX module that has
>>>> dropped the minimum below 0xffff.
>>>
>>> I need to double check why this metadata field was added.  My concern is that
>>> in future module versions they may just lower the value.
>>
>> TD partitioning would reduce it considerably.
>
> That's still not a reason to plumb in what is effectively dead code.  Either
> partitioning is opt-in, at which point I suspect KVM will need yet more uAPI to
> express the limitations to userspace, or the TDX module is potentially breaking
> existing use cases.
The 'max_vcpus_per_td' global metadata field is static for a given TDX
module.  If the module supports TD partitioning, it just reports a
smaller value, regardless of whether we opt in to TD partitioning or
not.

I think the point is that 'max_vcpus_per_td' is a TDX architectural
thing, and the kernel should not make any assumption about its value.
The architectural behaviour is:

  If the module reports 'max_vcpus_per_td', software should read and
  use it; otherwise software should treat the limit as U16_MAX.
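A minimal sketch of that behaviour (the metadata-read helper and the
field ID below are made-up placeholders, not the real TDX module
interface):

	/*
	 * Hypothetical helper: read the global metadata field if the
	 * module has it, otherwise fall back to the architectural
	 * default of U16_MAX.
	 */
	static u16 tdx_get_max_vcpus_per_td(void)
	{
		u64 val;

		/* Field absent in older modules -> architectural default. */
		if (tdx_read_global_metadata(MD_FIELD_ID_MAX_VCPUS_PER_TD, &val))
			return U16_MAX;

		return (u16)val;
	}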
Thus I don't think we will need a new uAPI (presumably TDX-specific)
just for TD partitioning, and this doesn't break existing use cases.
In fact, this doesn't prevent us from making the KVM_CAP_MAX_VCPUS code
generic.  E.g., we can do the below (see the sketch after step 2):
1) In tdx_vm_init() (called via KVM_CREATE_VM -> vt_vm_init()), we do:

	kvm->max_vcpus = min(kvm->max_vcpus, tdx_info->max_vcpus_per_td);
2) In kvm_vm_ioctl_enable_cap_generic(), we add support for
   KVM_CAP_MAX_VCPUS and have the generic code do:

	if (new_max_vcpus > kvm->max_vcpus)
		return -EINVAL;

	kvm->max_vcpus = new_max_vcpus;
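Putting the two together, a rough sketch (the function and field names
follow this series, but the exact code is illustrative only):

	/* 1) Clamp the per-VM default at VM creation. */
	static int tdx_vm_init(struct kvm *kvm)
	{
		kvm->max_vcpus = min(kvm->max_vcpus,
				     tdx_info->max_vcpus_per_td);
		return 0;
	}

	/* 2) In kvm_vm_ioctl_enable_cap_generic(): */
	case KVM_CAP_MAX_VCPUS: {
		int r = -EINVAL;

		mutex_lock(&kvm->lock);
		/* Only allow lowering the current limit. */
		if (cap->args[0] && cap->args[0] <= kvm->max_vcpus) {
			kvm->max_vcpus = cap->args[0];
			r = 0;
		}
		mutex_unlock(&kvm->lock);
		return r;
	}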
However, this means we only allow "lowering" kvm->max_vcpus via
kvm_vm_ioctl_enable_cap_generic(KVM_CAP_MAX_VCPUS), but I think this is
acceptable?
If that is a concern, alternatively we can add a new
'kvm->hard_max_vcpus' (or whatever makes sense) and set it in
kvm_create_vm() right after kvm_arch_init_vm():
	r = kvm_arch_init_vm(kvm, type);
	if (r)
		goto out_err_no_arch_destroy_vm;

	kvm->hard_max_vcpus = kvm->max_vcpus;
So it always contains "the max_vcpus limited by the ARCH
hardware/firmware etc".

And in kvm_vm_ioctl_enable_cap_generic(), we check against
kvm->hard_max_vcpus instead, to get rid of the limitation of only
allowing kvm->max_vcpus to be lowered.
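E.g., the generic handling would then become something like (again,
sketch only):

	case KVM_CAP_MAX_VCPUS: {
		int r = -EINVAL;

		mutex_lock(&kvm->lock);
		/*
		 * Check against the ARCH-imposed hard limit, so userspace
		 * can both lower and raise kvm->max_vcpus within it.
		 */
		if (cap->args[0] && cap->args[0] <= kvm->hard_max_vcpus) {
			kvm->max_vcpus = cap->args[0];
			r = 0;
		}
		mutex_unlock(&kvm->lock);
		return r;
	}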
But I don't think this is necessary at this stage.