Message-ID: <ZmzaqRy2zjvlsDfL@google.com>
Date: Fri, 14 Jun 2024 17:04:57 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Kai Huang <kai.huang@...el.com>
Cc: Tina Zhang <tina.zhang@...el.com>, Hang Yuan <hang.yuan@...el.com>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, Bo2 Chen <chen.bo@...el.com>, 
	"sagis@...gle.com" <sagis@...gle.com>, 
	"isaku.yamahata@...ux.intel.com" <isaku.yamahata@...ux.intel.com>, Erdem Aktas <erdemaktas@...gle.com>, 
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	Isaku Yamahata <isaku.yamahata@...el.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific

On Fri, Jun 14, 2024, Kai Huang wrote:
> On Tue, 2024-06-04 at 10:48 +0000, Huang, Kai wrote:
> > On Thu, 2024-05-30 at 16:12 -0700, Sean Christopherson wrote:
> > > On Thu, May 30, 2024, Kai Huang wrote:
> > > > On Wed, 2024-05-29 at 16:15 -0700, Sean Christopherson wrote:
> > > > > In the unlikely event there is a legitimate reason for max_vcpus_per_td being
> > > > > less than KVM's minimum, then we can update KVM's minimum as needed.  But AFAICT,
> > > > > that's purely theoretical at this point, i.e. this is all much ado about nothing.
> > > > 
> > > > I am afraid we already have a legitimate case: TD partitioning.  Isaku
> > > > told me the 'max_vcpus_per_td' is lowered to 512 for modules that
> > > > support TD partitioning.  And again this is static, i.e., TD partitioning
> > > > doesn't need to be opted into for the value to be lowered to 512.
> > > 
> > > So what's Intel's plan for use cases that create TDs with >512 vCPUs?
> > 
> > I checked with TDX module guys.  Turns out the 'max_vcpus_per_td' wasn't
> > introduced because of TD partitioning, and they are not actually related.
> > 
> > They introduced this to support "topology virtualization", which requires
> > a table to record the X2APIC IDs for all vcpus for each TD.  In practice,
> > given a TDX module, the 'max_vcpus_per_td', a.k.a. the X2APIC ID table
> > size, reflects the maximum number of physical logical cpus that *ANY*
> > platform the module supports can possibly have.
> > 
> > The reason for this design is that the TDX guys don't believe it makes
> > sense to support the case where the 'max_vcpus' for one single TD needs
> > to exceed the number of physical logical cpus.
> > 
> > So in short:
> > 
> > - The "max_vcpus_per_td" can be different depending on module versions. In
> > practice it reflects the maximum physical logical cpus that all the
> > platforms (that the module supports) can possibly have.
> > 
> > - Before CSPs deploy/migrate a TD on a TDX machine, they must be aware of
> > the "max_vcpus_per_td" the module supports, and only deploy/migrate the TD
> > to that machine when the module can support it.
> > 
> > - For TDX 1.5.xx modules, the value is 576 (the previous number 512 isn't
> > correct); for TDX 2.0.xx modules, the value is larger (>1000).  For future
> > module versions, it could be a smaller number, depending on what
> > platforms that module needs to support.  Also, if TDX ever gets supported
> > on client platforms, we can imagine the number could be much smaller,
> > because the vcpus per TD never need to exceed the physical logical cpus.
> > 
> > We may ask them to support the case where 'max_vcpus' for a single TD
> > exceeds the physical logical cpus, or at least not to lower the value
> > any further for future modules (> 2.0.xx modules).  We may also ask them
> > to promise not to lower the number below some certain value for any
> > future modules.  But I am not sure there's any concrete reason to do so?
> > 
> > What's your thinking?

It's a reasonable restriction, e.g. KVM_CAP_NR_VCPUS is already capped at the
number of online CPUs, although userspace is obviously allowed to create
oversubscribed VMs.

I think the sane thing to do is document that TDX VMs are restricted to the number
of logical CPUs in the system, have KVM_CAP_MAX_VCPUS enumerate exactly that, and
then sanity check that max_vcpus_per_td is greater than or equal to what KVM
reports for KVM_CAP_MAX_VCPUS.
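
To make the suggestion concrete, here is a rough userspace sketch of the logic,
not actual KVM code.  Every sketch_* name is hypothetical, and clamping to a
compile-time limit (SKETCH_KVM_MAX_VCPUS, standing in for KVM's own maximum) is
my assumption rather than something spelled out above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for KVM's compile-time vCPU limit (assumption). */
#define SKETCH_KVM_MAX_VCPUS 1024u

/* Hypothetical snapshot of the two inputs discussed in this thread. */
struct sketch_host {
	unsigned int nr_logical_cpus;      /* logical CPUs in the system */
	unsigned int tdx_max_vcpus_per_td; /* limit reported by the TDX module */
};

/*
 * Value KVM would report for KVM_CAP_MAX_VCPUS on a TDX VM: the number of
 * logical CPUs, clamped to KVM's own limit.  Note that the TDX module's
 * max_vcpus_per_td is deliberately NOT forwarded to userspace here.
 */
static unsigned int sketch_tdx_cap_max_vcpus(const struct sketch_host *h)
{
	return h->nr_logical_cpus < SKETCH_KVM_MAX_VCPUS ?
	       h->nr_logical_cpus : SKETCH_KVM_MAX_VCPUS;
}

/*
 * Sanity check: the module must be able to handle at least as many vCPUs
 * as KVM advertises.  If this fails, KVM would refuse to enable TDX rather
 * than silently report a smaller, module-dependent number.
 */
static bool sketch_tdx_limits_ok(const struct sketch_host *h)
{
	return h->tdx_max_vcpus_per_td >= sketch_tdx_cap_max_vcpus(h);
}
```

E.g. with 384 logical CPUs and a 1.5.xx module (max_vcpus_per_td == 576), the
reported value is 384 and the check passes.  A module whose limit is below what
KVM reports would fail the check.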

Stating that the maximum number of vCPUs depends on the whims of the TDX module
doesn't provide a predictable ABI for KVM, i.e. I don't want to simply forward
TDX's max_vcpus_per_td to userspace.