[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d607a51ebc9eecae2030a63f7870456e8ac72b45.camel@intel.com>
Date: Thu, 20 Feb 2025 23:45:39 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Li, Xiaoyao" <xiaoyao.li@...el.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>
CC: "Gao, Chao" <chao.gao@...el.com>, "Edgecombe, Rick P"
<rick.p.edgecombe@...el.com>, "Chatre, Reinette" <reinette.chatre@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Zhao, Yan Y"
<yan.y.zhao@...el.com>, "Hunter, Adrian" <adrian.hunter@...el.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>, "Yamahata,
Isaku" <isaku.yamahata@...el.com>
Subject: Re: [PATCH 18/18] Documentation/virt/kvm: Document on Trust Domain
Extensions(TDX)
On Thu, 2025-02-20 at 13:50 +0800, Xiaoyao Li wrote:
> On 2/19/2025 6:23 PM, Huang, Kai wrote:
> >
> >
> > On 10/12/2024 1:49 pm, Binbin Wu wrote:
> ...
> > > +
> > > +
> > > +API description
> > > +===============
> > > +
> > > +KVM_MEMORY_ENCRYPT_OP
> > > +---------------------
> > > +:Type: vm ioctl, vcpu ioctl
> > > +
> > > +For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic
> > > +ioctl with TDX specific sub ioctl command.
> >
> > command -> commands.
> >
> > > +
> > > +::
> > > +
> > > + /* Trust Domain eXtension sub-ioctl() commands. */
> >
> > I think "Extensions" is used in every place in the kernel, so
> > "eXtension" -> "Extensions".
> >
> > And lack of consistency between "sub ioctl commands" and "sub-ioctl()
> > commands". Perhaps just use "sub-commands" for all the places.
>
> It's copied from the kernel header file, we need to fix there at first.
I see. So let's keep the code and the doc synced.
I'll leave to Rick to decide whether we should fix in the code.
>
> > > + enum kvm_tdx_cmd_id {
> > > + KVM_TDX_CAPABILITIES = 0,
> > > + KVM_TDX_INIT_VM,
> > > + KVM_TDX_INIT_VCPU,
> > > + KVM_TDX_INIT_MEM_REGION,
> > > + KVM_TDX_FINALIZE_VM,
> > > + KVM_TDX_GET_CPUID,
> > > +
> > > + KVM_TDX_CMD_NR_MAX,
> > > + };
> > > +
> > > + struct kvm_tdx_cmd {
> > > + /* enum kvm_tdx_cmd_id */
> > > + __u32 id;
> > > + /* flags for sub-commend. If sub-command doesn't use this,
> > > set zero. */
> >
> > commend -> command.
>
> Ditto.
Ditto.
>
> > > + __u32 flags;
> > > + /*
> > > + * data for each sub-command. An immediate or a pointer to
> > > the actual
> > > + * data in process virtual address. If sub-command doesn't
> > > use it,
> > > + * set zero.
> > > + */
> > > + __u64 data;
> > > + /*
> > > + * Auxiliary error code. The sub-command may return TDX
> > > SEAMCALL
> > > + * status code in addition to -Exxx.
> > > + * Defined for consistency with struct kvm_sev_cmd.
> > > + */
> >
> > "Defined for consistency with struct kvm_sev_cmd" got removed in the
> > code. It should be removed here too.
This can be done since I see it got removed in the code.
> >
> > > + __u64 hw_error;
> > > + };
> > > +
> > > +KVM_TDX_CAPABILITIES
> > > +--------------------
> > > +:Type: vm ioctl
> > > +
> > > +Subset of TDSYSINFO_STRUCT retrieved by TDH.SYS.INFO TDX SEAM call
> > > will be
> > > +returned. It describes the Intel TDX module.
> >
> > We are not using TDH.SYS.INFO and TDSYSINFO_STRUCT anymore. Perhaps:
> >
> > Retrive TDX moduel global capabilities for running TDX guests.
>
> (some typos: Retrive => Retrieve, moduel => module)
>
> It's not TDX module global capabilities. It has been adjuested by KVM to
> mask off the bits that are not supported by the KVM.
>
> How about:
>
> Return the TDX capabilities that current KVM supports with the specific
> TDX module loaeded in the system. It reports what features/capabilities
> are allowed to be configured to the TDX guest.
Fine to me. Thanks.
>
> > > +
> > > +- id: KVM_TDX_CAPABILITIES
> > > +- flags: must be 0
> > > +- data: pointer to struct kvm_tdx_capabilities
> > > +- error: must be 0
> > > +- unused: must be 0
> >
> > @error should be @hw_error.
> >
> > And I don't see @unused anyware in the 'struct kvm_tdx_cmd'. Should be
> > removed.
> >
> > The same to all below sub-commands.
> >
> > > +
> > > +::
> > > +
> > > + struct kvm_tdx_capabilities {
> > > + __u64 supported_attrs;
> > > + __u64 supported_xfam;
> > > + __u64 reserved[254];
> > > + struct kvm_cpuid2 cpuid;
> > > + };
> > > +
> > > +
> > > +KVM_TDX_INIT_VM
> > > +---------------
> > > +:Type: vm ioctl
> >
> > I would add description for return values:
> >
> > :Returns: 0 on success, <0 on error.
>
> +1,
>
> I have added it for KVM_TDX_GET_CPUID internally.
Let's add it for all sub-commands then.
>
> > We can also repeat this in all sub-commands, but I am not sure it's
> > necessary.
> >
> > > +
> > > +Does additional VM initialization specific to TDX which corresponds to
> > > +TDH.MNG.INIT TDX SEAM call.
> >
> > "SEAM call" -> "SEAMCALL" for consistency. And the same to below all
> > sub-commands.
> >
> > Nit:
> >
> > I am not sure whether we need to, or should, mention the detailed
> > SEAMCALL here. To me the ABI doesn't need to document the detailed
> > implementation (i.e., which SEAMCALL is used) in the underneath kernel.
>
> Agreed to drop mentioning the detailed SEAMCALL.
>
> Maybe, we can mention the sequence requirement as well, that it needs to
> be called after KVM_CREATE_VM and before creating any VCPUs.
Yeah agreed.
Then perhaps we can drop the "KVM TDX creation flow" section entirely to avoid
the potential debate.
>
> > > +
> > > +- id: KVM_TDX_INIT_VM
> > > +- flags: must be 0
> > > +- data: pointer to struct kvm_tdx_init_vm
> > > +- error: must be 0
> > > +- unused: must be 0
> > > +
> > > +::
> > > +
> > > + struct kvm_tdx_init_vm {
> > > + __u64 attributes;
> > > + __u64 xfam;
> > > + __u64 mrconfigid[6]; /* sha384 digest */
> > > + __u64 mrowner[6]; /* sha384 digest */
> > > + __u64 mrownerconfig[6]; /* sha384 digest */
> > > +
> > > + /* The total space for TD_PARAMS before the CPUIDs is 256
> > > bytes */
> > > + __u64 reserved[12];
> > > +
> > > + /*
> > > + * Call KVM_TDX_INIT_VM before vcpu creation, thus before
> > > + * KVM_SET_CPUID2.
> > > + * This configuration supersedes KVM_SET_CPUID2s for VCPUs
> > > because the
> > > + * TDX module directly virtualizes those CPUIDs without VMM.
> > > The user
> > > + * space VMM, e.g. qemu, should make KVM_SET_CPUID2
> > > consistent with
> > > + * those values. If it doesn't, KVM may have wrong idea of
> > > vCPUIDs of
> > > + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that
> > > the TDX
> > > + * module doesn't virtualize.
> > > + */
> > > + struct kvm_cpuid2 cpuid;
> > > + };
> > > +
> > > +
> > > +KVM_TDX_INIT_VCPU
> > > +-----------------
> > > +:Type: vcpu ioctl
> > > +
> > > +Does additional VCPU initialization specific to TDX which corresponds to
> > > +TDH.VP.INIT TDX SEAM call.
>
> Same for the TDH.VP.INIT SEAMCALL, I don't think we need to mention it.
> And KVM_TDX_INIT_VCPU does more things other than TDH.VP.INIT.
>
> How about just:
>
> Perform TDX specific VCPU initialization
Fine to me.
>
> > > +- id: KVM_TDX_INIT_VCPU
> > > +- flags: must be 0
> > > +- data: initial value of the guest TD VCPU RCX
> > > +- error: must be 0
> > > +- unused: must be 0
> > > +
> > > +KVM_TDX_INIT_MEM_REGION
> > > +-----------------------
> > > +:Type: vcpu ioctl
> > > +
> > > +Encrypt a memory continuous region which corresponding to
> > > TDH.MEM.PAGE.ADD
> > > +TDX SEAM call.
> >
> > "a contiguous guest memory region"?
> >
> > And "which corresponding to .." has grammar issue.
> >
> > How about:
> >
> > Load and encrypt a contiguous memory region from the source
> > memory which corresponds to the TDH.MEM.PAGE.ADD TDX SEAMCALL.
>
> As sugguested above, I prefer to drop mentionting the detailed SEAMCALL.
>
> Besides, it's better to call out the dependence that the gpa needs to be
> set to private attribute before calling KVM_TDX_INIT_MEM_REGION to
> initialize the priviate memory.
>
> How about:
>
> Initialize @nr_pages TDX guest private memory starting from @gpa with
> userspace provided data from @source_addr.
>
> Note, before calling this sub command, memory attribute of the range
> [gpa, gpa + nr_pages] needs to be private. Userspace can use
> KVM_SET_MEMORY_ATTRIBUTES to set the attribute.
Fine to me.
>
>
> > > +If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends
> > > measurement
> > > +which corresponds to TDH.MR.EXTEND TDX SEAM call.
> > > +
> > > +- id: KVM_TDX_INIT_MEM_REGION
> > > +- flags: flags
> > > + currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
> > > +- data: pointer to struct kvm_tdx_init_mem_region
> > > +- error: must be 0
> > > +- unused: must be 0
> > > +
> > > +::
> > > +
> > > + #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
> > > +
> > > + struct kvm_tdx_init_mem_region {
> > > + __u64 source_addr;
> > > + __u64 gpa;
> > > + __u64 nr_pages;
> > > + };
> > > +
> > > +
> > > +KVM_TDX_FINALIZE_VM
> > > +-------------------
> > > +:Type: vm ioctl
> > > +
> > > +Complete measurement of the initial TD contents and mark it ready to run
> > > +which corresponds to TDH.MR.FINALIZE
> >
> > Missing period at the end of the sentence.
> >
> > And nit again: I don't like the "which corresponds to TDH.MR.FINALIZE".
> >
> > > +
> > > +- id: KVM_TDX_FINALIZE_VM
> > > +- flags: must be 0
> > > +- data: must be 0
> > > +- error: must be 0
> > > +- unused: must be 0
> >
> >
> > This patch doesn't contain KVM_TDX_GET_CPUID. I saw in internal dev
> > branch we have it.
> >
>
> I added it in the internal branch. Next version post should have it.
Great. Perhaps you can send out a fixup patch internally.
>
> > > +
> > > +KVM TDX creation flow
> > > +=====================
> > > +In addition to KVM normal flow, new TDX ioctls need to be called.
> > > The control flow
> > > +looks like as follows.
> > > +
> > > +#. system wide capability check
> >
> > To make all consistent:
> >
> > Check system wide capability
> >
> > > +
> > > + * KVM_CAP_VM_TYPES: check if VM type is supported and if
> > > KVM_X86_TDX_VM
> > > + is supported.
> > > +
> > > +#. creating VM
> >
> > Create VM
> > > +
> > > + * KVM_CREATE_VM
> > > + * KVM_TDX_CAPABILITIES: query if TDX is supported on the platform.
> >
> > "TDX is supported or not" is already checked in step 1.
> >
> > I think we should say:
> >
> > query TDX global capabilities for creating TDX guests.
>
> I don't like the "global" term and I don't think it's correct.
>
> KVM_TDX_CAPABILITIES was requested to change from the platform-scope
> ioctl to VM-scope. It should only report the per-TD capabilities, though
> it's identical for all TDs currently.
Oh I didn't know this. Thanks for pointing out. Could you suggest the
preferred words?
(Btw, as mentioned above if we document the calling sequence of the commands in
each command then perhaps we can get rid of this entire "TDX TDX creation flow"
in which case we won't need to fix those up).
>
> > > + * KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS): set max_vcpus.
> > > KVM_MAX_VCPUS by
> > > + default. KVM_MAX_VCPUS is not a part of ABI, but kernel
> > > internal constant
> > > + that is subject to change. Because max vcpus is a part of
> > > attestation, max
> > > + vcpus should be explicitly set.
> >
> > This is out-of-date.
> >
> > * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): query maximum vcpus the
> > TDX guest can support (TDX has its own limitation on this).
>
> More precisely, KVM_CHECK_EXTESION(KVM_CAP_MAX_VCPUS) at vm level
Sure:
KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum vCPUs the TD can support
at vm level (TDX has its own limitation on this).
>
> > > + * KVM_SET_TSC_KHZ for vm. optional
> >
> > For consistency:
> >
> > * KVM_SET_TSC_KHZ: optional
>
> More precisely, KVM_SET_TSC_KHZ at VM-level if userspace desires a
> different TSC frequency than the host. Otherwise, host's TSC frequency
> will be configured for the TD.
Sure:
KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency than
host is desired. This is Optional.
>
> >
> > > + * KVM_TDX_INIT_VM: pass TDX specific VM parameters.
> > > +
> > > +#. creating VCPU
> >
> > Create vCPUs
> >
> > > +
> > > + * KVM_CREATE_VCPU
> > > + * KVM_TDX_INIT_VCPU: pass TDX specific VCPU parameters.
> > > + * KVM_SET_CPUID2: Enable CPUID[0x1].ECX.X2APIC(bit 21)=1 so that
> > > the following
> > > + setting of MSR_IA32_APIC_BASE success. Without this,
> > > + KVM_SET_MSRS(MSR_IA32_APIC_BASE) fails.
> >
> > I would prefer to put X2APIC specific to a note:
> >
> > * KVM_SET_CPUID2: configure guest's CPUIDs. Note: Enable ...
> >
> > > + * KVM_SET_MSRS: Set the initial reset value of MSR_IA32_APIC_BASE to
> > > + APIC_DEFAULT_ADDRESS(0xfee00000) | XAPIC_ENABLE(bit 10) |
> > > + X2APIC_ENABLE(bit 11) [| MSR_IA32_APICBASE_BSP(bit 8) optional]
>
> This is not true now. MSR_IA32_APICBASE is read-only MSR for TDX. KVM
> disallows userspace to set MSR_IA32_APICBASE and
> ioctl(KVM_TDX_INIT_VCPU) initialize a correct value for
> MSR_IA32_APICBASE internally.
Perhaps we can just say:
Configure TD's CPUIDs.
>
> > Ditto, I believe there are other MSRs to be set too.
>
> It seems no must-to-be-set MSR for TDX guest.
We can just say: Configure TD's MSRs.
>
> > > +
> > > +#. initializing guest memory
> >
> > Initialize initial guest memory
> >
> > > +
> > > + * allocate guest memory and initialize page same to normal KVM case
> >
> > Cannot parse this.
> >
> > > + In TDX case, parse and load TDVF into guest memory in addition.
> >
> > Don't understand "parse TDVF" either.
> >
> > > + * KVM_TDX_INIT_MEM_REGION to add and measure guest pages.
> > > + If the pages has contents above, those pages need to be added.
> > > + Otherwise the contents will be lost and guest sees zero pages.
> > > + * KVM_TDX_FINALIAZE_VM: Finalize VM and measurement
> > > + This must be after KVM_TDX_INIT_MEM_REGION.
> >
> > Perhaps refine the above to:
> >
> >
> > * Allocate guest memory in the same way as allocating memory for
> > normal VMs.
> > * KVM_TDX_INIT_MEM_REGION to add initial guest memory. Note for
> > now TDX guests only works with TDVF, thus the TDVF needs to be
> > included in the initial guest memory.
> > * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest.
>
> I'm not sure if we need to mention TDVF here. I think KVM doesn't care
> about it. People can always create a new virtual bios for TDX guest
> themselves.
Fine to remove it.
Powered by blists - more mailing lists