Message-ID: <850e0899-d54e-6a49-851e-56f4d353905c@intel.com>
Date: Tue, 22 Nov 2022 15:39:24 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org
Cc: linux-mm@...ck.org, seanjc@...gle.com, pbonzini@...hat.com,
dan.j.williams@...el.com, rafael.j.wysocki@...el.com,
kirill.shutemov@...ux.intel.com, ying.huang@...el.com,
reinette.chatre@...el.com, len.brown@...el.com,
tony.luck@...el.com, peterz@...radead.org, ak@...ux.intel.com,
isaku.yamahata@...el.com, chao.gao@...el.com,
sathyanarayanan.kuppuswamy@...ux.intel.com, bagasdotme@...il.com,
sagis@...gle.com, imammedo@...hat.com
Subject: Re: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module
and TDX-capable memory
On 11/20/22 16:26, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees. The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module via TDH.SYS.INFO SEAMCALL.
I think the last sentence goes too far. What does it matter what the
name of the SEAMCALL is? Who cares at this point? It's in the patch.
Scroll down two pages if you really care.
> The host kernel can choose whether or not to use all convertible memory
> regions as TDX-usable memory. Before the TDX module is ready to create
> any TDX guests, the kernel needs to configure the TDX-usable memory
> regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> module. Constructing the TDMR array requires information of both the
> TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions. Call
> TDH.SYS.INFO to get this information as a preparation.
That last sentence is kinda goofy. I think there's a way to distill this
whole thing down more effectively.
	CMRs tell the kernel which memory is TDX compatible. The kernel
	takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs
	let the kernel grant TDX protections to some or all of the CMR
	areas.
> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
I find it very useful to be precise when referring to code. Your code
says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the
difference?
> having to pass them as function arguments when constructing the TDMR
> array. And they are too big to be put to the stack anyway. Also, KVM
> needs to use the TDSYSINFO_STRUCT to create TDX guests.
This is also a great place to mention that the tdsysinfo_struct contains
a *lot* of gunk which will not be used for a bit or that may never get
used.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2cf7090667aa..43227af25e44 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -15,6 +15,7 @@
> #include <linux/cpumask.h>
> #include <linux/smp.h>
> #include <linux/atomic.h>
> +#include <linux/align.h>
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/apic.h>
> @@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
> /* Prevent concurrent attempts on TDX detection and initialization */
> static DEFINE_MUTEX(tdx_module_lock);
>
> +/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
> +static struct tdsysinfo_struct tdx_sysinfo;
> +static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> +static int tdx_cmr_num;
> +
> /*
> * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> * BIOS. Both initializing the TDX module and running TDX guest require
> @@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
> return atomic_read(&sc.err);
> }
>
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> + return !cmr->size;
> +}
> +
> +static inline bool is_cmr_ok(struct cmr_info *cmr)
> +{
> + /* CMR must be page aligned */
> + return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
> + IS_ALIGNED(cmr->size, PAGE_SIZE);
> +}
> +
> +static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
> + const char *name)
> +{
> + int i;
> +
> + for (i = 0; i < cmr_num; i++) {
> + struct cmr_info *cmr = &cmr_array[i];
> +
> + pr_info("%s : [0x%llx, 0x%llx)\n", name,
> + cmr->base, cmr->base + cmr->size);
> + }
> +}
> +
> +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> +{
> + struct cmr_info *cmr;
> + int i, cmr_num;
> +
> + /*
> + * Intel TDX module spec, 20.7.3 CMR_INFO:
> + *
> + * TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> + * array of CMR_INFO entries. The CMRs are sorted from the
> + * lowest base address to the highest base address, and they
> + * are non-overlapping.
> + *
> + * This implies that BIOS may generate invalid empty entries
> + * if total CMRs are less than 32. Need to skip them manually.
> + *
> + * CMR also must be 4K aligned. TDX doesn't trust BIOS. TDX
> + * actually verifies CMRs before it gets enabled, so anything
> + * doesn't meet above means kernel bug (or TDX is broken).
> + */
I dislike comments like this that describe all the code below. Can't
you simply put the comment near the code that implements it?
> + cmr = &cmr_array[0];
> + /* There must be at least one valid CMR */
> + if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> + goto err;
> +
> + cmr_num = *actual_cmr_num;
> + for (i = 1; i < cmr_num; i++) {
> + struct cmr_info *cmr = &cmr_array[i];
> + struct cmr_info *prev_cmr = NULL;
> +
> + /* Skip further empty CMRs */
> + if (is_cmr_empty(cmr))
> + break;
> +
> + /*
> + * Do sanity check anyway to make sure CMRs:
> + * - are 4K aligned
> + * - don't overlap
> + * - are in address ascending order.
> + */
> + if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> + goto err;
Why does cmr_array[0] get a pass on the empty and sanity checks?
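
To illustrate the question above, here is an untested userspace sketch of a loop that starts at index 0 and applies the same empty, alignment, and ordering checks to every entry, with each comment sitting next to the check it describes. The names check_cmrs(), cmr_aligned() and SKETCH_PAGE_SIZE, and the -1 error return, are made up for illustration; they are not from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4K pages */

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/* Two u64 fields, so each entry is expected to be 16 bytes. */
static_assert(sizeof(struct cmr_info) == 16, "unexpected cmr_info size");

/* Base and size must both be page aligned (what is_cmr_ok() checks). */
static bool cmr_aligned(const struct cmr_info *cmr)
{
	return !(cmr->base % SKETCH_PAGE_SIZE) &&
	       !(cmr->size % SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical helper: return the number of valid leading CMRs,
 * or -1 if entry 0 is empty or any non-empty entry is malformed.
 */
static int check_cmrs(const struct cmr_info *cmrs, int nr)
{
	uint64_t prev_end = 0;
	int i;

	for (i = 0; i < nr; i++) {
		const struct cmr_info *cmr = &cmrs[i];

		/* Empty entries are only expected at the tail. */
		if (!cmr->size)
			break;

		/* Every non-empty entry, including [0], must be aligned. */
		if (!cmr_aligned(cmr))
			return -1;

		/* Sorted and non-overlapping: start at or after the previous end. */
		if (cmr->base < prev_end)
			return -1;

		prev_end = cmr->base + cmr->size;
	}

	/* There must be at least one valid CMR. */
	return i ? i : -1;
}
```

With this shape, entry 0 goes through exactly the same path as the rest, and the "at least one valid CMR" rule falls out of the final return.
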
> + prev_cmr = &cmr_array[i - 1];
> + if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> + cmr->base))
> + goto err;
> + }
> +
> + /* Update the actual number of CMRs */
> + *actual_cmr_num = i;
That comment is not helpful. Yes, this is literally updating the number
of CMRs. Literally. That's the "what". But, the "why" is important.
Why is it doing this?
> + /* Print kernel checked CMRs */
> + print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
This is the point where I start to lose patience with these comments.
These are just a waste of space.
Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now,
it'll print an 'actual_cmr_num=1' number of CMRs as being
"kernel-checked". Why? That makes zero sense.
> + return 0;
> +err:
> + pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> + print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> + return -EINVAL;
> +}
> +
> +static int tdx_get_sysinfo(void)
> +{
> + struct tdx_module_output out;
> + int ret;
> +
> + BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> +
> + ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> + __pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> + if (ret)
> + return ret;
> +
> + /* R9 contains the actual entries written the CMR array. */
> + tdx_cmr_num = out.r9;
> +
> + pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> + tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> + tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> + tdx_sysinfo.build_date, tdx_sysinfo.build_num);
This is a case where a little bit of vertical alignment will go a long way:
> +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);
> +
> + /*
> + * trim_empty_cmrs() updates the actual number of CMRs by
> + * dropping all tail empty CMRs.
> + */
> + return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> +}
Why does this both need to respect the "tdx_cmr_num = out.r9" value
*and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
out.r9" value and just trim the empty ones either way? It's not like
there is a billion of them. It would simplify the code for sure.
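
A rough, untested sketch of the "just trim the empty ones" variant: scan the whole fixed-size array and stop at the first empty entry, without consulting the count returned in R9. The helper name count_cmrs() is hypothetical, and the sketch assumes that entries past the last valid CMR are left zeroed (the array is static, so it starts out zeroed; whether the TDX module guarantees it stays that way would need checking against the spec).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CMRS 32

struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical helper: count the CMRs before the first empty entry.
 * Assumption: unused tail entries have size == 0.
 */
static int count_cmrs(const struct cmr_info cmrs[MAX_CMRS])
{
	int i;

	for (i = 0; i < MAX_CMRS; i++) {
		if (!cmrs[i].size)
			break;
	}

	return i;
}
```

The caller would then only need the array itself, and the separate "actual number" in/out parameter goes away.
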
> /*
> * Detect and initialize the TDX module.
> *
> @@ -232,6 +353,10 @@ static int init_tdx_module(void)
> if (ret)
> goto out;
>
> + ret = tdx_get_sysinfo();
> + if (ret)
> + goto out;
> +
> /*
> * Return -EINVAL until all steps of TDX module initialization
> * process are done.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9ba11808bd45..8e273756098c 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,10 +15,71 @@
> /*
> * TDX module SEAMCALL leaf functions
> */
> +#define TDH_SYS_INFO 32
> #define TDH_SYS_INIT 33
> #define TDH_SYS_LP_INIT 35
> #define TDH_SYS_LP_SHUTDOWN 44
>
> +struct cmr_info {
> + u64 base;
> + u64 size;
> +} __packed;
> +
> +#define MAX_CMRS 32
> +#define CMR_INFO_ARRAY_ALIGNMENT 512
> +
> +struct cpuid_config {
> + u32 leaf;
> + u32 sub_leaf;
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +
> +#define TDSYSINFO_STRUCT_SIZE 1024
> +#define TDSYSINFO_STRUCT_ALIGNMENT 1024
> +
> +struct tdsysinfo_struct {
> + /* TDX-SEAM Module Info */
> + u32 attributes;
> + u32 vendor_id;
> + u32 build_date;
> + u16 build_num;
> + u16 minor_version;
> + u16 major_version;
> + u8 reserved0[14];
> + /* Memory Info */
> + u16 max_tdmrs;
> + u16 max_reserved_per_tdmr;
> + u16 pamt_entry_size;
> + u8 reserved1[10];
> + /* Control Struct Info */
> + u16 tdcs_base_size;
> + u8 reserved2[2];
> + u16 tdvps_base_size;
> + u8 tdvps_xfam_dependent_size;
> + u8 reserved3[9];
> + /* TD Capabilities */
> + u64 attributes_fixed0;
> + u64 attributes_fixed1;
> + u64 xfam_fixed0;
> + u64 xfam_fixed1;
> + u8 reserved4[32];
> + u32 num_cpuid_config;
> + /*
> + * The actual number of CPUID_CONFIG depends on above
> + * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
> + * is 1024B defined by TDX architecture. Use a union with
> + * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> + * equal to 1024.
> + */
> + union {
> + struct cpuid_config cpuid_configs[0];
> + u8 reserved5[892];
> + };
Can you double check what the "right" way to do variable arrays is these
days? I thought the [0] method was discouraged.
Also, it isn't *really* 892 bytes of reserved space, right? Anything
that's not cpuid_configs[] is reserved, I presume. Could you try to be
more precise there?
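
For reference, zero-length `[0]` arrays are listed as deprecated in the kernel's coding documentation in favor of C99 flexible array members (`[]`). Below is an untested userspace sketch of one way to express "1024 bytes total, cpuid_configs[] at the end, everything after the valid entries reserved" without a hard-coded 892. The names tdsysinfo_sketch, tdsysinfo_buf, MAX_CPUID_CONFIGS and get_cpuid_config() are hypothetical, and the 128-byte `fixed_fields` blob is a stand-in for the fields that precede `num_cpuid_config` in the patch (they add up to 128 bytes). In-kernel code would likely want the DECLARE_FLEX_ARRAY() helper or a similar idiom instead; treat this only as an illustration of the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TDSYSINFO_STRUCT_SIZE	1024

struct cpuid_config {
	uint32_t leaf;
	uint32_t sub_leaf;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
} __attribute__((packed));

struct tdsysinfo_sketch {
	/* Hypothetical placeholder for the 128 bytes of fixed fields. */
	uint8_t  fixed_fields[128];
	uint32_t num_cpuid_config;
	/* C99 flexible array member instead of cpuid_configs[0]. */
	struct cpuid_config cpuid_configs[];
} __attribute__((packed));

/* The array starts right after the 132-byte fixed header. */
static_assert(offsetof(struct tdsysinfo_sketch, cpuid_configs) == 132,
	      "unexpected header size");

/* The architectural size comes from the buffer, not from padding. */
union tdsysinfo_buf {
	struct tdsysinfo_sketch info;
	uint8_t raw[TDSYSINFO_STRUCT_SIZE];
};

static_assert(sizeof(union tdsysinfo_buf) == TDSYSINFO_STRUCT_SIZE,
	      "buffer must be exactly 1024 bytes");

/* Number of whole cpuid_config entries that fit in the buffer. */
#define MAX_CPUID_CONFIGS						\
	((TDSYSINFO_STRUCT_SIZE -					\
	  offsetof(struct tdsysinfo_sketch, cpuid_configs)) /		\
	 sizeof(struct cpuid_config))

/* Return entry i, or NULL if i is past the valid or possible entries. */
static const struct cpuid_config *
get_cpuid_config(const struct tdsysinfo_sketch *info, uint32_t i)
{
	if (i >= info->num_cpuid_config || i >= MAX_CPUID_CONFIGS)
		return NULL;

	return &info->cpuid_configs[i];
}
```

Here the reserved space is whatever lies between the last valid entry and the end of the 1024-byte buffer, which is derived from `num_cpuid_config` rather than spelled out as a magic number.
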
> +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> +
> /*
> * Do not put any hardware-defined TDX structure representations below
> * this comment!