lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <850e0899-d54e-6a49-851e-56f4d353905c@intel.com>
Date:   Tue, 22 Nov 2022 15:39:24 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     linux-mm@...ck.org, seanjc@...gle.com, pbonzini@...hat.com,
        dan.j.williams@...el.com, rafael.j.wysocki@...el.com,
        kirill.shutemov@...ux.intel.com, ying.huang@...el.com,
        reinette.chatre@...el.com, len.brown@...el.com,
        tony.luck@...el.com, peterz@...radead.org, ak@...ux.intel.com,
        isaku.yamahata@...el.com, chao.gao@...el.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, bagasdotme@...il.com,
        sagis@...gle.com, imammedo@...hat.com
Subject: Re: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module
 and TDX-capable memory

On 11/20/22 16:26, Kai Huang wrote:
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums.  Not all memory
> satisfies these requirements.
> 
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR).  During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.  The list of these
> ranges, along with TDX module information, is available to the kernel by
> querying the TDX module via TDH.SYS.INFO SEAMCALL.

I think the last sentence goes too far.  What does it matter what the
name of the SEAMCALL is?  Who cares at this point?  It's in the patch.
Scroll down two pages if you really care.

> The host kernel can choose whether or not to use all convertible memory
> regions as TDX-usable memory.  Before the TDX module is ready to create
> any TDX guests, the kernel needs to configure the TDX-usable memory
> regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> module.  Constructing the TDMR array requires information of both the
> TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
> TDH.SYS.INFO to get this information as a preparation.

That last sentece is kinda goofy.  I think there's a way to distill this
whole thing down more effecively.

	CMRs tell the kernel which memory is TDX compatible.  The kernel
	takes CMRs and constructs  "TD Memory Regions" (TDMRs).  TDMRs
	let the kernel grante TDX protections to some or all of the CMR
	areas.
	
> Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid

I find it very useful to be precise when referring to code.  Your code
says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'.  Why the
difference?

> having to pass them as function arguments when constructing the TDMR
> array.  And they are too big to be put to the stack anyway.  Also, KVM
> needs to use the TDSYSINFO_STRUCT to create TDX guests.

This is also a great place to mention that the tdsysinfo_struct contains
a *lot* of gunk which will not be used for a bit or that may never get
used.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2cf7090667aa..43227af25e44 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/smp.h>
>  #include <linux/atomic.h>
> +#include <linux/align.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
> @@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
>  /* Prevent concurrent attempts on TDX detection and initialization */
>  static DEFINE_MUTEX(tdx_module_lock);
>  
> +/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
> +static struct tdsysinfo_struct tdx_sysinfo;
> +static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> +static int tdx_cmr_num;
> +
>  /*
>   * Detect TDX private KeyIDs to see whether TDX has been enabled by the
>   * BIOS.  Both initializing the TDX module and running TDX guest require
> @@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
>  	return atomic_read(&sc.err);
>  }
>  
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> +	return !cmr->size;
> +}
> +
> +static inline bool is_cmr_ok(struct cmr_info *cmr)
> +{
> +	/* CMR must be page aligned */
> +	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
> +		IS_ALIGNED(cmr->size, PAGE_SIZE);
> +}
> +
> +static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
> +		       const char *name)
> +{
> +	int i;
> +
> +	for (i = 0; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +
> +		pr_info("%s : [0x%llx, 0x%llx)\n", name,
> +				cmr->base, cmr->base + cmr->size);
> +	}
> +}
> +
> +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> +{
> +	struct cmr_info *cmr;
> +	int i, cmr_num;
> +
> +	/*
> +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> +	 *
> +	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> +	 *   array of CMR_INFO entries. The CMRs are sorted from the
> +	 *   lowest base address to the highest base address, and they
> +	 *   are non-overlapping.
> +	 *
> +	 * This implies that BIOS may generate invalid empty entries
> +	 * if total CMRs are less than 32.  Need to skip them manually.
> +	 *
> +	 * CMR also must be 4K aligned.  TDX doesn't trust BIOS.  TDX
> +	 * actually verifies CMRs before it gets enabled, so anything
> +	 * doesn't meet above means kernel bug (or TDX is broken).
> +	 */

I dislike comments like this that describe all the code below.  Can't
you simply put the comment near the code that implements it?

> +	cmr = &cmr_array[0];
> +	/* There must be at least one valid CMR */
> +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> +		goto err;
> +
> +	cmr_num = *actual_cmr_num;
> +	for (i = 1; i < cmr_num; i++) {
> +		struct cmr_info *cmr = &cmr_array[i];
> +		struct cmr_info *prev_cmr = NULL;
> +
> +		/* Skip further empty CMRs */
> +		if (is_cmr_empty(cmr))
> +			break;
> +
> +		/*
> +		 * Do sanity check anyway to make sure CMRs:
> +		 *  - are 4K aligned
> +		 *  - don't overlap
> +		 *  - are in address ascending order.
> +		 */
> +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> +			goto err;

Why does cmr_array[0] get a pass on the empty and sanity checks?

> +		prev_cmr = &cmr_array[i - 1];
> +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> +					cmr->base))
> +			goto err;
> +	}
> +
> +	/* Update the actual number of CMRs */
> +	*actual_cmr_num = i;

That comment is not helpful.  Yes, this is literally updating the number
of CMRs.  Literally.  That's the "what".  But, the "why" is important.
Why is it doing this?

> +	/* Print kernel checked CMRs */
> +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");

This is the point where I start to lose patience with these comments.
These are just a waste of space.

Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok().  Now,
it'll print an 'actual_cmr_num=1' number of CMRs as being
"kernel-checked".  Why?  That makes zero sense.

> +	return 0;
> +err:
> +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> +	return -EINVAL;
> +}
> +
> +static int tdx_get_sysinfo(void)
> +{
> +	struct tdx_module_output out;
> +	int ret;
> +
> +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> +
> +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> +	if (ret)
> +		return ret;
> +
> +	/* R9 contains the actual entries written the CMR array. */
> +	tdx_cmr_num = out.r9;
> +
> +	pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);

This is a case where a little bit of vertical alignment will go a long way:

> +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

> +
> +	/*
> +	 * trim_empty_cmrs() updates the actual number of CMRs by
> +	 * dropping all tail empty CMRs.
> +	 */
> +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> +}

Why does this both need to respect the "tdx_cmr_num = out.r9" value
*and* trim the empty ones?  Couldn't it just ignore the "tdx_cmr_num =
out.r9" value and just trim the empty ones either way?  It's not like
there is a billion of them.  It would simplify the code for sure.

>  /*
>   * Detect and initialize the TDX module.
>   *
> @@ -232,6 +353,10 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out;
>  
> +	ret = tdx_get_sysinfo();
> +	if (ret)
> +		goto out;
> +
>  	/*
>  	 * Return -EINVAL until all steps of TDX module initialization
>  	 * process are done.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9ba11808bd45..8e273756098c 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -15,10 +15,71 @@
>  /*
>   * TDX module SEAMCALL leaf functions
>   */
> +#define TDH_SYS_INFO		32
>  #define TDH_SYS_INIT		33
>  #define TDH_SYS_LP_INIT		35
>  #define TDH_SYS_LP_SHUTDOWN	44
>  
> +struct cmr_info {
> +	u64	base;
> +	u64	size;
> +} __packed;
> +
> +#define MAX_CMRS			32
> +#define CMR_INFO_ARRAY_ALIGNMENT	512
> +
> +struct cpuid_config {
> +	u32	leaf;
> +	u32	sub_leaf;
> +	u32	eax;
> +	u32	ebx;
> +	u32	ecx;
> +	u32	edx;
> +} __packed;
> +
> +#define TDSYSINFO_STRUCT_SIZE		1024
> +#define TDSYSINFO_STRUCT_ALIGNMENT	1024
> +
> +struct tdsysinfo_struct {
> +	/* TDX-SEAM Module Info */
> +	u32	attributes;
> +	u32	vendor_id;
> +	u32	build_date;
> +	u16	build_num;
> +	u16	minor_version;
> +	u16	major_version;
> +	u8	reserved0[14];
> +	/* Memory Info */
> +	u16	max_tdmrs;
> +	u16	max_reserved_per_tdmr;
> +	u16	pamt_entry_size;
> +	u8	reserved1[10];
> +	/* Control Struct Info */
> +	u16	tdcs_base_size;
> +	u8	reserved2[2];
> +	u16	tdvps_base_size;
> +	u8	tdvps_xfam_dependent_size;
> +	u8	reserved3[9];
> +	/* TD Capabilities */
> +	u64	attributes_fixed0;
> +	u64	attributes_fixed1;
> +	u64	xfam_fixed0;
> +	u64	xfam_fixed1;
> +	u8	reserved4[32];
> +	u32	num_cpuid_config;
> +	/*
> +	 * The actual number of CPUID_CONFIG depends on above
> +	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
> +	 * is 1024B defined by TDX architecture.  Use a union with
> +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> +	 * equal to 1024.
> +	 */
> +	union {
> +		struct cpuid_config	cpuid_configs[0];
> +		u8			reserved5[892];
> +	};

Can you double check what the "right" way to do variable arrays is these
days?  I thought the [0] method was discouraged.

Also, it isn't *really* 892 bytes of reserved space, right?  Anything
that's not cpuid_configs[] is reserved, I presume.  Could you try to be
more precise there?

> +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
> +
>  /*
>   * Do not put any hardware-defined TDX structure representations below
>   * this comment!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ