Date: Tue, 18 Jun 2024 18:10:20 +0300
From: Nikolay Borisov <nik.borisov@...e.com>
To: Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, dave.hansen@...el.com, dan.j.williams@...el.com,
 kirill.shutemov@...ux.intel.com, rick.p.edgecombe@...el.com,
 peterz@...radead.org, tglx@...utronix.de, bp@...en8.de, mingo@...hat.com,
 hpa@...or.com, seanjc@...gle.com, pbonzini@...hat.com, kvm@...r.kernel.org,
 isaku.yamahata@...el.com, binbin.wu@...ux.intel.com
Subject: Re: [PATCH 8/9] x86/virt/tdx: Exclude memory region hole within CMR
 as TDMR's reserved area



On 16.06.24 at 15:01, Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
> 
>    virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>    virt/tdx: module initialization failed (-28)
> 
> As a step of initializing the TDX module, the kernel tells the TDX
> module all the "TDX-usable memory regions" via a set of TDX architecture
> defined structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB
> aligned and in 1GB granularity, and all "non-TDX-usable memory holes" in
> a given TDMR must be marked as a "reserved area".  Each TDMR only
> supports a maximum number of reserved areas reported by the TDX module.
> 
> As shown above, the root cause of this failure is when the kernel tries
> to construct a TDMR to cover address range [0x0, 0x80000000), there
> are too many memory holes within that range and the number of memory
> holes exceeds the maximum number of reserved areas.
> 
> The E820 table of that platform (see [1] below) reflects this: the
> number of memory holes among e820 "usable" entries exceeds 16, which is
> the maximum number of reserved areas TDX module supports in practice.
> 
> === Fix ===
> 
> There are two options to fix this: 1) put less memory holes as "reserved
> area" when constructing a TDMR; 2) reduce the TDMR's size to cover less
> memory regions, thus less memory holes.
> 
> Option 1) is possible, and in fact is easier and preferable:
> 
> TDX actually has a concept of "Convertible Memory Regions" (CMRs).  TDX
> reports a list of CMRs that meet TDX's security requirements on memory.
> TDX requires all the "TDX-usable memory regions" that the kernel passes
> to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> must be convertible memory.
> 
> In other words, if a memory hole is indeed CMR, then it's not mandatory

So TDX requires all TDMRs to be CMR, and CMRs are reported by the
BIOS. How did you arrive at the conclusion that if a hole is CMR there
is no point in creating a TDMR for it?

> for the kernel to add it to the reserved areas.  The number of consumed
> reserved areas can be reduced if the kernel doesn't add those memory
> holes as reserved area.  Note this doesn't have security impact because
> the kernel is out of TDX's TCB anyway.
> 
> This is feasible because in practice the CMRs just reflect the nature of
> whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> large range w/o being split into small areas, e.g., in the way the e820
> table does to contain a lot "ACPI *" entries.  [2] below shows the CMRs
> reported on the problematic platform (using the off-tree TDX code).
> 
> So for this particular module initialization failure, the memory holes
> that are within [0x0, 0x80000000) are mostly indeed CMR.  By not adding
> them to the reserved areas, the number of consumed reserved areas for
> the TDMR [0x0, 0x80000000) can be dramatically reduced.
> 
> On the other hand, although option 2) is also theoretically feasible, it
> requires more complicated logic to handle around splitting TDMR into
> smaller ones.  E.g., today one memory region must be fully in one TDMR,
> while splitting TDMR will result in each TDMR only covering part of some
> memory region.  And this also increases the total number of TDMRs, which
> also cannot exceed a maximum value that TDX module supports.
> 

<snip>

> 
> Signed-off-by: Kai Huang <kai.huang@...el.com>
> ---
>   arch/x86/virt/vmx/tdx/tdx.c | 149 ++++++++++++++++++++++++++++++++----
>   arch/x86/virt/vmx/tdx/tdx.h |  13 ++++
>   2 files changed, 146 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index ced40e3b516e..88a0c8b788b7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -293,6 +293,10 @@ static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
>   	return 0;
>   }
>   
> +/* Wrapper to read one metadata field to u8/u16/u32/u64 */
> +#define stbuf_read_sysmd_single(_field_id, _pdata)	\
> +	stbuf_read_sysmd_field(_field_id, _pdata, 0, sizeof(typeof(*(_pdata))))

What value does adding yet another level of indirection bring here?

> +
>   struct field_mapping {
>   	u64 field_id;
>   	int offset;
> @@ -349,6 +353,76 @@ static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
>   	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
>   }
>   
> +/* Update the @cmr_info->num_cmrs to trim tail empty CMRs */
> +static void trim_empty_tail_cmrs(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> +	int i;
> +
> +	for (i = 0; i < cmr_info->num_cmrs; i++) {
> +		u64 cmr_base = cmr_info->cmr_base[i];
> +		u64 cmr_size = cmr_info->cmr_size[i];
> +
> +		if (!cmr_size) {
> +			WARN_ON_ONCE(cmr_base);
> +			break;
> +		}
> +
> +		/* TDX architecture: CMR must be 4KB aligned */
> +		WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> +				!PAGE_ALIGNED(cmr_size));
> +	}
> +
> +	cmr_info->num_cmrs = i;
> +}

That function is somewhat weird. On the one hand, its name suggests it's
doing some "optimisation", i.e. removing empty CMRs. At the same time, it
simply caps the number of CMRs at the first empty CMR it meets, and it
will also WARN. In fact, it could even crash the machine if
panic_on_warn is enabled. The alignment checks suggest it's actually
some sanity-checking function. Furthermore, if we have:

ORDINARY_CMR,EMPTY_CMR,ORDINARY_CMR

(Is such a scenario even possible?) In this case we'll also omit the
last ordinary CMR region?
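
To make the concern concrete, here is a minimal userspace sketch, not
kernel code. The struct is a hypothetical stand-in for
struct tdx_sysinfo_cmr_info, MAX_CMRS is an assumed bound chosen only
for the sketch, and the WARNs are left out. trim_at_first_empty() has
the same loop shape as the function above; drop_empty_cmrs() is one
possible alternative that compacts the array instead, assuming empty
entries can appear anywhere:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CMRS 32	/* assumed bound, for this sketch only */

/* Hypothetical userspace stand-in for struct tdx_sysinfo_cmr_info. */
struct cmr_info {
	uint16_t num_cmrs;
	uint64_t cmr_base[MAX_CMRS];
	uint64_t cmr_size[MAX_CMRS];
};

/* Same loop shape as trim_empty_tail_cmrs(): stop at the first empty CMR. */
static void trim_at_first_empty(struct cmr_info *info)
{
	int i;

	for (i = 0; i < info->num_cmrs; i++) {
		if (!info->cmr_size[i])
			break;
	}

	info->num_cmrs = i;
}

/* Alternative: skip empty CMRs wherever they are and keep the rest. */
static void drop_empty_cmrs(struct cmr_info *info)
{
	int i, out = 0;

	for (i = 0; i < info->num_cmrs; i++) {
		if (!info->cmr_size[i])
			continue;

		info->cmr_base[out] = info->cmr_base[i];
		info->cmr_size[out] = info->cmr_size[i];
		out++;
	}

	info->num_cmrs = out;
}
```

With ORDINARY_CMR,EMPTY_CMR,ORDINARY_CMR as input, the first variant
leaves num_cmrs at 1, while the second keeps both ordinary CMRs. If the
TDX module guarantees that empty CMRs only ever appear at the tail, the
two behave the same and the simpler loop is fine.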

> +
> +#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member)	\
> +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_cmr_info, _member)

nit: Again, no real value in introducing yet another level of 
indirection in this case.

> +
> +static int get_tdx_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> +	int i, ret;
> +
> +	ret = stbuf_read_sysmd_single(MD_FIELD_ID_NUM_CMRS,
> +			&cmr_info->num_cmrs);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < cmr_info->num_cmrs; i++) {
> +		const struct field_mapping fields[] = {
> +			TD_SYSINFO_MAP_CMR_INFO(CMR_BASE0 + i, cmr_base[i]),
> +			TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE0 + i, cmr_size[i]),
> +		};
> +
> +		ret = stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields),
> +				cmr_info);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/*
> +	 * The TDX module may just report the maximum number of CMRs that
> +	 * TDX architecturally supports as the actual number of CMRs,
> +	 * despite the latter is smaller.  In this case all the tail
> +	 * CMRs will be empty.  Trim them away.
> +	 */
> +	trim_empty_tail_cmrs(cmr_info);
> +
> +	return 0;
> +}
> +
> +static void print_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> +	int i;
> +
> +	for (i = 0; i < cmr_info->num_cmrs; i++) {
> +		u64 cmr_base = cmr_info->cmr_base[i];
> +		u64 cmr_size = cmr_info->cmr_size[i];
> +
> +		pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
> +				cmr_base + cmr_size);
> +	}
> +}

Do we really want to always print all CMR regions? Won't that become way
too spammy, and isn't this really only useful in debug scenarios? Perhaps
gate this particular information behind a debug flag?
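
A rough userspace sketch of what I mean. The cmr_debug switch and the
print_cmr_info_gated() name are made up for illustration; in the kernel
the same effect would more likely come from using pr_debug() instead of
pr_info(), so the output is controlled by DEBUG or dynamic debug:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical switch; stands in for DEBUG/dynamic debug in the kernel. */
static bool cmr_debug;

/*
 * Print the CMR list only when debugging is enabled.
 * Returns the number of lines printed so callers can check the gating.
 */
static int print_cmr_info_gated(const uint64_t *cmr_base,
				const uint64_t *cmr_size, int num_cmrs)
{
	int i, printed = 0;

	if (!cmr_debug)
		return 0;

	for (i = 0; i < num_cmrs; i++) {
		printf("CMR[%d]: [0x%llx, 0x%llx)\n", i,
		       (unsigned long long)cmr_base[i],
		       (unsigned long long)(cmr_base[i] + cmr_size[i]));
		printed++;
	}

	return printed;
}
```

With cmr_debug left at false nothing is printed, so the default boot
log stays quiet and the CMR list is only shown when someone asks for it.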

> +
>   static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
>   {
>   	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;

<snip>
