[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7809a177-e170-46f5-b463-3713b79acf22@suse.com>
Date: Tue, 18 Jun 2024 18:10:20 +0300
From: Nikolay Borisov <nik.borisov@...e.com>
To: Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, dave.hansen@...el.com, dan.j.williams@...el.com,
kirill.shutemov@...ux.intel.com, rick.p.edgecombe@...el.com,
peterz@...radead.org, tglx@...utronix.de, bp@...en8.de, mingo@...hat.com,
hpa@...or.com, seanjc@...gle.com, pbonzini@...hat.com, kvm@...r.kernel.org,
isaku.yamahata@...el.com, binbin.wu@...ux.intel.com
Subject: Re: [PATCH 8/9] x86/virt/tdx: Exclude memory region hole within CMR
as TDMR's reserved area
On 16.06.24 г. 15:01 ч., Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
>
> virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> virt/tdx: module initialization failed (-28)
>
> As a step of initializing the TDX module, the kernel tells the TDX
> module all the "TDX-usable memory regions" via a set of TDX architecture
> defined structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB
> aligned and in 1GB granularity, and all "non-TDX-usable memory holes" in
> a given TDMR must be marked as a "reserved area". Each TDMR only
> supports a maximum number of reserved areas reported by the TDX module.
>
> As shown above, the root cause of this failure is when the kernel tries
> to construct a TDMR to cover address range [0x0, 0x80000000), there
> are too many memory holes within that range and the number of memory
> holes exceeds the maximum number of reserved areas.
>
> The E820 table of that platform (see [1] below) reflects this: the
> number of memory holes among e820 "usable" entries exceeds 16, which is
> the maximum number of reserved areas TDX module supports in practice.
>
> === Fix ===
>
> There are two options to fix this: 1) put less memory holes as "reserved
> area" when constructing a TDMR; 2) reduce the TDMR's size to cover less
> memory regions, thus less memory holes.
>
> Option 1) is possible, and in fact is easier and preferable:
>
> TDX actually has a concept of "Convertible Memory Regions" (CMRs). TDX
> reports a list of CMRs that meet TDX's security requirements on memory.
> TDX requires all the "TDX-usable memory regions" that the kernel passes
> to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> must be convertible memory.
>
> In other words, if a memory hole is indeed CMR, then it's not mandatory
So TDX requires all TDMR to be CMR, and CMR regions are reported by the
BIOS, how did you arrive at the conclusion that if a hole is CMR there
is no point in creating a TDMR for it?
> for the kernel to add it to the reserved areas. The number of consumed
> reserved areas can be reduced if the kernel doesn't add those memory
> holes as reserved area. Note this doesn't have security impact because
> the kernel is out of TDX's TCB anyway.
>
> This is feasible because in practice the CMRs just reflect the nature of
> whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> large range w/o being split into small areas, e.g., in the way the e820
> table does to contain a lot "ACPI *" entries. [2] below shows the CMRs
> reported on the problematic platform (using the off-tree TDX code).
>
> So for this particular module initialization failure, the memory holes
> that are within [0x0, 0x80000000) are mostly indeed CMR. By not adding
> them to the reserved areas, the number of consumed reserved areas for
> the TDMR [0x0, 0x80000000) can be dramatically reduced.
>
> On the other hand, although option 2) is also theoretically feasible, it
> requires more complicated logic to handle around splitting TDMR into
> smaller ones. E.g., today one memory region must be fully in one TDMR,
> while splitting TDMR will result in each TDMR only covering part of some
> memory region. And this also increases the total number of TDMRs, which
> also cannot exceed a maximum value that TDX module supports.
>
<snip>
>
> Signed-off-by: Kai Huang <kai.huang@...el.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 149 ++++++++++++++++++++++++++++++++----
> arch/x86/virt/vmx/tdx/tdx.h | 13 ++++
> 2 files changed, 146 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index ced40e3b516e..88a0c8b788b7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -293,6 +293,10 @@ static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
> return 0;
> }
>
> +/* Wrapper to read one metadata field to u8/u16/u32/u64 */
> +#define stbuf_read_sysmd_single(_field_id, _pdata) \
> + stbuf_read_sysmd_field(_field_id, _pdata, 0, sizeof(typeof(*(_pdata))))
What value does adding yet another level of indirection bring here?
> +
> struct field_mapping {
> u64 field_id;
> int offset;
> @@ -349,6 +353,76 @@ static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
> return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
> }
>
> +/* Update the @cmr_info->num_cmrs to trim tail empty CMRs */
> +static void trim_empty_tail_cmrs(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> + int i;
> +
> + for (i = 0; i < cmr_info->num_cmrs; i++) {
> + u64 cmr_base = cmr_info->cmr_base[i];
> + u64 cmr_size = cmr_info->cmr_size[i];
> +
> + if (!cmr_size) {
> + WARN_ON_ONCE(cmr_base);
> + break;
> + }
> +
> + /* TDX architecture: CMR must be 4KB aligned */
> + WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> + !PAGE_ALIGNED(cmr_size));
> + }
> +
> + cmr_info->num_cmrs = i;
> +}
That function is somewhat weird, on the one hand its name suggests it's
doing some "optimisation" i.e removing empty cmrs, at the same time it
will simply cap the number of CMRs until it meets the first empty CMR,
what aif we have and will also WARN. In fact it could even crash the
machine if panic_on_warn is enabled, furthermore the alignement checks
suggest it's actually some sanity checking function. Furthermore if we
have:"
ORDINARY_CMR,EMPTY_CMR,ORDINARY_CMR
(Is such a scenario even possible), in this case we'll ommit also the
last ordinary cmr region?
> +
> +#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
> + TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_cmr_info, _member)
nit: Again, no real value in introducing yet another level of
indirection in this case.
> +
> +static int get_tdx_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> + int i, ret;
> +
> + ret = stbuf_read_sysmd_single(MD_FIELD_ID_NUM_CMRS,
> + &cmr_info->num_cmrs);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < cmr_info->num_cmrs; i++) {
> + const struct field_mapping fields[] = {
> + TD_SYSINFO_MAP_CMR_INFO(CMR_BASE0 + i, cmr_base[i]),
> + TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE0 + i, cmr_size[i]),
> + };
> +
> + ret = stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields),
> + cmr_info);
> + if (ret)
> + return ret;
> + }
> +
> + /*
> + * The TDX module may just report the maximum number of CMRs that
> + * TDX architecturally supports as the actual number of CMRs,
> + * despite the latter is smaller. In this case all the tail
> + * CMRs will be empty. Trim them away.
> + */
> + trim_empty_tail_cmrs(cmr_info);
> +
> + return 0;
> +}
> +
> +static void print_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
> +{
> + int i;
> +
> + for (i = 0; i < cmr_info->num_cmrs; i++) {
> + u64 cmr_base = cmr_info->cmr_base[i];
> + u64 cmr_size = cmr_info->cmr_size[i];
> +
> + pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
> + cmr_base + cmr_size);
> + }
> +}
Do we really want to always print all CMR regions, won't that become way
too spammy and isn't this really useful in debug scenarios? Perhaps gate
this particular information behind a debug flag?
> +
> static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
> {
> struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
<snip>
Powered by blists - more mailing lists