lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66b1aaee6e7d9_4fc72941e@dwillia2-xfh.jf.intel.com.notmuch>
Date: Mon, 5 Aug 2024 21:47:42 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Kai Huang <kai.huang@...el.com>, <dave.hansen@...el.com>,
	<kirill.shutemov@...ux.intel.com>, <bp@...en8.de>, <tglx@...utronix.de>,
	<peterz@...radead.org>, <mingo@...hat.com>, <hpa@...or.com>,
	<seanjc@...gle.com>, <pbonzini@...hat.com>, <dan.j.williams@...el.com>
CC: <x86@...nel.org>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<rick.p.edgecombe@...el.com>, <isaku.yamahata@...el.com>,
	<chao.gao@...el.com>, <binbin.wu@...ux.intel.com>, <kai.huang@...el.com>
Subject: Re: [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by
 using CMRs to find memory holes

Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
> 
>   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>   virt/tdx: module initialization failed (-28)
> 
> As part of initializing the TDX module, the kernel informs the TDX
> module of all "TDX-usable memory regions" using an array of TDX defined
> structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> given TDMR must be marked as "reserved areas".  The TDX module reports a
> maximum number of reserved areas that can be supported per TDMR.
> 
> Currently, the kernel finds those "non-TDX-usable memory holes" within a
> given TDMR by walking over a list of "TDX-usable memory regions", which
> essentially reflects the "usable" regions in the e820 table (w/o memory
> hotplug operations precisely, but this is not relevant here).
> 
> As shown above, the root cause of this failure is when the kernel tries
> to construct a TDMR to cover address range [0x0, 0x80000000), there
> are too many memory holes within that range and the number of memory
> holes exceeds the maximum number of reserved areas.
> 
> The E820 table of that platform (see [1] below) reflects this: the
> number of memory holes among e820 "usable" entries exceeds 16, which is
> the maximum number of reserved areas TDX module supports in practice.
> 
> === Fix ===
> 
> There are two options to fix this: 1) reduce the number of memory holes
> when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
> size to cover fewer memory regions, thus fewer memory holes.

What about option3? Fix the BIOS. If the BIOS scattershots low memory
holes why does the kernel need to suffer the burden of putting it back
together?

What about option4? Don't use for_each_mem_pfn_range() to populate TDMRs
and instead walk the resource tree asking if each resource is covered by
a CMR then add it to the TDMR list. In other words starting from
page-allocatable memory to populate the list seems like the wrong
starting point.

> Option 1) is possible, and in fact is easier and preferable:
> 
> TDX actually has a concept of "Convertible Memory Regions" (CMRs).  TDX
> reports a list of CMRs that meet TDX's security requirements on memory.
> TDX requires all the "TDX-usable memory regions" that the kernel passes
> to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> must be convertible memory.
> 
> In other words, if a memory hole is indeed CMR, then it's not mandatory
> for the kernel to add it to the reserved areas.  By doing so, the number
> of consumed reserved areas can be reduced w/o having any functional
> impact.  The kernel still allocates TDX memory from the page allocator.
> There's no harm if the kernel tells the TDX module some memory regions
> are "TDX-usable" but they will never be allocated by the kernel as TDX
> memory.
> 
> Note this doesn't have any security impact either because the kernel is
> out of TDX's TCB anyway.
> 
> This is feasible because in practice the CMRs just reflect the nature of
> whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> large, uninterrupted range of memory, i.e., unlike the e820 table which
> contains numerous "ACPI *" entries in the first 2G range.  Refer to [2]
> for CMRs reported on the problematic platform using off-tree TDX code.
> 
> So for this particular module initialization failure, the memory holes
> that are within [0x0, 0x80000000) are mostly indeed CMR.  By not adding
> them to the reserved areas, the number of consumed reserved areas for
> the TDMR [0x0, 0x80000000) can be dramatically reduced.
> 
> Option 2) is also theoretically feasible, but it is not desired:
> 
> It requires more complicated logic to handle splitting TDMR into smaller
> ones, which isn't trivial.  There are limitations to splitting TDMR too,
> thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
> be split any further; 2) This also increases the total number of TDMRs,
> which also has a maximum value limited by the TDX module.
> 
> So, fix this issue by using option 1):
> 
> 1) reading out the CMRs from the TDX module global metadata, and
> 2) changing to find memory holes for a given TDMR based on CMRs, but not
>    based on the list of "TDX-usable memory regions".
> 
> Also dump the CMRs in dmesg.  They are helpful when something goes wrong
> around "constructing the TDMRs and configuring the TDX module with
> them".  Note there are no existing userspace tools that the user can get
> CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).
> 
> [1] BIOS-E820 table of the problematic platform:
> 
>   BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
>   BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
>   BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
>   BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
>   BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
>   BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
>   BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
>   BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
>   BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
>   BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
>   BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
>   BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
>   BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
>   BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
>   BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
>   BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
>   BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
>   BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
>   BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
>   BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
>   BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
>   BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
>   BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
>   BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
>   BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
>   BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
>   BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
>   BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
>   BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
>   BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
>   BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
>   BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
>   BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
>   BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
>   BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
>   BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
>   ......
> 
> [2] Convertible Memory Regions of the problematic platform:
> 
>   virt/tdx: CMR: [0x100000, 0x6f800000)
>   virt/tdx: CMR: [0x100000000, 0x107a000000)
>   virt/tdx: CMR: [0x1080000000, 0x207c000000)
>   virt/tdx: CMR: [0x2080000000, 0x307c000000)
>   virt/tdx: CMR: [0x3080000000, 0x407c000000)
> 
> Signed-off-by: Kai Huang <kai.huang@...el.com>

It bothers me that this fix buried behind a bunch of other cleanup, but
I guess that is ok if this issue is not urgent. If it *is* urgent then
maybe a fix without so many dependencies would be more appropriate.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ