lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <83df85a3b318e6578628692ce0d28b9cf736061e.camel@intel.com>
Date: Tue, 10 Dec 2024 02:26:44 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Hansen, Dave" <dave.hansen@...el.com>, "seanjc@...gle.com"
	<seanjc@...gle.com>, "bp@...en8.de" <bp@...en8.de>, "peterz@...radead.org"
	<peterz@...radead.org>, "hpa@...or.com" <hpa@...or.com>, "mingo@...hat.com"
	<mingo@...hat.com>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "Williams, Dan J"
	<dan.j.williams@...el.com>
CC: "nik.borisov@...e.com" <nik.borisov@...e.com>, "Hunter, Adrian"
	<adrian.hunter@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>, "x86@...nel.org" <x86@...nel.org>, "Yamahata,
 Isaku" <isaku.yamahata@...el.com>
Subject: Re: [PATCH v8 8.2/9] x86/virt/tdx: Reduce TDMR's reserved areas by
 using CMRs to find memory holes

On Mon, 2024-12-09 at 14:54 -0800, Dave Hansen wrote:
> On 12/8/24 22:50, Kai Huang wrote:
> > A TDX module initialization failure was reported on an Emerald Rapids
> > platform [*]:
> > 
> >   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> >   virt/tdx: module initialization failed (-28)
> > 
> > The kernel informs the TDX module of "TDX-usable memory regions" via the
> > structure "TD Memory Region" (TDMR).  Each TDMR contains a limited
> > number of "reserved areas" to inform the TDX module of the regions that
> > cannot be used by TDX.
> > 
> > The kernel builds the list of "TDX-usable memory regions" from memblock
> > (which reflects e820) and marks all memory holes as "reserved areas" in
> > TDMRs.  It turns out on some large systems the holes in memblock can be
> > too fine-grained [1] and exceed the number of reserved areas that the
> > module can track per TDMR, resulting in the failure mentioned above.
> > 
> > The TDX module also reports TDX-capable memory as "Convertible Memory
> > Regions" (CMRs).  CMRs tend to be coarser-grained [2] than the e820.
> > Use CMRs to find memory holes when populating reserved areas to reduce
> > their consumption.
> > 
> > Note the kernel does not prevent non-CMR memory from being added to
> > "TDX-usable memory regions" but depends on the TDX module to catch in
> > the TDH.SYS.CONFIG.  After switching to using CMRs to populate reserved
> > areas this will no longer work.  To ensure no non-CMR memory is included
> > in the TDMRs, verify that the memory region is truly TDX convertible
> > before adding it as a TDX-usable memory region at early stage.
> 
> Thanks for trimming the changelog down.  But this changelog never
> actually says what the fix is. It's also quite heavy on the "what" and
> very light on the "why".
> 
> I think the "why" boils down to the fact that the kernel is treating RAM
> -- as defined by the platform and TDX module -- as non-RAM.

Yes.

> 
> > -	ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
> > +	/*
> > +	 * On some large systems, the TDX memory blocks (which reflects
> > +	 * e820) in the first 1GB can be too fine-grained.  Using them
> > +	 * to populate reserved areas may result in reserved areas being
> > +	 * exhausted.  CMRs are coarser-grained than e820.  Use CMRs to
> > +	 * populate reserved areas to reduce their consumption.
> > +	 */
> 
> I think there are still too many details here for a comment. This
> comment is describing *highly* implementation and platform-specific
> details particular to this bug you are fixing today. They will be
> irrelevant to anyone reading this code tomorrow.
> 
> So in the end, I buy that the CMR's have something to offer here. But I
> think that "why" I mentioned above casts doubt on whether
> for_each_mem_pfn_range() is the right primitive on which to build the
> TDX memblocks in the first place.

We can change to just use CMRs as TDX memory blocks, i.e., always cover all CMRs
in TDMRs, but this will have much wider impact.

The main concern is the PAMT allocation: PAMT is allocated from page allocator,
but the CMRs -- the RAM as defined by the platform and the TDX module - - can
cover more, and sometimes much more, regions than the regions end up to the page
allocator.

E.g., today we can use 'memmap=' to reserve part of memory for other purpose. 
And in the future CMRs may cover CXL memory regions which could be much larger
IIUC.

If we change to cover CMRs in TDMRs, we could end up with a much larger TDMR
ranges.  In this case we may end up with wasting PAMTs (e.g., if the admin wants
to use CXL for other non-TDX purpose), increasing the failure rate, or a
complete failure of PAMT allocation.

> I suspect there's a much simpler solution that will emerge when
> considering a deeper fix as opposed to adding CMRs as a band-aid.

I don't have an immediate solution other than using CMRs to fill up reserved
areas.  I will think more.

Perhaps we can try to split the TDMR to make it cover less reserved areas.  But
this won't work when the TDMR is already 1GB.  And this will result in new cases
where "one TDX memory block can end up to multiple TDMRs" etc.  To me it's over-
complicated and is not as good as using the CMR.  (This is listed as an
alternative in the initial changelog but was removed to trim down the log).

Btw, Rick is concerning with the overall KVM TDX upstream because this series is
a dependency to the rest KVM TDX patches.  Technically this bugfix is not
related to the KVM TDX support.  We are going to look at dropping the CMR staff
for now (the code which reads CMRs in patch 3, and patch 7-8), as the TDX KVM
patches can live without it for initial support.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ