linux-kernel - Re: [PATCH v3 00/21] TDX host kernel support


Message-ID: <fc1ca04d94ad45e79c0297719d5ef50a7c33c352.camel@intel.com>
Date:   Fri, 29 Apr 2022 13:40:13 +1200
From:   Kai Huang <kai.huang@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     seanjc@...gle.com, pbonzini@...hat.com, len.brown@...el.com,
        tony.luck@...el.com, rafael.j.wysocki@...el.com,
        reinette.chatre@...el.com, dan.j.williams@...el.com,
        peterz@...radead.org, ak@...ux.intel.com,
        kirill.shutemov@...ux.intel.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com,
        isaku.yamahata@...el.com
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > On 4/27/22 17:37, Kai Huang wrote:
> > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > > 
> > > I thought we could document this in the documentation saying that this code can
> > > only work on TDX machines that don't have above capabilities (SPR for now).  We
> > > can change the code and the documentation  when we add the support of those
> > > features in the future, and update the documentation.
> > > 
> > > If 5 years later someone takes this code, he/she should take a look at the
> > > documentation and figure out that he/she should choose a newer kernel if the
> > > machine support those features.
> > > 
> > > I'll think about design solutions if above doesn't look good for you.
> > 
> > No, it doesn't look good to me.
> > 
> > You can't just say:
> > 
> > 	/*
> > 	 * This code will eat puppies if used on systems with hotplug.
> > 	 */
> > 
> > and merrily await the puppy bloodbath.
> > 
> > If it's not compatible, then you have to *MAKE* it not compatible in a
> > safe, controlled way.
> > 
> > > > You can't just ignore the problems because they're not present on one
> > > > version of the hardware.
> > 
> > Please, please read this again ^^
> 
> OK.  I'll think about solutions and come back later.
> > 

Hi Dave,

I think there are two approaches to handling the interaction between memory
hotplug and TDX module initialization.

The first approach is simple.  We just block memory from being added as system
RAM managed by the page allocator when the platform supports TDX [1].  It seems
we can add an arch-specific check to __add_memory_resource() and reject the new
memory resource if the platform supports TDX.  __add_memory_resource() is called
by both __add_memory() and add_memory_driver_managed(), so this blocks both
adding NVDIMM as system RAM and normal ACPI memory hotplug [2].
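
To make the first approach concrete, here is a minimal userspace sketch in C.
It is not kernel code: struct tdx_platform_info, struct mem_resource,
platform_supports_tdx() and arch_check_add_memory() are hypothetical names, and
the -EINVAL return value is an arbitrary choice for illustration.  The TDX test
follows the definition in [1].

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for platform state probed at boot. */
struct tdx_platform_info {
	bool seamrr_enabled;		/* SEAMRR is enabled */
	unsigned int nr_tdx_keyids;	/* number of TDX keyIDs */
};

/* Hypothetical stand-in for the memory resource being hot-added. */
struct mem_resource {
	uint64_t start;
	uint64_t size;
};

/* Per [1]: SEAMRR is enabled and there are at least 2 TDX keyIDs. */
bool platform_supports_tdx(const struct tdx_platform_info *info)
{
	return info->seamrr_enabled && info->nr_tdx_keyids >= 2;
}

/*
 * Hypothetical arch-specific check, imagined as being called from
 * __add_memory_resource().  Returns 0 if the resource may be added as
 * system RAM, or a negative errno if it must be rejected.  In this
 * approach the decision does not depend on the range itself.
 */
int arch_check_add_memory(const struct tdx_platform_info *info,
			  const struct mem_resource *res)
{
	assert(info != NULL && res != NULL);

	if (platform_supports_tdx(info))
		return -EINVAL;	/* reject all hot-added system RAM */

	return 0;
}
```

Since both hotplug paths funnel through the same function, a single check of
this shape would cover them without touching the callers.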

The second approach is more complicated.  Instead of unconditionally rejecting
the new memory resource in __add_memory_resource(), we check whether the memory
resource can be added based on the CMRs and the TDX module initialization
status.  This is feasible because, with the latest public P-SEAMLDR spec, we can
get the CMRs from a P-SEAMLDR SEAMCALL [3].  So we can detect P-SEAMLDR and get
the CMR info during kernel boot.  Then in __add_memory_resource() we do the
check below:

	tdx_init_disable();	/* similar to cpu_hotplug_disable() */
	if (tdx_module_initialized())
		/* reject memory hotplug */
	else if (new_memory_resource NOT in CMRs)
		/* reject memory hotplug */
	else
		/* allow memory hotplug */
	tdx_init_enable();	/* similar to cpu_hotplug_enable() */

tdx_init_disable() temporarily disables TDX module initialization by grabbing
the mutex.  If TDX module initialization is already in progress, it waits until
that completes.
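
The decision in the pseudocode above can be sketched as a small userspace C
function.  This is only an illustration under stated assumptions: struct
cmr_info, range_in_cmrs() and tdx_check_add_memory() are hypothetical names, the
caller is assumed to already hold the lock taken by tdx_init_disable(), a range
is accepted only if it fits entirely inside a single CMR, and -EINVAL is an
arbitrary error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one CMR as reported at boot. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * True if [start, start + size) lies entirely within one CMR.
 * Assumes base + size does not overflow for any CMR entry; the
 * comparison is written so that start + size is never computed.
 */
bool range_in_cmrs(const struct cmr_info *cmrs, size_t nr_cmrs,
		   uint64_t start, uint64_t size)
{
	assert(cmrs != NULL || nr_cmrs == 0);

	for (size_t i = 0; i < nr_cmrs; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		if (start >= cmrs[i].base && start <= cmr_end &&
		    size <= cmr_end - start)
			return true;
	}
	return false;
}

/*
 * Mirrors the if/else chain in the pseudocode.  Must be called between
 * tdx_init_disable() and tdx_init_enable() so that module_initialized
 * cannot change underneath the check.
 */
int tdx_check_add_memory(bool module_initialized,
			 const struct cmr_info *cmrs, size_t nr_cmrs,
			 uint64_t start, uint64_t size)
{
	if (module_initialized)
		return -EINVAL;	/* reject memory hotplug */
	if (!range_in_cmrs(cmrs, nr_cmrs, start, size))
		return -EINVAL;	/* reject memory hotplug */
	return 0;		/* allow memory hotplug */
}
```

Whether a range spanning two adjacent CMRs should be accepted is left open
here; the sketch takes the stricter reading.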

This should work better for future platforms, but it would require a non-trivial
amount of additional code, as we need to add VMXON/VMXOFF support to the core
kernel to detect CMRs using SEAMCALL.  A side advantage is that with VMXON in
the core kernel we can shut down the TDX module in kexec().

But for this series I think the second approach is overkill and we can choose to
use the first simple approach?

Any suggestions?

[1] "Platform supports TDX" means SEAMRR is enabled and there are at least 2 TDX
keyIDs.  Or we can just check that SEAMRR is enabled, as in practice SEAMRR
being enabled means the machine is TDX-capable, and for now a TDX-capable
machine doesn't support ACPI memory hotplug.

[2] It prevents adding legacy PMEM as system RAM too, but I think that's fine.
If the user wants legacy PMEM, it is unlikely the user will add it back and use
it as system RAM.  The user is also unlikely to use legacy PMEM as TD guest
memory directly, as TD guests are likely to use a new memfd backend which allows
private pages not accessible from userspace, so in this way we can exclude
legacy PMEM from TDMRs.

[3] Please refer to the SEAMLDR.SEAMINFO SEAMCALL in the latest P-SEAMLDR spec:
https://www.intel.com/content/dam/develop/external/us/en/documents-tps/intel-tdx-seamldr-interface-specification.pdf