[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b20f1878d31658a9e3cd3edaf3826fe8731346e.camel@intel.com>
Date: Thu, 21 Jul 2022 13:52:59 +1200
From: Kai Huang <kai.huang@...el.com>
To: Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org
Cc: seanjc@...gle.com, pbonzini@...hat.com, len.brown@...el.com,
tony.luck@...el.com, rafael.j.wysocki@...el.com,
reinette.chatre@...el.com, dan.j.williams@...el.com,
peterz@...radead.org, ak@...ux.intel.com,
kirill.shutemov@...ux.intel.com,
sathyanarayanan.kuppuswamy@...ux.intel.com,
isaku.yamahata@...el.com
Subject: Re: [PATCH v5 07/22] x86/virt/tdx: Implement SEAMCALL function
On Wed, 2022-07-20 at 09:48 -0700, Dave Hansen wrote:
> On 7/20/22 03:18, Kai Huang wrote:
> > Try to close on how to handle memory hotplug. After discussion, below will be
> > architectural behaviour of TDX in terms of ACPI memory hotplug:
> >
> > 1) During platform boot, CMRs must be physically present. MCHECK verifies all
> > CMRs are physically present and are actually TDX convertible memory.
>
> I doubt this is strictly true. This makes it sound like MCHECK is doing
> *ACTUAL* verification that the memory is, in practice, convertible.
> That would mean actually writing to it, which would take a long time for
> a large system.
The "verify" is used in many places in the specs. In the public TDX module
spec, it is also used:
Table 1.1: Intel TDX Glossary
CMR: A range of physical memory configured by BIOS and verified by MCHECK
I guess to verify, MCHECK doesn't need to actually write something to memory.
For example, when memory is present, it can know what type it is so it can
determine.
>
> Does it *ACTUALLY* verify this?
Yes. This is what spec says. And this is what Intel colleagues said.
>
> Also, it's very odd to say that "CMRs must be physically present". A
> CMR itself is a logical construct. The physical memory *backing* a CMR
> is, something else entirely.
OK. But I think it is easy to interpret this actually means the physical memory
*backing* a CMR.
>
> > 2) CMRs are static after platform boots and don't change at runtime. TDX
> > architecture doesn't support hot-add or hot-removal of CMR memory.
> > 3) TDX architecture doesn't forbid non-CMR memory hotplug.
> >
> > Also, although TDX doesn't trust BIOS in terms of security, a non-buggy BIOS
> > should prevent CMR memory from being hot-removed. If kernel ever receives such
> > event, it's a BIOS bug, or even worse, the BIOS is compromised and under attack.
> >
> > As a result, the kernel should also never receive event of hot-add CMR memory.
> > It is very much likely TDX is under attack (physical attack) in such case, i.e.
> > someone is trying to physically replace any CMR memory.
> >
> > In terms of how to handle ACPI memory hotplug, my thinking is -- ideally, if the
> > kernel can get the CMRs during kernel boot when detecting whether TDX is enabled
> > by BIOS, we can do below:
> >
> > - For memory hot-removal, if the removed memory falls into any CMR, then kernel
> > can speak loudly it is a BIOS bug. But when this happens, the hot-removal has
> > been handled by BIOS thus kernel cannot actually prevent, so kernel can either
> > BUG(), or just print error message. If the removed memory doesn't fall into
> > CMR, we do nothing.
>
> Hold on a sec. Hot-removal is a two-step process. The kernel *MUST*
> know in advance that the removal is going to occur. It follows that up
> with evacuating the memory, giving the "all clear", then the actual
> physical removal can occur.
After looking more, looks "the hot-removal has been handled by BIOS" is wrong.
And you are right there's a previous step must be done (it is device offline).
But the "kernel cannot actually prevent" means in the device removal callback,
the kernel cannot prevent it from being removed.
This is my understanding by reading the ACPI spec and the code:
Firstly, the BIOS will send a "Eject Request" notification to the kernel. Upon
receiving this event, the kernel will firstly try to offline the device (which
can fail due to -EBUSY, etc). If offline is successful, the kernel will call
device's remove callback to remove the device. But this remove callback doesn't
return error code (which means it doesn't fail). Instead, after the remove
callback is done, the kernel calls _EJ0 ACPI method to actually do the ejection.
>
> I'm not sure what you're getting at with the "kernel cannot actually
> prevent" bit. No sane system actively destroys perfect good memory
> content and tells the kernel about it after the fact.
The kernel will offline the device first. This guarantees all good memory
content has been migrated.
>
> > - For memory hot-add, if the new memory falls into any CMR, then kernel should
> > speak loudly it is a BIOS bug, or even say "TDX is under attack" as this is only
> > possible when CMR memory has been previously hot-removed.
>
> I don't think this is strictly true. It's totally possible to get a
> hot-add *event* for memory which is in a CMR. It would be another BIOS
> bug, of course, but hot-remove is not a prerequisite purely for an event.
OK.
>
> > And kernel should
> > reject the new memory for security reason. If the new memory doesn't fall into
> > any CMR, then we (also) just reject the new memory, as we want to guarantee all
> > memory in page allocator are TDX pages. But this is basically due to kernel
> > policy but not due to TDX architecture.
>
> Agreed.
>
> > BUT, since as the first step, we cannot get the CMR during kernel boot (as it
> > requires additional code to put CPU into VMX operation), I think for now we can
> > handle ACPI memory hotplug in below way:
> >
> > - For memory hot-removal, we do nothing.
>
> This doesn't seem right to me. *If* we get a known-bogus hot-remove
> event, we need to reject it. Remember, removal is a two-step process.
If so, we need to reject the (CMR) memory offline. Or we just BUG() in the ACPI
memory removal callback?
But either way this will requires us to get the CMRs during kernel boot.
Do you think we need to add this support in the first series?
>
> > - For memory hot-add, we simply reject the new memory when TDX is enabled by
> > BIOS. This not only prevents the potential "physical attack of replacing any
> > CMR memory",
>
> I don't think there's *any* meaningful attack mitigation here. Even if
> someone managed to replace the physical address space that backed some
> private memory, the integrity checksums won't match. Memory integrity
> mitigates physical replacement, not software.
My thinking is rejecting the new memory is a more aggressive defence than
waiting until integrity checksum failure.
Btw, the integrity checksum support isn't a mandatory requirement for TDX
architecture. In fact, TDX also supports a mode which doesn't require integrity
check (for instance, TDX on client machines).
>
> > but also makes sure no non-CMR memory will be added to page
> > allocator during runtime via ACPI memory hot-add.
>
> Agreed. This one _is_ important and since it supports an existing
> policy, it makes sense to enforce this in the kernel.
>
> > We can improve this in next stage when we can get CMRs during kernel boot.
> >
> > For the concern that on a TDX BIOS enabled system, people may not want to use
> > TDX at all but just use it as normal system, as I replied to Dan regarding to
> > the driver-managed memory hotplug, we can provide a kernel commandline, i.e.
> > use_tdx={on|off}, to allow user to *choose* between TDX and memory hotplug.
> > When use_tdx=off, we continue to allow memory hotplug and driver-managed hotplug
> > as normal but refuse to initialize TDX module.
>
> That doesn't sound like a good resolution to me.
>
> It conflates pure "software" hotplug operations like transitioning
> memory ownership from the core mm to a driver (like device DAX).
>
> TDX should not have *ANY* impact on purely software operations. Period.
The hard requirement is: Once TDX module gets initialized, we cannot add any
*new* memory to core-mm.
But if some memory block is included to TDX memory when the module gets
initialized, then we should be able to move it from core-mm to driver or vice
versa. In this case, we can select all memory regions that the kernel wants to
use as TDX memory at some point (during kernel boot I guess).
Adding any non-selected-TDX memory regions to core-mm should always be rejected
(therefore there's no removal of them from core-mm either), although it is
"software" hotplug. If user wants this, he/she cannot use TDX. This is what I
mean we can provide command line to allow user to *choose*.
Also, if I understand correctly above, your suggestion is we want to prevent any
CMR memory going offline so it won't be hot-removed (assuming we can get CMRs
during boot). This looks contradicts to the requirement of being able to allow
moving memory from core-mm to driver. When we offline the memory, we cannot
know whether the memory will be used by driver, or later hot-removed.
Powered by blists - more mailing lists