Message-ID: <4a5143cc-3102-5e30-08b4-c07e44f1a2fc@intel.com>
Date: Fri, 29 Apr 2022 10:18:09 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Kai Huang <kai.huang@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
KVM list <kvm@...r.kernel.org>,
Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Brown, Len" <len.brown@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
Rafael J Wysocki <rafael.j.wysocki@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Andi Kleen <ak@...ux.intel.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On 4/29/22 08:18, Dan Williams wrote:
> Yes, I want to challenge the idea that all core-mm memory must be TDX
> capable. Instead, this feels more like something that wants a
> hugetlbfs / dax-device like capability to ask the kernel to gather /
> set-aside the enumerated TDX memory out of all the general purpose
> memory it knows about and then VMs use that ABI to get access to
> convertible memory. Trying to ensure that all page allocator memory is
> TDX capable feels too restrictive with all the different ways pfns can
> get into the allocator.

The KVM users are the problem here. They use a variety of ABIs to get
memory and then hand it to KVM. KVM basically just consumes the
physical addresses from the page tables.

Also, there's no _practical_ problem here today. I can't actually think
of a case where any memory that ends up in the allocator on today's TDX
systems is not TDX capable.

Tomorrow's systems are going to be the problem. They'll (presumably)
have a mix of CXL devices with varying capabilities. Some will surely
lack the metadata storage for checksums and TD-owner bits.

TDX use will still be *safe* on those systems: if you take this code and
run it on one of tomorrow's systems, it will notice the TDX-incompatible
memory and disable TDX.

The only way around this that I can see is to introduce ABI today that
anticipates the needs of future systems. We could require that all
the KVM memory be "validated" before handing it to TDX. Maybe a new
syscall that says: "make sure this mapping works for TDX". Or it could
be a new sysfs ABI that specifies which NUMA nodes contain TDX-capable
memory.

But neither of those really helps with, say, a device-DAX mapping of
TDX-*IN*capable memory handed to KVM. The "new syscall" would just
throw up its hands and leave users with the same result: TDX can't be
used. The new sysfs ABI for NUMA nodes wouldn't clearly apply to
device-DAX either, because device-DAX mappings don't respect the NUMA
policy ABI.

I'm open to ideas here. If there's a viable ABI we can introduce to
train TDX users today that will work tomorrow too, I'm all for it.