Message-ID: <YnjuFHvyGwa9yHat@kernel.org>
Date: Mon, 9 May 2022 13:33:56 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Kai Huang <kai.huang@...el.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Dave Hansen <dave.hansen@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
KVM list <kvm@...r.kernel.org>,
Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"Brown, Len" <len.brown@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
Rafael J Wysocki <rafael.j.wysocki@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Andi Kleen <ak@...ux.intel.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On Sun, May 08, 2022 at 10:00:39PM +1200, Kai Huang wrote:
> On Fri, 2022-05-06 at 20:09 -0400, Mike Rapoport wrote:
> > On Thu, May 05, 2022 at 06:51:20AM -0700, Dan Williams wrote:
> > > [ add Mike ]
> > >
> > > On Thu, May 5, 2022 at 2:54 AM Kai Huang <kai.huang@...el.com> wrote:
> > > [..]
> > > >
> > > > Hi Dave,
> > > >
> > > > Sorry to ping (trying to close this).
> > > >
> > > > Given we don't need to consider kmem-hot-add legacy PMEM after TDX module
> > > > initialization, I think for now it's totally fine to exclude legacy PMEMs from
> > > > TDMRs. The worst case is when someone tries to use them as TD guest backend
> > > > directly, the TD will fail to create. IMO it's acceptable, as it is supposedly
> > > > that no one should just use some random backend to run TD.
> > >
> > > The platform will already do this, right? I don't understand why this
> > > is trying to take proactive action versus documenting the error
> > > conditions and steps someone needs to take to avoid unconvertible
> > > memory. There is already the CONFIG_HMEM_REPORTING that describes
> > > relative performance properties between initiators and targets, it
> > > seems fitting to also add security properties between initiators and
> > > targets so someone can enumerate the numa-mempolicy that avoids
> > > unconvertible memory.
> > >
> > > No, special casing in hotplug code paths needed.
> > >
> > > >
> > > > I think w/o needing to include legacy PMEM, it's better to get all TDX memory
> > > > blocks based on memblock, but not e820. The pages managed by page allocator are
> > > > from memblock anyway (w/o those from memory hotplug).
> > > >
> > > > And I also think it makes more sense to introduce 'tdx_memblock' and
> > > > 'tdx_memory' data structures to gather all TDX memory blocks during boot when
> > > > memblock is still alive. When TDX module is initialized during runtime, TDMRs
> > > > can be created based on the 'struct tdx_memory' which contains all TDX memory
> > > > blocks we gathered based on memblock during boot. This is also more flexible to
> > > > support other TDX memory from other sources such as CLX memory in the future.
> > > >
> > > > Please let me know if you have any objection? Thanks!
> > >
> > > It's already the case that x86 maintains sideband structures to
> > > preserve memory after exiting the early memblock code. Mike, correct
> > > me if I am wrong, but adding more is less desirable than just keeping
> > > the memblock around?
> >
> > TBH, I didn't read the entire thread yet, but at the first glance, keeping
> > memblock around is much more preferable than adding yet another { .start,
> > .end, .flags } data structure. To keep memblock after boot, all that is
> > needed is something like
> >
> > select ARCH_KEEP_MEMBLOCK if INTEL_TDX_HOST
> >
> > I'll take a closer look next week on the entire series, maybe I'm missing
> > some details.
> >
>
> Hi Mike,
>
> Thanks for feedback.
>
> Perhaps I haven't put a lot details of the new TDX data structures, so let me
> point out that the new two data structures 'struct tdx_memblock' and 'struct
> tdx_memory' that I am proposing are mostly supposed to be used by TDX code only,
> which is pretty standalone. They are not supposed to be some basic
> infrastructure that can be widely used by other random kernel components.

We already have the "pretty standalone" numa_meminfo, which originally was
used to set up the NUMA memory topology, but now it's used by other code as
well. And the e820 tables also contain similar data; they are supposed to be
used only at boot time, but in reality there are too many callbacks into
e820 long after the system has booted.

So any additional memory representation will only add to the overall
complexity, and we'll have even more "eventually consistent" collections of
{ .start, .end, .flags } structures.

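To illustrate the "eventually consistent" concern, here is a minimal
userspace sketch. It is not kernel code, and every dup_* name is made up for
illustration. It models a primary range table and a private copy taken from
it at some earlier point; a range added to the primary afterwards is simply
not visible in the copy.

```c
/* Userspace model only: the dup_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUP_MAX_RANGES 8

/* One { .start, .end, .flags } entry; end is exclusive. */
struct dup_range {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

/* A fixed-size table of ranges, standing in for any memory map. */
struct dup_table {
	struct dup_range ranges[DUP_MAX_RANGES];
	size_t cnt;
};

/* Append a range; returns 0 on success, -1 if the table is full. */
static int dup_add(struct dup_table *t, uint64_t start, uint64_t end,
		   unsigned int flags)
{
	if (t->cnt == DUP_MAX_RANGES)
		return -1;
	assert(start < end);
	t->ranges[t->cnt++] = (struct dup_range){ start, end, flags };
	return 0;
}

/* Take a private copy of the table, as a second representation would. */
static void dup_snapshot(struct dup_table *dst, const struct dup_table *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Return 1 if addr falls inside any range of the table, 0 otherwise. */
static int dup_covers(const struct dup_table *t, uint64_t addr)
{
	for (size_t i = 0; i < t->cnt; i++)
		if (addr >= t->ranges[i].start && addr < t->ranges[i].end)
			return 1;
	return 0;
}
```

If a range is added to the primary table after dup_snapshot(), dup_covers()
gives different answers for the two tables until somebody remembers to
resync them.
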
> In fact, currently the only operation we need is to allow memblock to register
> all memory regions as TDX memory blocks when the memblock is still alive.
> Therefore, in fact, the new data structures can even be completely invisible to
> other kernel components. For instance, TDX code can provide below API w/o
> exposing any data structures to other kernel components:
>
> int tdx_add_memory_block(phys_addr_t start, phys_addr_t end, int nid);
>
> And we call above API for each memory region in memblock when it is alive.
>
> TDX code internally manages those memory regions via the new data structures
> that I mentioned above, so we don't need to keep memblock after boot. The
> advantage of this approach is it is more flexible to support other potential TDX
> memory resources (such as CLX memory) in the future.

Please let's keep things simple. If other TDX memory resources need
different handling, it can be implemented then. For now, just enable
ARCH_KEEP_MEMBLOCK and use memblock to track TDX memory.

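As a rough userspace sketch of what "use memblock to track TDX memory" could
look like: the consumer walks the one region table that stays valid after
boot instead of registering each region into a private list. This is not
kernel code and all model_* names are hypothetical. In the kernel the walk
would presumably go through the existing memblock iterators such as
for_each_mem_range(), and what the callback does with each region is left
open here.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one memblock memory region. */
struct model_region {
	uint64_t start;	/* inclusive physical start address */
	uint64_t end;	/* exclusive physical end address */
	int nid;	/* NUMA node id */
};

/* Stand-in for the region table that is kept around after boot. */
struct model_memblock {
	const struct model_region *regions;
	size_t cnt;
};

/* Callback type: consume one region, return 0 on success. */
typedef int (*model_region_fn)(const struct model_region *r, void *arg);

/*
 * Walk the kept table directly instead of copying it elsewhere.
 * Stops at the first callback error and returns it.
 */
static int model_for_each_region(const struct model_memblock *mb,
				 model_region_fn fn, void *arg)
{
	for (size_t i = 0; i < mb->cnt; i++) {
		const struct model_region *r = &mb->regions[i];
		int ret;

		assert(r->start < r->end);
		ret = fn(r, arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example consumer: add up the bytes that would have to be covered. */
static int model_sum_bytes(const struct model_region *r, void *arg)
{
	uint64_t *total = arg;

	*total += r->end - r->start;
	return 0;
}
```

A caller would pass model_sum_bytes (or any other consumer) together with a
pointer to its accumulator to model_for_each_region(). There is no second
copy of the regions to keep in sync.
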
> Otherwise, we can do as you suggested to select ARCH_KEEP_MEMBLOCK when
> INTEL_TDX_HOST is on and TDX code internally uses memblock API directly.
>
> --
> Thanks,
> -Kai
>
>
--
Sincerely yours,
Mike.