linux-kernel - Re: [PATCH v3 00/21] TDX host kernel support

Message-ID: <b40b3658e1fc7ec15d2adafe7f9562d42bc256f3.camel@intel.com>
Date:   Fri, 06 May 2022 12:45:54 +1200
From:   Kai Huang <kai.huang@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Dave Hansen <dave.hansen@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Brown, Len" <len.brown@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Rafael J Wysocki <rafael.j.wysocki@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Isaku Yamahata <isaku.yamahata@...el.com>,
        Mike Rapoport <rppt@...nel.org>
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On Thu, 2022-05-05 at 17:22 -0700, Dan Williams wrote:
> On Thu, May 5, 2022 at 3:14 PM Kai Huang <kai.huang@...el.com> wrote:
> > 
> > Thanks for feedback!
> > 
> > On Thu, 2022-05-05 at 06:51 -0700, Dan Williams wrote:
> > > [ add Mike ]
> > > 
> > > 
> > > On Thu, May 5, 2022 at 2:54 AM Kai Huang <kai.huang@...el.com> wrote:
> > > [..]
> > > > 
> > > > Hi Dave,
> > > > 
> > > > Sorry to ping (trying to close this).
> > > > 
> > > > Given we don't need to consider kmem-hot-add legacy PMEM after TDX module
> > > > initialization, I think for now it's totally fine to exclude legacy PMEMs from
> > > > TDMRs.  The worst case is when someone tries to use them as TD guest backend
> > > > directly, the TD will fail to create.  IMO it's acceptable, as it is supposedly
> > > > that no one should just use some random backend to run TD.
> > > 
> > > The platform will already do this, right?
> > > 
> > 
> > In the current v3 implementation, we don't have any code to handle memory
> > hotplug, therefore nothing prevents people from adding legacy PMEMs as system
> > RAM using kmem driver.  In order to guarantee all pages managed by page
> 
> That's the fundamental question I am asking why is "guarantee all
> pages managed by page allocator are TDX memory". That seems overkill
> compared to indicating the incompatibility after the fact.

As I explained, the reason is that I don't want to modify the page allocator to
distinguish TDX from non-TDX allocations, for instance by having to add a
ZONE_TDX and a GFP_TDX.

KVM depends on the host's page fault handler to allocate pages.  In fact, KVM
only consumes PFNs from the host's page tables.  For now only RAM is TDX memory.
By guaranteeing that all pages in the page allocator are TDX memory, we can
easily use anonymous pages as TD guest memory.  This also allows us to easily
extend shmem to support a new fd-based backend which doesn't require mmap()ing
TD guest memory into host userspace:

https://lore.kernel.org/kvm/20220310140911.50924-1-chao.p.peng@linux.intel.com/

Also, besides TD guest memory, there are some per-TD control data structures
(which must be TDX memory too) that need to be allocated for each TD.  Normal
memory allocation APIs can be used for such allocations if we guarantee that all
pages in the page allocator are TDX memory.

> 
> > allocator are all TDX memory, the v3 implementation needs to always include
> > legacy PMEMs as TDX memory so that even people truly add  legacy PMEMs as system
> > RAM, we can still guarantee all pages in page allocator are TDX memory.
> 
> Why?

If we don't include legacy PMEMs as TDX memory, then after they are hot-added as
system RAM using the kmem driver, the assumption that "all pages in page
allocator are TDX memory" is broken.  A TD that ends up being given such a page
can be killed at runtime.
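
To make the invariant concrete, here is a minimal userspace sketch (not kernel
code, and not part of the v3 series) of the check the argument above implies: a
range that is about to become system RAM must be fully covered by the ranges
that were converted to TDX memory.  'struct tdx_range' and
range_is_tdx_memory() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one physically contiguous TDX memory range, [start, end). */
struct tdx_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true if [start, end) lies entirely inside the given TDX ranges.
 * Assumes 'ranges' is sorted by start address and non-overlapping;
 * back-to-back ranges are treated as contiguous coverage.
 */
static bool range_is_tdx_memory(const struct tdx_range *ranges, size_t nr,
				uint64_t start, uint64_t end)
{
	size_t i;

	assert(start < end);

	for (i = 0; i < nr; i++) {
		if (ranges[i].end <= start)
			continue;	/* entirely below what is left to cover */
		if (ranges[i].start > start)
			return false;	/* hole before the next TDX range */
		start = ranges[i].end;	/* covered up to here, keep going */
		if (start >= end)
			return true;
	}
	return false;			/* ran out of TDX ranges */
}
```

In this model, a legacy PMEM range that was left out of the TDX ranges fails the
check, which is exactly the case where hot-adding it as system RAM would break
the assumption.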

> 
> > Of course, a side benefit of always including legacy PMEMs is people
> > theoretically can use them directly as TD guest backend, but this is just a
> > bonus but not something that we need to guarantee.
> > 
> > 
> > > I don't understand why this
> > > is trying to take proactive action versus documenting the error
> > > conditions and steps someone needs to take to avoid unconvertible
> > > memory. There is already the CONFIG_HMEM_REPORTING that describes
> > > relative performance properties between initiators and targets, it
> > > seems fitting to also add security properties between initiators and
> > > targets so someone can enumerate the numa-mempolicy that avoids
> > > unconvertible memory.
> > 
> > I don't think there's anything related to performance properties here.  The only
> > goal here is to make sure all pages in page allocator are TDX memory pages.
> 
> Please reconsider or re-clarify that goal.
> 
> > 
> > > 
> > > > No special casing in hotplug code paths needed.
> > > 
> > > > 
> > > > I think w/o needing to include legacy PMEM, it's better to get all TDX memory
> > > > blocks based on memblock, but not e820.  The pages managed by page allocator are
> > > > from memblock anyway (w/o those from memory hotplug).
> > > > 
> > > > And I also think it makes more sense to introduce 'tdx_memblock' and
> > > > 'tdx_memory' data structures to gather all TDX memory blocks during boot when
> > > > memblock is still alive.  When TDX module is initialized during runtime, TDMRs
> > > > can be created based on the 'struct tdx_memory' which contains all TDX memory
> > > > blocks we gathered based on memblock during boot.  This is also more flexible to
> > > > > support other TDX memory from other sources such as CXL memory in the future.
> > > > 
> > > > Please let me know if you have any objection?  Thanks!
> > > 
> > > It's already the case that x86 maintains sideband structures to
> > > preserve memory after exiting the early memblock code.
> > > 
> > 
> > May I ask what data structures are you referring to?
> 
> struct numa_meminfo.
> 
> > Btw, the purpose of 'tdx_memblock' and 'tdx_memory' is not only just to preserve
> > memblock info during boot.  It is also used to provide a common data structure
> > that the "constructing TDMRs" code can work on.  If you look at patch 11-14, the
> > logic (create TDMRs, allocate PAMTs, sets up reserved areas) around how to
> > construct TDMRs doesn't have hard dependency on e820.  If we construct TDMRs
> > based on a common 'tdx_memory' like below:
> > 
> >         int construct_tdmrs(struct tdx_memory *tmem, ...);
> > 
> > It would be much easier to support other TDX memory resources in the future.
> 
> "in the future" is a prompt to ask "Why not wait until that future /
> need arrives before adding new infrastructure?"

Fine with me.

-- 
Thanks,
-Kai