linux-kernel - Re: [PATCH v8 07/16] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be85aa450b221bb730013d3b6ec0d4e71b51c228.camel@intel.com>
Date:   Thu, 12 Jan 2023 11:33:06 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "Luck, Tony" <tony.luck@...el.com>,
        "bagasdotme@...il.com" <bagasdotme@...il.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "Wysocki, Rafael J" <rafael.j.wysocki@...el.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Yamahata, Isaku" <isaku.yamahata@...el.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "Shahar, Sagi" <sagis@...gle.com>,
        "imammedo@...hat.com" <imammedo@...hat.com>,
        "Gao, Chao" <chao.gao@...el.com>,
        "Brown, Len" <len.brown@...el.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "sathyanarayanan.kuppuswamy@...ux.intel.com" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>
Subject: Re: [PATCH v8 07/16] x86/virt/tdx: Use all system memory when
 initializing TDX module as TDX memory

On Mon, 2023-01-09 at 08:51 -0800, Dave Hansen wrote:
> > > > > > +       list_for_each_entry(tmb, &tdx_memlist, list) {
> > > > > > +               /*
> > > > > > +                * The new range is TDX memory if it is fully
> > > > > > covered by
> > > > > > +                * any TDX memory block.
> > > > > > +                *
> > > > > > +                * Note TDX memory blocks are originated from
> > > > > > memblock
> > > > > > +                * memory regions, which can only be contiguous when
> > > > > > two
> > > > > > +                * regions have different NUMA nodes or flags. 
> > > > > > Therefore
> > > > > > +                * the new range cannot cross multiple TDX memory
> > > > > > blocks.
> > > > > > +                */
> > > > > > +               if (start_pfn >= tmb->start_pfn && end_pfn <= tmb-
> > > > > > >end_pfn)
> > > > > > +                       return true;
> > > > > > +       }
> > > > > > +       return false;
> > > > > > +}
> > > > 
> > > > I don't really like that comment.  It should first state its behavior
> > > > and assumptions, like:
> > > > 
> > > >      This check assumes that the start_pfn<->end_pfn range does not
> > > >      cross multiple tdx_memlist entries.
> > > > 
> > > > Only then should it describe why that is OK:
> > > > 
> > > >      A single memory hotplug even across mutliple memblocks (from
> > > >      which tdx_memlist entries are derived) is impossible.  ... then
> > > >      actually explain
> > > > 
> > 
> > How about below?
> > 
> >          /*
> >           * This check assumes that the start_pfn<->end_pfn range does not
> > cross
> >           * multiple tdx_memlist entries. A single memory hotplug event
> > across
> >           * multiple memblocks (from which tdx_memlist entries are derived)
> > is
> >           * impossible. That means start_pfn<->end_pfn range cannot exceed a
> >           * tdx_memlist entry, and the new range is TDX memory if it is
> > fully
> >           * covered by any tdx_memlist entry.
> >           */
> 
> I was hoping you would actually explain why it is impossible.
> 
> Is there something fundamental that keeps a memory area that spans two
> nodes from being removed and then a new area added that is comprised of
> a single node?
> 
> Boot time:
> 
> 	| memblock  |  memblock |
> 	<--Node=0--> <--Node=1-->
> 
> Funky hotplug... nothing to see here, then:
> 
> 	<--------Node=2-------->
> 
> I would believe that there is no current bare-metal TDX system that has
> an implementation like this.  But, the comments above speak like it's
> fundamentally impossible.  That should be clarified.
> 
> In other words, that comment talks about memblock attributes as being
> the core underlying reason that that simplified check is OK.  Is that
> it, or is it really the reduced hotplug feature set on TDX systems?

Hi Dave,

I think I have been forgetting that we have switched to reject non-TDX memory in
memory online, but not in memory hot-add.  

Memory offline/online is done on granularity of 'struct memory_block', but not
memblock.  In fact, the hotpluggable memory region (one memblock) must be
multiple of memory_block, and a "to-be-online" memory_block must be full range
memory (no memory hole).

So if I am not missing something, IIUC that means if the start_pfn<->end_pfn is
TDX memory, it must be fully within some @tdx_memlist entry, but cannot cross
multiple small entries.  And the memory hotplug case in your above diagram
actually shouldn't matter.

If above stands, how about below?

        /*
         * This check assumes that the start_pfn<->end_pfn range does not 
         * cross multiple @tdx_memlist entries.  A single memory online   
         * event across multiple @tdx_memlist entries (which are derived  
         * from memblocks at the time of module initialization) is not    
         * possible.
         *
         * This is because memory offline/online is done on granularity   
         * of 'struct memory_block', and the hotpluggable memory region   
         * (one memblock) must be multiple of memory_block.  Also, the    
         * "to-be-online" memory_block must be full memory (no memory     
         * hole, i.e. containing multiple small memblocks).
         *
         * This means if the start_pfn<->end_pfn range is TDX memory, it  
         * must be fully within one @tdx_memlist entry, but cannot cross  
         * multiple small entries.
         */
        list_for_each_entry(tmb, &tdx_memlist, list) {
                if (start_pfn >= tmb->start_pfn && end_pfn <= tmb->end_pfn)
                        return true;
        }