linux-kernel - Re: [PATCHv11 1/9] mm: Add support for unaccepted memory

Message-ID: <20230516213245.oruzw2kinbfqcwwl@box.shutemov.name>
Date:   Wed, 17 May 2023 00:32:45 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Tom Lendacky <thomas.lendacky@....com>
Cc:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        marcelo.cerri@...onical.com, tim.gardner@...onical.com,
        khalid.elmously@...onical.com, philip.cox@...onical.com,
        aarcange@...hat.com, peterx@...hat.com, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
        Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCHv11 1/9] mm: Add support for unaccepted memory

On Tue, May 16, 2023 at 02:44:00PM -0500, Tom Lendacky wrote:
> On 5/13/23 17:04, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
> > SEV-SNP, require memory to be accepted before it can be used by the
> > guest. Accepting happens via a protocol specific to the Virtual Machine
> > platform.
> > 
> > There are several ways the kernel can deal with unaccepted memory:
> > 
> >   1. Accept all the memory during boot. It is easy to implement and
> >      it doesn't have runtime cost once the system is booted. The downside
> >      is a very long boot time.
> > 
> >      Acceptance can be parallelized across multiple CPUs to keep it
> >      manageable (e.g. via DEFERRED_STRUCT_PAGE_INIT), but it tends to
> >      saturate memory bandwidth and does not scale beyond a certain point.
> > 
> >   2. Accept a block of memory on the first use. It requires more
> >      infrastructure and changes in the page allocator to make it work,
> >      but it provides good boot time.
> > 
> >      On-demand memory acceptance means latency spikes every time the
> >      kernel steps onto a new memory block. The spikes will go away once
> >      the workload's data set size stabilizes or all memory gets accepted.
> > 
> >   3. Accept all memory in the background. Introduce a thread (or
> >      multiple) that gets memory accepted proactively. It will minimize
> >      the time the system experiences latency spikes on memory allocation
> >      while keeping boot time low.
> > 
> >      This approach cannot function on its own. It is an extension of #2:
> >      background memory acceptance requires a functional scheduler, but
> >      the page allocator may need to tap into unaccepted memory before that.
> > 
> >      The downside of the approach is that these threads also steal CPU
> >      cycles and memory bandwidth from the user's workload and may hurt
> >      user experience.
> > 
> > The patch implements #1 and #2 for now. #2 is the default. Some
> > workloads may want to use #1 with accept_memory=eager on the kernel
> > command line. #3 can be implemented later based on user demand.
> > 
> > Support of unaccepted memory requires a few changes in core-mm code:
> > 
> >    - memblock has to accept memory on allocation;
> > 
> >    - page allocator has to accept memory on the first allocation of the
> >      page;
> > 
> > Memblock change is trivial.
> > 
> > The page allocator is modified to accept pages. New memory gets accepted
> > before putting pages on free lists. It is done lazily: only accept new
> > pages when we run out of already accepted memory. The memory gets
> > accepted until the high watermark is reached.
> > 
> > EFI code will provide two helpers if the platform supports unaccepted
> > memory:
> > 
> >   - accept_memory() makes a range of physical addresses accepted.
> > 
> >   - range_contains_unaccepted_memory() checks whether anything within
> >     the range of physical addresses requires acceptance.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Acked-by: Mike Rapoport <rppt@...ux.ibm.com>	# memblock
> > Reviewed-by: Vlastimil Babka <vbabka@...e.cz>
> > ---
> >   drivers/base/node.c    |   7 ++
> >   fs/proc/meminfo.c      |   5 ++
> >   include/linux/mm.h     |  19 +++++
> >   include/linux/mmzone.h |   8 ++
> >   mm/internal.h          |   1 +
> >   mm/memblock.c          |   9 +++
> >   mm/mm_init.c           |   7 ++
> >   mm/page_alloc.c        | 173 +++++++++++++++++++++++++++++++++++++++++
> >   mm/vmstat.c            |   3 +
> >   9 files changed, 232 insertions(+)
> > 
> 
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 68410c6d97ac..b1db7ba5f57d 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1099,4 +1099,5 @@ struct vma_prepare {
> >   	struct vm_area_struct *remove;
> >   	struct vm_area_struct *remove2;
> >   };
> > +
> 
> Looks like an unintentional change.

Yep, will fix.

> >   #endif	/* __MM_INTERNAL_H */
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 3feafea06ab2..50b921119600 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1436,6 +1436,15 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
> >   		 */
> >   		kmemleak_alloc_phys(found, size, 0);
> > +	/*
> > +	 * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
> > +	 * require memory to be accepted before it can be used by the
> > +	 * guest.
> > +	 *
> > +	 * Accept the memory of the allocated buffer.
> > +	 */
> > +	accept_memory(found, found + size);
> 
> I'm not an mm or memblock expert, but do we need to worry about freed memory
> from memblock_phys_free() being possibly doubly accepted? A double
> acceptance will trigger a guest termination on SNP.

There will be no double acceptance. accept_memory() will consult the
bitmap before accepting any memory. For already accepted memory it is a
no-op.
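
To illustrate the point, here is a minimal userspace sketch of a
bitmap-guarded accept. It is not the code from this series: the 2M
tracking unit, the "set means unaccepted" encoding, and the names
accept_memory_sketch(), platform_accept() and demo_double_accept() are
all assumptions made up for the example. The only thing it is meant to
show is that a second accept of the same range never reaches the
platform call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tracking granularity and range size; illustration only. */
#define UNIT_SIZE	(2ULL * 1024 * 1024)
#define NR_UNITS	64

/* One flag per unit; true means the unit still needs acceptance. */
static bool unaccepted[NR_UNITS];

/* Counts how often the (hypothetical) platform accept is invoked. */
static unsigned int platform_accept_calls;

/* Stand-in for the platform-specific protocol (TDX, SEV-SNP, ...). */
static void platform_accept(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	platform_accept_calls++;
}

/*
 * Accept [start, end). Units whose flag is already clear are skipped,
 * so accepting the same range twice is harmless.
 */
void accept_memory_sketch(uint64_t start, uint64_t end)
{
	uint64_t first, last, i;

	assert(start <= end);

	first = start / UNIT_SIZE;
	last = (end + UNIT_SIZE - 1) / UNIT_SIZE;
	if (last > NR_UNITS)
		last = NR_UNITS;

	for (i = first; i < last; i++) {
		if (!unaccepted[i])
			continue;	/* already accepted: nothing to do */
		platform_accept(i * UNIT_SIZE, (i + 1) * UNIT_SIZE);
		unaccepted[i] = false;
	}
}

/*
 * Start with everything unaccepted, accept one range twice, then
 * accept a range that overlaps it. Returns the number of platform
 * accept calls that were actually made.
 */
unsigned int demo_double_accept(void)
{
	unsigned int i;

	for (i = 0; i < NR_UNITS; i++)
		unaccepted[i] = true;
	platform_accept_calls = 0;

	accept_memory_sketch(0, 4 * UNIT_SIZE);			/* units 0-3 */
	accept_memory_sketch(0, 4 * UNIT_SIZE);			/* no new work */
	accept_memory_sketch(2 * UNIT_SIZE, 6 * UNIT_SIZE);	/* units 4-5 are new */

	return platform_accept_calls;
}
```

In this sketch the memblock_phys_free() followed by a new allocation
case corresponds to the second call: the flags for the range are
already clear, so the platform is never asked to accept it again.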

-- 
  Kiryl Shutsemau / Kirill A. Shutemov