linux-kernel - Re: [PATCHv14 5/9] efi: Add unaccepted memory support

Message-ID: <20231013123358.y4pcdp5fgtt4ax6g@box.shutemov.name>
Date:   Fri, 13 Oct 2023 15:33:58 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Michael Roth <michael.roth@....com>
Cc:     Borislav Petkov <bp@...en8.de>, Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tom Lendacky <thomas.lendacky@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        marcelo.cerri@...onical.com, tim.gardner@...onical.com,
        khalid.elmously@...onical.com, philip.cox@...onical.com,
        aarcange@...hat.com, peterx@...hat.com, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv14 5/9] efi: Add unaccepted memory support

On Tue, Oct 10, 2023 at 04:05:18PM -0500, Michael Roth wrote:
> On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote:
> > efi_config_parse_tables() reserves the memory that holds the unaccepted
> > memory configuration table so it won't be reused by the page allocator.
> > 
> > Core-mm requires a few helpers to support unaccepted memory:
> > 
> >  - accept_memory() checks the range of addresses against the bitmap and
> >    accepts memory if needed.
> > 
> >  - range_contains_unaccepted_memory() checks if anything within the
> >    range requires acceptance.
> > 
> > Architectural code has to provide efi_get_unaccepted_table(), which
> > returns a pointer to the unaccepted memory configuration table.
> > 
> > arch_accept_memory() handles the arch-specific part of memory acceptance.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Reviewed-by: Ard Biesheuvel <ardb@...nel.org>
> > Reviewed-by: Tom Lendacky <thomas.lendacky@....com>
> > ---
> >  arch/x86/platform/efi/efi.c              |   3 +
> >  drivers/firmware/efi/Makefile            |   1 +
> >  drivers/firmware/efi/efi.c               |  25 +++++
> >  drivers/firmware/efi/unaccepted_memory.c | 112 +++++++++++++++++++++++
> >  include/linux/efi.h                      |   1 +
> >  5 files changed, 142 insertions(+)
> >  create mode 100644 drivers/firmware/efi/unaccepted_memory.c
> > 
> > diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> > new file mode 100644
> > index 000000000000..08a9a843550a
> > --- /dev/null
> > +++ b/drivers/firmware/efi/unaccepted_memory.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include <linux/efi.h>
> > +#include <linux/memblock.h>
> > +#include <linux/spinlock.h>
> > +#include <asm/unaccepted_memory.h>
> > +
> > +/* Protects unaccepted memory bitmap */
> > +static DEFINE_SPINLOCK(unaccepted_memory_lock);
> > +
> > +/*
> > + * accept_memory() -- Consult bitmap and accept the memory if needed.
> > + *
> > + * Only memory that is explicitly marked as unaccepted in the bitmap requires
> > + * an action. All the remaining memory is implicitly accepted and doesn't need
> > + * acceptance.
> > + *
> > + * No need to accept:
> > + *  - anything if the system has no unaccepted table;
> > + *  - memory that is below phys_base;
> > + *  - memory that is above the range addressable by the bitmap;
> > + */
> > +void accept_memory(phys_addr_t start, phys_addr_t end)
> > +{
> > +	struct efi_unaccepted_memory *unaccepted;
> > +	unsigned long range_start, range_end;
> > +	unsigned long flags;
> > +	u64 unit_size;
> > +
> > +	unaccepted = efi_get_unaccepted_table();
> > +	if (!unaccepted)
> > +		return;
> > +
> > +	unit_size = unaccepted->unit_size;
> > +
> > +	/*
> > +	 * Only care for the part of the range that is represented
> > +	 * in the bitmap.
> > +	 */
> > +	if (start < unaccepted->phys_base)
> > +		start = unaccepted->phys_base;
> > +	if (end < unaccepted->phys_base)
> > +		return;
> > +
> > +	/* Translate to offsets from the beginning of the bitmap */
> > +	start -= unaccepted->phys_base;
> > +	end -= unaccepted->phys_base;
> > +
> > +	/* Make sure not to overrun the bitmap */
> > +	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> > +		end = unaccepted->size * unit_size * BITS_PER_BYTE;
> > +
> > +	range_start = start / unit_size;
> > +
> > +	spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > +	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> > +				   DIV_ROUND_UP(end, unit_size)) {
> > +		unsigned long phys_start, phys_end;
> > +		unsigned long len = range_end - range_start;
> > +
> > +		phys_start = range_start * unit_size + unaccepted->phys_base;
> > +		phys_end = range_end * unit_size + unaccepted->phys_base;
> > +
> > +		arch_accept_memory(phys_start, phys_end);
> > +		bitmap_clear(unaccepted->bitmap, range_start, len);
> > +	}
> > +	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> > +}
> 
> While testing SNP guests running today's tip/master (ef19bc9dddc3) I ran
> into what seems to be fairly significant lock contention on the
> unaccepted_memory_lock spinlock above. For guests with around 16 or more
> vCPUs, it results in a constant stream of soft lockups until the workload
> gets all its memory accepted/faulted in.
> 
> I've included the guest dmesg traces I was seeing below.
> 
> In this case I was running a 32 vCPU guest with 200GB of memory on
> a 256 thread EPYC (Milan) system, and can trigger the above situation fairly
> reliably by running the following workload in a freshly booted guest:
> 
>   stress --vm 32 --vm-bytes 5G --vm-keep
> 
> Scaling up the number of stress threads and vCPUs should make it easier
> to reproduce.
> 
> Other than unresponsiveness/lockup messages until the memory is accepted,
> the guest seems to continue running fine, but for large guests where
> unaccepted memory is more likely to be useful, it seems like it could be
> an issue, especially when considering 100+ vCPU guests.

Okay, sorry for the delay. It took time to reproduce this with TDX.

I will look into what can be done.
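
For reference, the range arithmetic in the quoted accept_memory() can be
sketched in user space. This is only an illustration under stated
assumptions, not the kernel code: UNIT_SIZE, PHYS_BASE, BITMAP_BITS, the
single-word bitmap and toy_accept_memory() are hypothetical names and
values, the arch-specific acceptance step is reduced to clearing a bit,
and locking is omitted entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values chosen for illustration only. */
#define UNIT_SIZE   (2ULL << 20)  /* assumed: one bitmap bit covers 2 MiB */
#define PHYS_BASE   (1ULL << 30)  /* assumed: bitmap starts at 1 GiB */
#define BITMAP_BITS 64            /* toy bitmap: a single 64-bit word */

/* A set bit means "unaccepted". Start with everything unaccepted. */
static uint64_t bitmap = ~0ULL;

/*
 * Clamp [start, end) to the part covered by the bitmap, translate it to
 * bit indices, and clear every set bit in that range. Returns the number
 * of units that were "accepted" (bits cleared).
 */
static unsigned int toy_accept_memory(uint64_t start, uint64_t end)
{
	const uint64_t limit = (uint64_t)BITMAP_BITS * UNIT_SIZE;
	unsigned int accepted = 0;
	uint64_t first, last, bit;

	assert(start <= end);

	/* Only the part of the range represented in the bitmap matters. */
	if (start < PHYS_BASE)
		start = PHYS_BASE;
	if (end < PHYS_BASE)
		return 0;

	/* Translate to offsets from the beginning of the bitmap. */
	start -= PHYS_BASE;
	end -= PHYS_BASE;

	/* Do not run past the end of the bitmap. */
	if (end > limit)
		end = limit;

	first = start / UNIT_SIZE;                  /* round down */
	last = (end + UNIT_SIZE - 1) / UNIT_SIZE;   /* round up */

	for (bit = first; bit < last; bit++) {
		if (bitmap & (1ULL << bit)) {
			bitmap &= ~(1ULL << bit);   /* stand-in for acceptance */
			accepted++;
		}
	}
	return accepted;
}
```

With these assumed values, a range that only partly overlaps a unit still
covers the whole unit, because the start is rounded down and the end is
rounded up. In the quoted patch the equivalent loop, including the
arch_accept_memory() call, runs with unaccepted_memory_lock held and
interrupts disabled, which is where the contention reported above comes
from.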

-- 
  Kiryl Shutsemau / Kirill A. Shutemov