linux-kernel - Re: [PATCHv14 5/9] efi: Add unaccepted memory support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3577c8a5-3f88-45b8-9b41-2fb5cb6dc53a@amd.com>
Date:   Fri, 13 Oct 2023 12:45:20 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Michael Roth <michael.roth@....com>
Cc:     Borislav Petkov <bp@...en8.de>, Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        marcelo.cerri@...onical.com, tim.gardner@...onical.com,
        khalid.elmously@...onical.com, philip.cox@...onical.com,
        aarcange@...hat.com, peterx@...hat.com, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv14 5/9] efi: Add unaccepted memory support

On 10/13/23 11:22, Kirill A. Shutemov wrote:
> On Fri, Oct 13, 2023 at 03:33:58PM +0300, Kirill A. Shutemov wrote:
>>> While testing SNP guests running today's tip/master (ef19bc9dddc3) I ran
>>> into what seems to be fairly significant lock contention due to the
>>> unaccepted_memory_lock spinlock above, which results in a constant stream
>>> of soft-lockups until the workload gets all its memory accepted/faulted
>>> in if the guest has around 16+ vCPUs.
>>>
>>> I've included the guest dmesg traces I was seeing below.
>>>
>>> In this case I was running a 32 vCPU guest with 200GB of memory running on
>>> a 256 thread EPYC (Milan) system, and can trigger the above situation fairly
>>> reliably by running the following workload in a freshly-booted guests:
>>>
>>>    stress --vm 32 --vm-bytes 5G --vm-keep
>>>
>>> Scaling up the number of stress threads and vCPUs should make it easier
>>> to reproduce.
>>>
>>> Other than unresponsiveness/lockup messages until the memory is accepted,
>>> the guest seems to continue running fine, but for large guests where
>>> unaccepted memory is more likely to be useful, it seems like it could be
>>> an issue, especially when consider 100+ vCPU guests.
>>
>> Okay, sorry for delay. It took time to reproduce it with TDX.
>>
>> I will look what can be done.
> 
> Could you check if the patch below helps?
> 
> diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> index 853f7dc3c21d..591da3f368fa 100644
> --- a/drivers/firmware/efi/unaccepted_memory.c
> +++ b/drivers/firmware/efi/unaccepted_memory.c
> @@ -8,6 +8,14 @@
>   /* Protects unaccepted memory bitmap */
>   static DEFINE_SPINLOCK(unaccepted_memory_lock);
>   
> +struct accept_range {
> +	struct list_head list;
> +	unsigned long start;
> +	unsigned long end;
> +};
> +
> +static LIST_HEAD(accepting_list);
> +
>   /*
>    * accept_memory() -- Consult bitmap and accept the memory if needed.
>    *
> @@ -24,6 +32,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>   {
>   	struct efi_unaccepted_memory *unaccepted;
>   	unsigned long range_start, range_end;
> +	struct accept_range range, *entry;
>   	unsigned long flags;
>   	u64 unit_size;
>   
> @@ -80,7 +89,25 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>   
>   	range_start = start / unit_size;
>   
> +	range.start = start;
> +	range.end = end;
> +retry:
>   	spin_lock_irqsave(&unaccepted_memory_lock, flags);
> +
> +	list_for_each_entry(entry, &accepting_list, list) {
> +		if (entry->end < start)
> +			continue;
> +		if (entry->start > end)

Should this be a >= check since start and end are page aligned values?

> +			continue;
> +		spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> +
> +		/* Somebody else accepting the range */
> +		cpu_relax();
> +		goto retry;

Could you set some kind of flag here so that ...

> +	}
> +

... once you get here, that means that area was accepted and removed from 
the list, so I think you could just drop the lock and exit now, right? 
Because at that point the bitmap will have been updated and you wouldn't 
be accepting any memory anyway?

Thanks,
Tom

> +	list_add(&range.list, &accepting_list);
> +
>   	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
>   				   DIV_ROUND_UP(end, unit_size)) {
>   		unsigned long phys_start, phys_end;
> @@ -89,9 +116,15 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>   		phys_start = range_start * unit_size + unaccepted->phys_base;
>   		phys_end = range_end * unit_size + unaccepted->phys_base;
>   
> +		spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> +
>   		arch_accept_memory(phys_start, phys_end);
> +
> +		spin_lock_irqsave(&unaccepted_memory_lock, flags);
>   		bitmap_clear(unaccepted->bitmap, range_start, len);
>   	}
> +
> +	list_del(&range.list);
>   	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
>   }
>