Message-ID: <e828b48f-dcd8-6404-fc30-6e1dd682252f@redhat.com>
Date: Fri, 5 Aug 2022 16:22:45 +0200
From: David Hildenbrand <david@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Sean Christopherson <seanjc@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>,
Ard Biesheuvel <ardb@...nel.org>
Cc: Andi Kleen <ak@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Dario Faggioli <dfaggioli@...e.com>,
Dave Hansen <dave.hansen@...el.com>,
Mike Rapoport <rppt@...nel.org>, marcelo.cerri@...onical.com,
tim.gardner@...onical.com, khalid.elmously@...onical.com,
philip.cox@...onical.com, x86@...nel.org, linux-mm@...ck.org,
linux-coco@...ts.linux.dev, linux-efi@...r.kernel.org,
linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...ux.ibm.com>,
Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory

On 05.08.22 15:38, Vlastimil Babka wrote:
> On 8/5/22 14:09, David Hildenbrand wrote:
>> On 05.08.22 13:49, Vlastimil Babka wrote:
>>> On 6/14/22 14:02, Kirill A. Shutemov wrote:
>>>> UEFI Specification version 2.9 introduces the concept of memory
>>>> acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
>>>> SEV-SNP, require memory to be accepted before it can be used by the
>>>> guest. Accepting happens via a protocol specific to the Virtual Machine
>>>> platform.
>>>>
>>>> There are several ways the kernel can deal with unaccepted memory:
>>>>
>>>> 1. Accept all the memory during the boot. It is easy to implement and
>>>> it doesn't have runtime cost once the system is booted. The downside
>>>> is very long boot time.
>>>>
>>>> Accept can be parallelized to multiple CPUs to keep it manageable
>>>> (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
>>>> memory bandwidth and does not scale beyond a certain point.
>>>>
>>>> 2. Accept a block of memory on the first use. It requires more
>>>> infrastructure and changes in page allocator to make it work, but
>>>> it provides good boot time.
>>>>
>>>> On-demand memory accept means latency spikes every time kernel steps
>>>> onto a new memory block. The spikes will go away once workload data
>>>> set size gets stabilized or all memory gets accepted.
>>>>
>>>> 3. Accept all memory in background. Introduce a thread (or multiple)
>>>> that gets memory accepted proactively. It will minimize the time the
>>>> system experiences latency spikes on memory allocation while keeping
>>>> boot time low.
>>>>
>>>> This approach cannot function on its own. It is an extension of #2:
>>>> background memory acceptance requires functional scheduler, but the
>>>> page allocator may need to tap into unaccepted memory before that.
>>>>
>>>> The downside of the approach is that these threads also steal CPU
>>>> cycles and memory bandwidth from the user's workload and may hurt
>>>> user experience.
>>>>
>>>> Implement #2 for now. It is a reasonable default. Some workloads may
>>>> want to use #1 or #3 and they can be implemented later based on user's
>>>> demands.
>>>>
>>>> Support of unaccepted memory requires a few changes in core-mm code:
>>>>
>>>> - memblock has to accept memory on allocation;
>>>>
>>>> - page allocator has to accept memory on the first allocation of the
>>>> page;
>>>>
>>>> Memblock change is trivial.
>>>>
>>>> The page allocator is modified to accept pages on the first allocation.
>>>> The new page type (encoded in the _mapcount) -- PageUnaccepted() -- is
>>>> used to indicate that the page requires acceptance.
>>>>
>>>> Architecture has to provide two helpers if it wants to support
>>>> unaccepted memory:
>>>>
>>>> - accept_memory() makes a range of physical addresses accepted.
>>>>
>>>> - range_contains_unaccepted_memory() checks whether anything within the
>>>> range of physical addresses requires acceptance.
>>>>
>>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>>>> Acked-by: Mike Rapoport <rppt@...ux.ibm.com> # memblock
>>>> Reviewed-by: David Hildenbrand <david@...hat.com>
>>>
>>> Hmm I realize it's not ideal to raise this at v7, and maybe it was discussed
>>> before, but it's really not great how this affects the core page allocator
>>> paths. Wouldn't it be possible to only release pages to page allocator when
>>> accepted, and otherwise use some new per-zone variables together with the
>>> bitmap to track how much exactly is where to accept? Then it could be hooked
>>> in get_page_from_freelist() similarly to CONFIG_DEFERRED_STRUCT_PAGE_INIT -
>>> if we fail zone_watermark_fast() and there are unaccepted pages in the zone,
>>> accept them and continue. With a static key to flip in case we eventually
>>> accept everything. Because this is a really similar scenario to the deferred
>>> init and that one was solved in a way that adds minimal overhead.
>>
>> I kind of like just having the memory stats being correct (e.g., free
>> memory) and acceptance being an internal detail to be triggered when
>> allocating pages -- just like the arch_alloc_page() callback.
>
> Hm, good point about the stats. Could be tweaked perhaps so it appears
> correct on the outside, but might be tricky.
>
>> I'm sure we could optimize for the !unaccepted memory via static keys
>> also in this version with some checks at the right places if we find
>> this to hurt performance?
>
> It would be great if we would at least somehow hit the necessary code only
> when dealing with a >=pageblock size block. The bitmap approach and
> accepting everything smaller upfront actually seems rather compatible. Yet
> in the current patch we e.g. check PageUnaccepted(buddy) on every buddy size
> while merging.
>
> A list that sits beside the existing free_area, contains only >=pageblock
> order sizes of unaccepted pages (no migratetype distinguished) and we tap
> into it approximately before __rmqueue_fallback()? There would be some
> trickery around releasing zone-lock for doing accept_memory(), but should be
> manageable.
>
Just curious, do we have a microbenchmark that is able to reveal the
impact of such code changes before we start worrying?
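
To make the discussion concrete, here is a minimal user-space sketch of the
on-demand scheme (#2 in the quoted description): a bitmap tracks which
fixed-size units are still unaccepted, and the allocation path accepts them on
first use. Only the helper names accept_memory() and
range_contains_unaccepted_memory() come from the patch description; their
signatures here, the 2 MiB unit size, the single 64-bit bitmap, the
nr_accept_calls counter and model_alloc() are all assumptions made for
illustration, not what the patch implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tracking granularity: one bitmap bit per 2 MiB unit. */
#define UNIT_SHIFT 21
#define UNIT_SIZE  (UINT64_C(1) << UNIT_SHIFT)
#define NR_UNITS   64

/* Bit set = unit not yet accepted. Everything starts unaccepted. */
static uint64_t unaccepted_bitmap = ~UINT64_C(0);

/* Hypothetical counter standing in for the cost of the platform call. */
static unsigned long nr_accept_calls;

/* Does [start, end) overlap any unit that is still unaccepted? */
static bool range_contains_unaccepted_memory(uint64_t start, uint64_t end)
{
	assert(start < end && end <= NR_UNITS * UNIT_SIZE);

	for (uint64_t u = start >> UNIT_SHIFT; u <= (end - 1) >> UNIT_SHIFT; u++)
		if (unaccepted_bitmap & (UINT64_C(1) << u))
			return true;
	return false;
}

/* Accept every still-unaccepted unit overlapping [start, end). */
static void accept_memory(uint64_t start, uint64_t end)
{
	assert(start < end && end <= NR_UNITS * UNIT_SIZE);

	for (uint64_t u = start >> UNIT_SHIFT; u <= (end - 1) >> UNIT_SHIFT; u++) {
		uint64_t bit = UINT64_C(1) << u;

		if (!(unaccepted_bitmap & bit))
			continue;	/* already accepted, nothing to pay */

		/* The platform-specific accept protocol would run here. */
		unaccepted_bitmap &= ~bit;
		nr_accept_calls++;
	}
}

/* Hypothetical allocation hook: accept lazily, then hand out the range. */
static uint64_t model_alloc(uint64_t start, uint64_t size)
{
	if (range_contains_unaccepted_memory(start, start + size))
		accept_memory(start, start + size);
	return start;
}
```

In this model the first model_alloc() touching a unit pays for the accept,
while later allocations from the same unit only pay for the bitmap test. That
test is the per-allocation overhead a microbenchmark would have to measure.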
--
Thanks,
David / dhildenb