linux-kernel - Re: [PATCHv7 02/14] mm: Add support for unaccepted memory

Message-ID: <cb9d3310-3bc0-8ecf-5e71-becce980235f@redhat.com>
Date:   Fri, 5 Aug 2022 14:09:31 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>
Cc:     Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Mike Rapoport <rppt@...nel.org>, marcelo.cerri@...onical.com,
        tim.gardner@...onical.com, khalid.elmously@...onical.com,
        philip.cox@...onical.com, x86@...nel.org, linux-mm@...ck.org,
        linux-coco@...ts.linux.dev, linux-efi@...r.kernel.org,
        linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...ux.ibm.com>,
        Mel Gorman <mgorman@...hsingularity.net>
Subject: Re: [PATCHv7 02/14] mm: Add support for unaccepted memory

On 05.08.22 13:49, Vlastimil Babka wrote:
> On 6/14/22 14:02, Kirill A. Shutemov wrote:
>> UEFI Specification version 2.9 introduces the concept of memory
>> acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
>> SEV-SNP, require memory to be accepted before it can be used by the
>> guest. Accepting happens via a protocol specific to the Virtual Machine
>> platform.
>>
>> There are several ways the kernel can deal with unaccepted memory:
>>
>>  1. Accept all the memory during boot. It is easy to implement and
>>     it doesn't have runtime cost once the system is booted. The downside
>>     is a very long boot time.
>>
>>     Accept can be parallelized to multiple CPUs to keep it manageable
>>     (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate
>>     memory bandwidth and does not scale beyond a certain point.
>>
>>  2. Accept a block of memory on the first use. It requires more
>>     infrastructure and changes in page allocator to make it work, but
>>     it provides good boot time.
>>
>>     On-demand memory accept means latency spikes every time the kernel
>>     steps onto a new memory block. The spikes will go away once the
>>     workload's data set size stabilizes or all memory gets accepted.
>>
>>  3. Accept all memory in the background. Introduce a thread (or multiple)
>>     that gets memory accepted proactively. It will minimize the time the
>>     system experiences latency spikes on memory allocation while keeping
>>     boot time low.
>>
>>     This approach cannot function on its own. It is an extension of #2:
>>     background memory acceptance requires a functional scheduler, but the
>>     page allocator may need to tap into unaccepted memory before that.
>>
>>     The downside of the approach is that these threads also steal CPU
>>     cycles and memory bandwidth from the user's workload and may hurt
>>     user experience.
>>
>> Implement #2 for now. It is a reasonable default. Some workloads may
>> want to use #1 or #3 and they can be implemented later based on user
>> demand.
>>
>> Support of unaccepted memory requires a few changes in core-mm code:
>>
>>   - memblock has to accept memory on allocation;
>>
>>   - page allocator has to accept memory on the first allocation of the
>>     page;
>>
>> Memblock change is trivial.
>>
>> The page allocator is modified to accept pages on the first allocation.
>> The new page type (encoded in the _mapcount) -- PageUnaccepted() -- is
>> used to indicate that the page requires acceptance.
>>
>> An architecture has to provide two helpers if it wants to support
>> unaccepted memory:
>>
>>  - accept_memory() makes a range of physical addresses accepted.
>>
>>  - range_contains_unaccepted_memory() checks whether anything within the
>>    range of physical addresses requires acceptance.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>> Acked-by: Mike Rapoport <rppt@...ux.ibm.com>	# memblock
>> Reviewed-by: David Hildenbrand <david@...hat.com>
> 
> Hmm, I realize it's not ideal to raise this at v7, and maybe it was discussed
> before, but it's really not great how this affects the core page allocator
> paths. Wouldn't it be possible to only release pages to the page allocator
> once they are accepted, and otherwise use some new per-zone variables
> together with the bitmap to track exactly how much remains to be accepted
> and where? Then it could be hooked into get_page_from_freelist() similarly
> to CONFIG_DEFERRED_STRUCT_PAGE_INIT: if we fail zone_watermark_fast() and
> there are unaccepted pages in the zone, accept them and continue. With a
> static key to flip in case we eventually accept everything. This is really
> a similar scenario to the deferred init, and that one was solved in a way
> that adds minimal overhead.
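
For illustration, a minimal userspace sketch of the scheme proposed above, assuming a single counter per zone and a fixed accept batch size. It is not kernel code: zone_model, zone_init(), accept_batch(), watermark_ok(), alloc_pages_model(), ACCEPT_BATCH and total_unaccepted are all hypothetical names, and the global counter only stands in for the static key that would be flipped once everything is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct zone_model {
    size_t free_pages;       /* accepted pages already released to the allocator */
    size_t unaccepted_pages; /* pages held back until they are accepted */
    size_t watermark;        /* free pages that must remain after an allocation */
};

enum { ACCEPT_BATCH = 512 }; /* assumed batch size, chosen for the example */

/* Stands in for the static key: zero means "nothing left to accept anywhere". */
static size_t total_unaccepted;

static void zone_init(struct zone_model *z, size_t free_pages,
                      size_t unaccepted, size_t watermark)
{
    z->free_pages = free_pages;
    z->unaccepted_pages = unaccepted;
    z->watermark = watermark;
    total_unaccepted += unaccepted;
}

/* Accept up to one batch and release it to the free pool. */
static size_t accept_batch(struct zone_model *z)
{
    size_t n = z->unaccepted_pages < ACCEPT_BATCH ? z->unaccepted_pages
                                                  : ACCEPT_BATCH;

    assert(total_unaccepted >= n);
    z->unaccepted_pages -= n;
    z->free_pages += n;
    total_unaccepted -= n;
    return n;
}

/* Simplified stand-in for a watermark check such as zone_watermark_fast(). */
static bool watermark_ok(const struct zone_model *z, size_t nr)
{
    return z->free_pages >= nr && z->free_pages - nr >= z->watermark;
}

/* Returns true if nr pages could be allocated from the zone. */
static bool alloc_pages_model(struct zone_model *z, size_t nr)
{
    while (!watermark_ok(z, nr)) {
        /* Slow path only: the common case never looks at unaccepted state. */
        if (total_unaccepted == 0 || z->unaccepted_pages == 0)
            return false;
        accept_batch(z);
    }
    z->free_pages -= nr;
    return true;
}
```

In this model the free counter only ever covers accepted memory, and the accept cost is paid only when the watermark check fails.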

I kind of like having the memory stats be correct (e.g., free memory),
with acceptance being an internal detail that is triggered when
allocating pages -- just like the arch_alloc_page() callback.

I'm sure we could also optimize the !unaccepted memory case in this
version via static keys, with some checks at the right places, if we
find that this hurts performance?
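
A rough userspace sketch of that idea, under simplifying assumptions: acceptance is modelled per page rather than per block, and a plain counter check stands in for the static key. All names here (page_model, pool_init(), accept_page_model(), alloc_page_model(), free_page_model()) are hypothetical and not kernel APIs; the unaccepted flag only mimics the PageUnaccepted() page type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NR_PAGES = 8 }; /* tiny pool, for illustration only */

/* Hypothetical model of a page; not the kernel's struct page. */
struct page_model {
    bool unaccepted; /* mimics the PageUnaccepted() page type */
    bool in_use;
};

static struct page_model pages[NR_PAGES];
static size_t nr_free;         /* counts accepted and unaccepted pages alike */
static size_t nr_unaccepted;   /* zero: the static key could be switched off */
static size_t nr_accept_calls; /* bookkeeping for the example only */

static void pool_init(size_t nr_preaccepted)
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        pages[i].in_use = false;
        pages[i].unaccepted = i >= nr_preaccepted;
    }
    nr_free = NR_PAGES;
    nr_unaccepted = nr_preaccepted < NR_PAGES ? NR_PAGES - nr_preaccepted : 0;
    nr_accept_calls = 0;
}

/* Stand-in for the architecture's accept_memory() helper. */
static void accept_page_model(struct page_model *p)
{
    assert(p->unaccepted);
    p->unaccepted = false;
    nr_unaccepted--;
    nr_accept_calls++;
}

static struct page_model *alloc_page_model(void)
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        struct page_model *p = &pages[i];

        if (p->in_use)
            continue;
        /* The first test plays the role of the static key check. */
        if (nr_unaccepted != 0 && p->unaccepted)
            accept_page_model(p);
        p->in_use = true;
        nr_free--;
        return p;
    }
    return NULL;
}

static void free_page_model(struct page_model *p)
{
    assert(p->in_use);
    p->in_use = false;
    nr_free++;
}
```

Here the free count covers all memory from the start, and a page that has been accepted once is never accepted again after being freed and reallocated.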

-- 
Thanks,

David / dhildenb