Message-ID: <20220413113024.ycvocn6ynerl3b7m@box.shutemov.name>
Date: Wed, 13 Apr 2022 14:30:24 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: David Hildenbrand <david@...hat.com>
Cc: Dave Hansen <dave.hansen@...el.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Sean Christopherson <seanjc@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>,
Ard Biesheuvel <ardb@...nel.org>,
Andi Kleen <ak@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Varad Gautam <varad.gautam@...e.com>,
Dario Faggioli <dfaggioli@...e.com>,
Brijesh Singh <brijesh.singh@....com>,
Mike Rapoport <rppt@...nel.org>, x86@...nel.org,
linux-mm@...ck.org, linux-coco@...ts.linux.dev,
linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
Mike Rapoport <rppt@...ux.ibm.com>
Subject: Re: [PATCHv4 1/8] mm: Add support for unaccepted memory
On Wed, Apr 13, 2022 at 12:36:11PM +0200, David Hildenbrand wrote:
> On 12.04.22 18:08, Dave Hansen wrote:
> > On 4/12/22 01:15, David Hildenbrand wrote:
> >> Can we simply automate this using a kthread or smth like that, which
> >> just traverses the free page lists and accepts pages (similar, but
> >> different to free page reporting)?
> >
> > That's definitely doable.
> >
> > The downside is that this will force premature consumption of physical
> > memory resources that the guest may never use. That's a particular
> > problem on TDX systems since there is no way for a VMM to reclaim guest
> > memory short of killing the guest.
>
> IIRC, the hypervisor will usually effectively populate all guest RAM
> either way right now.
No, it is not usual. By default QEMU/KVM uses an anonymous mapping and
faults in memory on demand.
Yes, there's an option to pre-populate guest memory, but it is not the
default.
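To illustrate the difference (purely a sketch, not actual QEMU code;
alloc_guest_ram() is a made-up helper): the default is a plain anonymous
mapping that faults in on first touch, while pre-population is roughly
MAP_POPULATE semantics.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stddef.h>

/* Sketch of how a VMM may back guest RAM; not how QEMU is structured. */
static void *alloc_guest_ram(size_t size, int prealloc)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;

	/*
	 * With prealloc, host pages are populated up front. Otherwise
	 * they are faulted in the first time the guest touches them.
	 */
	if (prealloc)
		flags |= MAP_POPULATE;

	return mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}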
> So yes, for hypervisors that might optimize for
> that, that statement would be true. But I lost track how helpful it
> would be in the near future e.g., with the fd-based private guest memory
> -- maybe they already optimize for delayed acceptance of memory, turning
> it into delayed population.
>
> >
> > In other words, I can see a good argument either way:
> > 1. The kernel should accept everything to avoid the perf nastiness
> > 2. The kernel should accept only what it needs in order to reduce memory
> > use
> >
> > I'm kinda partial to #1 though, if I had to pick only one.
> >
> > The other option might be to tie this all to DEFERRED_STRUCT_PAGE_INIT.
> > Have the rule that everything that gets a 'struct page' must be
> > accepted. If you want to do delayed acceptance, you do it via
> > DEFERRED_STRUCT_PAGE_INIT.
>
> That could also be an option, yes. At least being able to chose would be
> good. But IIRC, DEFERRED_STRUCT_PAGE_INIT will still make the system get
> stuck during boot and wait until everything was accepted.
Right. Deferred page init has to be done before init starts.
> I see the following variants:
>
> 1) Slow boot; after boot, all memory is already accepted.
> 2) Fast boot; after boot, all memory will slowly but steadily get
> accepted in the background. After a while, all memory is accepted and
> can be signaled to user space.
> 3) Fast boot; after boot, memory gets accepted on demand. This is what
> we have in this series.
>
> I somehow don't quite like 3), but with deferred population in the
> hypervisor, it might just make sense.
Conceptually, 3 is no different from what happens now. The first time a
normal VM touches a page (like on handling __GFP_ZERO), the page gets
allocated on the host. That can take a very long time if it kicks in
direct reclaim on the host.
The only difference is that accepting memory is *usually* slower.
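To make the comparison concrete, variant 3 conceptually looks like this
on the allocation path (hand-wavy sketch; page_is_unaccepted() and
accept_page() are placeholders, not the helpers from this series):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Conceptual sketch of accept-on-demand; not the series' implementation. */
static struct page *alloc_page_accept_on_demand(gfp_t gfp, unsigned int order)
{
	struct page *page = alloc_pages(gfp, order);

	/*
	 * First touch of an unaccepted range: have the guest accept it
	 * before use. This is where the extra latency shows up, on top
	 * of possible direct reclaim on the host when it populates the
	 * backing memory.
	 */
	if (page && page_is_unaccepted(page))
		accept_page(page, order);

	return page;
}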
I guess we can make a case for making 1 an option to match the
pre-populated use case for normal VMs.
Frankly, I think option 2 is the worst one. You still steal CPU cycles
from the workload after boot to do a job that may or may not be needed.
It is a half-measure that helps nobody.
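Just to spell out what variant 2 would mean (again only a sketch with
made-up helpers, next_unaccepted_range() and accept_range()): a kthread
grinding through memory after boot, eating CPU whether or not the memory
is ever used.

#include <linux/kthread.h>
#include <linux/sched.h>

/* Sketch of a background-acceptance thread for variant 2. */
static int accept_all_memory(void *unused)
{
	unsigned long start, end;

	while (!kthread_should_stop() &&
	       next_unaccepted_range(&start, &end)) {
		/* Steals cycles from the workload even if never needed. */
		accept_range(start, end);
		cond_resched();
	}
	return 0;
}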
--
Kirill A. Shutemov