Message-ID: <Z_KtFnmExftpotmR@kernel.org>
Date: Sun, 6 Apr 2025 19:34:30 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Pratyush Yadav <ptyadav@...zon.de>
Cc: Jason Gunthorpe <jgg@...dia.com>, Changyuan Lyu <changyuanl@...gle.com>,
linux-kernel@...r.kernel.org, graf@...zon.com,
akpm@...ux-foundation.org, luto@...nel.org,
anthony.yznaga@...cle.com, arnd@...db.de, ashish.kalra@....com,
benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
dave.hansen@...ux.intel.com, dwmw2@...radead.org,
ebiederm@...ssion.com, mingo@...hat.com, jgowans@...zon.com,
corbet@....net, krzk@...nel.org, mark.rutland@....com,
pbonzini@...hat.com, pasha.tatashin@...een.com, hpa@...or.com,
peterz@...radead.org, robh+dt@...nel.org, robh@...nel.org,
saravanak@...gle.com, skinsburskii@...ux.microsoft.com,
rostedt@...dmis.org, tglx@...utronix.de, thomas.lendacky@....com,
usama.arif@...edance.com, will@...nel.org,
devicetree@...r.kernel.org, kexec@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory
preservation
On Fri, Apr 04, 2025 at 04:15:28PM +0000, Pratyush Yadav wrote:
> Hi Mike,
>
> On Fri, Apr 04 2025, Mike Rapoport wrote:
>
> [...]
> > As for the optimizations of memblock reserve path, currently it is what hurts
> > the most in my and Pratyush's experiments. They are not very representative,
> > but still, preserving lots of pages/folios spread all over would take its
> > toll on the mm initialization. And I don't think invasive changes to how
> > buddy and memory map initialization work are the best way to move forward and
> > optimize that. Quite possibly we'd want to be able to minimize the number of
> > *ranges* that we preserve.
> >
> > So from the three alternatives we have now (xarrays + bitmaps, tables +
> > bitmaps and maple tree for ranges) maple tree seems to be the simplest and
> > efficient enough to start with.
>
> But you'd need to somehow serialize the maple tree ranges into some
> format. So you would either end up going back to the kho_mem ranges we
> had, or have to invent something more complex. The sample code you wrote
> is pretty much going back to having kho_mem ranges.
It's a bit better and it's not a part of FDT which Jason was so much
against :)
> And if you say that we should minimize the amount of ranges, the table +
> bitmaps is still a fairly good data structure. You can very well have a
> higher order table where your entire range is a handful of bits. This
> lets you track a small number of ranges fairly efficiently -- both in
> terms of memory and in terms of CPU. I think the only place where it
> doesn't work as well as a maple tree is if you want to merge or split a
> lot of ranges quickly. But if you say that you only want to have a handful
> of ranges, does that really matter?
Until we all agree that we are bypassing memblock_reserve() and
reimplementing memory map and free lists initialization for KHO, we must
minimize the number of memblock_reserve() calls. And a maple tree makes it
easy to merge ranges where appropriate, resulting in far fewer ranges
than kho_mem had.
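To illustrate the merging argument, here is a minimal userspace sketch. It is
not the KHO or maple tree code. It only shows how preserved ranges that touch
or overlap collapse into fewer ranges, so fewer reserve calls would be needed.
struct range and coalesce_ranges() are hypothetical names made up for this
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a preserved physical range: [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Order ranges by their start address for qsort(). */
static int range_cmp(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return -1;
	if (ra->start > rb->start)
		return 1;
	return 0;
}

/*
 * Sort the ranges by start address and merge, in place, every range that
 * overlaps or directly touches its predecessor. Returns the number of
 * ranges that remain, i.e. how many reserve calls would still be needed.
 */
static size_t coalesce_ranges(struct range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;

	qsort(r, n, sizeof(*r), range_cmp);

	for (size_t i = 1; i < n; i++) {
		if (r[i].start <= r[out].end) {
			/* Adjacent or overlapping: extend the current range. */
			if (r[i].end > r[out].end)
				r[out].end = r[i].end;
		} else {
			/* Gap: start a new output range. */
			r[++out] = r[i];
		}
	}

	return out + 1;
}
```

With three back-to-back pages and one distant page as input, this leaves two
ranges instead of four. A maple tree could do the equivalent merge at insert
time rather than in a separate pass.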
> Also, I think the allocation pattern depends on which use case you have
> in mind. For hypervisor live update, you might very well only have a
> handful of ranges. The use case I have in mind is for taking a userspace
> process, quickly checkpointing it by dumping its memory contents to a
> memfd, and restoring it after KHO. For that, the ability to do random
> sparse allocations quickly helps a lot.
>
> So IMO the table works well for both sparse and dense allocations. So
> why have a data structure that only solves one problem when we can have
> one that solves both? And honestly, I don't think the table is that much
> more complex either -- both in terms of understanding the idea and in
> terms of code -- the whole thing is like 200 lines.
It's more than 200 lines longer than the maple tree if we count the lines.
My point is that both the table and xarrays are trying to optimize for an
unknown goal. kho_mem with all its drawbacks was an obvious baseline. Maple
tree improves that baseline and it is more straightforward than the
alternatives.
> Also, I think changes to buddy initialization _is_ the way to optimize
> boot times. Having maple tree ranges and moving them around into
> memblock ranges does not really scale very well for anything other than
> a handful of ranges, and we shouldn't limit ourselves to that without
> good reason.
As I said, this means an alternative implementation of the memory map and
free lists initialization, which has been and remains quite fragile.
So we'd better start with something that does not require that in the
roadmap.
> --
> Regards,
> Pratyush Yadav
--
Sincerely yours,
Mike.