Message-ID: <Z_Y4k4rDO-BbMjqs@kernel.org>
Date: Wed, 9 Apr 2025 12:06:27 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Pratyush Yadav <ptyadav@...zon.de>,
Changyuan Lyu <changyuanl@...gle.com>, linux-kernel@...r.kernel.org,
graf@...zon.com, akpm@...ux-foundation.org, luto@...nel.org,
anthony.yznaga@...cle.com, arnd@...db.de, ashish.kalra@....com,
benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
dave.hansen@...ux.intel.com, dwmw2@...radead.org,
ebiederm@...ssion.com, mingo@...hat.com, jgowans@...zon.com,
corbet@....net, krzk@...nel.org, mark.rutland@....com,
pbonzini@...hat.com, pasha.tatashin@...een.com, hpa@...or.com,
peterz@...radead.org, robh+dt@...nel.org, robh@...nel.org,
saravanak@...gle.com, skinsburskii@...ux.microsoft.com,
rostedt@...dmis.org, tglx@...utronix.de, thomas.lendacky@....com,
usama.arif@...edance.com, will@...nel.org,
devicetree@...r.kernel.org, kexec@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory
preservation
On Mon, Apr 07, 2025 at 02:03:05PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 07, 2025 at 07:31:21PM +0300, Mike Rapoport wrote:
> >
> > Ok, let's stick with memdesc then. Putting the name aside, it looks
> > like we do agree that KHO needs to provide a way to preserve memory
> > allocated from the buddy allocator along with some of the metadata
> > describing that memory, like the order for multi-order allocations.
>
> +1
>
> > The issue I see with bitmaps is that there's nothing except the order
> > that we can save. And if, sometime later, we have to recreate the
> > memdesc for that memory, that would mean allocating the correct data
> > structure, i.e. struct folio, struct slab, maybe struct vmalloc.
>
> Yes. The caller would have to take care of this using a caller-specific
> serialization of any memdesc data. Slab, for example, would presumably
> have to record the object size and the object allocation bitmap.
>
> > I'm not sure we are going to preserve slabs, at least in the
> > foreseeable future, but vmalloc seems like something that we'd have
> > to address.
>
> And I suspect vmalloc doesn't need to preserve any memdesc information?
> It can all be recreated.
vmalloc does not keep anything in the memdesc now, just plain order-0
pages from the alloc_pages variants.
Now that we've settled on terminology, and given that currently memdesc ==
struct page, I think we need kho_preserve_folio(struct folio *folio) for
actual struct folios and, apparently, other high-order allocations, and
kho_preserve_pages(struct page *page, int nr) for memblock, vmalloc and
alloc_pages_exact.
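
In code, roughly (just a sketch, not an implementation proposal;
kho_mark_preserved() is a made-up helper standing in for whatever ends
up updating the per-order preservation bitmaps):

	/* Sketch: record a folio (pfn + order) for the next kernel */
	int kho_preserve_folio(struct folio *folio)
	{
		unsigned long pfn = folio_pfn(folio);
		unsigned int order = folio_order(folio);

		return kho_mark_preserved(pfn, order);
	}

	/* Sketch: record nr order-0 pages, one bit each */
	int kho_preserve_pages(struct page *page, int nr)
	{
		unsigned long pfn = page_to_pfn(page);
		int i, err;

		for (i = 0; i < nr; i++) {
			err = kho_mark_preserved(pfn + i, 0);
			if (err)
				return err;
		}
		return 0;
	}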
On the restore path, kho_restore_folio() will recreate the multi-order
thingy by doing parts of what prep_new_page() does, and
kho_restore_pages() will recreate order-0 pages as if they were allocated
from the buddy allocator.
If the caller needs more in its memdesc, it is responsible for filling in
the missing bits.
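
As a sketch of the restore side (kho_lookup_order() is again a made-up
helper for reading the order back from the preserved metadata):

	struct folio *kho_restore_folio(phys_addr_t phys)
	{
		struct page *page = pfn_to_page(PHYS_PFN(phys));
		unsigned int order = kho_lookup_order(PHYS_PFN(phys));

		/* the relevant parts of prep_new_page() */
		if (order)
			prep_compound_page(page, order);

		return page_folio(page);
	}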
> > > Also the bitmap scanning to optimize the memblock reserve isn't
> > > implemented for xarray.. I don't think this is representative..
> >
> > I believe that even with optimized bitmap scanning the maple tree
> > would perform much better when the memory is not fragmented.
>
> Hard to guess; bitmap scanning is not free, especially if there are
> lots of zeros, but allocating maple tree nodes and locking them is not
> free either, so who knows where things cross over..
>
> > And when it is fragmented, both will need to call memblock_reserve()
> > a similar number of times and there won't be a real difference. Of
> > course, the maple tree will consume much more memory in the worst
> > case.
>
> Yes.
>
> Bitmaps are bounded, like the comment says: 512K for 16G of memory with
> arbitrary order-0 fragmentation.
>
> Assuming absolute worst-case fragmentation, the maple tree (at 24 bytes
> per range, with an alternating allocated/freed pattern) would require
> around 50M. That is then almost doubled, since we have the maple tree
> and then the serialized copy.
>
> 100M vs 512K - I will pick the 512K :)
Nah, memory is cheap nowadays :)
Ok, let's start with bitmaps and then see what actual bottlenecks we have
to optimize.
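
For the record, the arithmetic behind the numbers above, assuming 4K
pages:

	16G / 4K                     = 4M order-0 pages
	4M bits                      = 512K of bitmap, for any pattern
	alternating alloc/free       = 2M preserved ranges
	2M ranges * 24 bytes        ~= 48M of maple tree nodes
	x2 (tree + serialized copy) ~= 100M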
> Jason
--
Sincerely yours,
Mike.