Message-ID: <Z_Y4k4rDO-BbMjqs@kernel.org>
Date: Wed, 9 Apr 2025 12:06:27 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Pratyush Yadav <ptyadav@...zon.de>,
Changyuan Lyu <changyuanl@...gle.com>, linux-kernel@...r.kernel.org,
graf@...zon.com, akpm@...ux-foundation.org, luto@...nel.org,
anthony.yznaga@...cle.com, arnd@...db.de, ashish.kalra@....com,
benh@...nel.crashing.org, bp@...en8.de, catalin.marinas@....com,
dave.hansen@...ux.intel.com, dwmw2@...radead.org,
ebiederm@...ssion.com, mingo@...hat.com, jgowans@...zon.com,
corbet@....net, krzk@...nel.org, mark.rutland@....com,
pbonzini@...hat.com, pasha.tatashin@...een.com, hpa@...or.com,
peterz@...radead.org, robh+dt@...nel.org, robh@...nel.org,
saravanak@...gle.com, skinsburskii@...ux.microsoft.com,
rostedt@...dmis.org, tglx@...utronix.de, thomas.lendacky@....com,
usama.arif@...edance.com, will@...nel.org,
devicetree@...r.kernel.org, kexec@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v5 09/16] kexec: enable KHO support for memory
preservation
On Mon, Apr 07, 2025 at 02:03:05PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 07, 2025 at 07:31:21PM +0300, Mike Rapoport wrote:
> >
> > Ok, let's stick with memdesc then. Putting the name aside, it looks
> > like we do agree that KHO needs to provide a way to preserve memory
> > allocated from the buddy allocator along with some of the metadata
> > describing that memory, like the order for multi-order allocations.
>
> +1
>
> > The issue I see with bitmaps is that there's nothing except the order
> > that we can save. And if, sometime later, we have to recreate the
> > memdesc for that memory, that would mean allocating the correct data
> > structure, i.e. struct folio, struct slab, maybe struct vmalloc.
>
> Yes. The caller would have to take care of this using a caller-specific
> serialization of any memdesc data. Slab, for example, would presumably
> have to record the object size and the object allocation bitmap.
>
> > I'm not sure we are going to preserve slabs, at least in the
> > foreseeable future, but vmalloc seems like something that we'd have
> > to address.
>
> And I suspect vmalloc doesn't need to preserve any memdesc information?
> It can all be recreated.
vmalloc does not keep anything in the memdesc now, just plain order-0
pages from the alloc_pages variants.
Now that we've settled on terminology, and given that currently memdesc ==
struct page, I think we need kho_preserve_folio(struct folio *folio) for
actual struct folios and, apparently, other high-order allocations, and
kho_preserve_pages(struct page *page, int nr) for memblock, vmalloc and
alloc_pages_exact.
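
In code, roughly (just a sketch, not an implementation proposal;
kho_mark_preserved() is a made-up helper standing in for whatever ends
up updating the per-order preservation bitmaps):

	/* Sketch: record a folio (pfn + order) for the next kernel */
	int kho_preserve_folio(struct folio *folio)
	{
		unsigned long pfn = folio_pfn(folio);
		unsigned int order = folio_order(folio);

		return kho_mark_preserved(pfn, order);
	}

	/* Sketch: record nr order-0 pages, one bit each */
	int kho_preserve_pages(struct page *page, int nr)
	{
		unsigned long pfn = page_to_pfn(page);
		int i, err;

		for (i = 0; i < nr; i++) {
			err = kho_mark_preserved(pfn + i, 0);
			if (err)
				return err;
		}
		return 0;
	}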
On the restore path, kho_restore_folio() will recreate the multi-order
thingy by doing parts of what prep_new_page() does, and
kho_restore_pages() will recreate order-0 pages as if they were allocated
from the buddy allocator.
If the caller needs more in its memdesc, it is responsible for filling in
the missing bits.
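
As a sketch of the restore side (kho_lookup_order() is again a made-up
helper for reading the order back from the preserved metadata):

	struct folio *kho_restore_folio(phys_addr_t phys)
	{
		struct page *page = pfn_to_page(PHYS_PFN(phys));
		unsigned int order = kho_lookup_order(PHYS_PFN(phys));

		/* the relevant parts of prep_new_page() */
		if (order)
			prep_compound_page(page, order);

		return page_folio(page);
	}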
> > > Also the bitmap scanning to optimize the memblock reserve isn't
> > > implemented for xarray.. I don't think this is representative..
> >
> > I believe that even with optimized bitmap scanning the maple tree
> > would perform much better when the memory is not fragmented.
>
> Hard to guess; bitmap scanning is not free, especially if there are
> lots of zeros, but allocating maple tree nodes and locking them is not
> free either, so who knows where things cross over..
>
> > And when it is fragmented, both will need to call memblock_reserve()
> > a similar number of times and there won't be a real difference. Of
> > course, the maple tree will consume much more memory in the worst
> > case.
>
> Yes.
>
> Bitmaps are bounded, like the comment says: 512K for 16G of memory with
> arbitrary order-0 fragmentation.
>
> Assuming absolute worst-case fragmentation, the maple tree (at 24 bytes
> per range, with an alternating allocated/freed pattern) would require
> around 50M. That is then almost doubled, since we have the maple tree
> and then the serialized copy.
>
> 100M vs 512K - I will pick the 512K :)
Nah, memory is cheap nowadays :)
Ok, let's start with bitmaps and then see what actual bottlenecks we have
to optimize.
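
For the record, the arithmetic behind the numbers above, assuming 4K
pages:

	16G / 4K                     = 4M order-0 pages
	4M bits                      = 512K of bitmap, for any pattern
	alternating alloc/free       = 2M preserved ranges
	2M ranges * 24 bytes        ~= 48M of maple tree nodes
	x2 (tree + serialized copy) ~= 100M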
> Jason
--
Sincerely yours,
Mike.