Message-ID: <CAPTztWazH=bJHTvpLfqHK3cYRnO=TXcLWEUJKYsxW1WV8XifrA@mail.gmail.com>
Date: Wed, 26 Mar 2025 09:25:59 -0700
From: Frank van der Linden <fvdl@...gle.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Changyuan Lyu <changyuanl@...gle.com>, linux-kernel@...r.kernel.org, graf@...zon.com,
akpm@...ux-foundation.org, luto@...nel.org, anthony.yznaga@...cle.com,
arnd@...db.de, ashish.kalra@....com, benh@...nel.crashing.org, bp@...en8.de,
catalin.marinas@....com, dave.hansen@...ux.intel.com, dwmw2@...radead.org,
ebiederm@...ssion.com, mingo@...hat.com, jgowans@...zon.com, corbet@....net,
krzk@...nel.org, mark.rutland@....com, pbonzini@...hat.com,
pasha.tatashin@...een.com, hpa@...or.com, peterz@...radead.org,
ptyadav@...zon.de, robh+dt@...nel.org, robh@...nel.org, saravanak@...gle.com,
skinsburskii@...ux.microsoft.com, rostedt@...dmis.org, tglx@...utronix.de,
thomas.lendacky@....com, usama.arif@...edance.com, will@...nel.org,
devicetree@...r.kernel.org, kexec@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers
On Wed, Mar 26, 2025 at 4:59 AM Mike Rapoport <rppt@...nel.org> wrote:
[...]
> > There has, for example, been some talk about making hugetlbfs
> > persistent. You could have hugetlb_cma active. The hugetlb CMA areas
> > are set up quite early, quite some time before KHO restores memory. So
> > that would have to be changed somehow if the location of the KHO init
> > call would remain as close to buddy init as possible. I
> > suspect there may be other uses.
>
> I think we can address this when/if implementing preservation for hugetlbfs
> and it will be tricky.
> If hugetlb in the first kernel uses a lot of memory, we just won't have
> enough scratch space for early hugetlb reservations in the second kernel
> regardless of hugetlb_cma. On the other hand, we already have the preserved
> hugetlbfs memory, so we'd probably need to reserve less memory in the
> second kernel.
>
> But anyway, it's completely different discussion about how to preserve
> hugetlbfs.
Right, the KHO interface would need a way to carry over the
early reserved memory and reinit it early too.
>
> > > > current requirement in the patch set seems to be "after sparse/page
> > > > init", but I'm not sure why it needs to be as close as possible to
> > > > buddy init.
> > >
> > > Why would you say that sparse/page init would be a requirement here?
> >
> > At least in its current form, the KHO code expects vmemmap to be
> > initialized, as it does its restore based on page structures, as
> > deserialize_bitmap expects them. I think the use of the page->private
> > field was discussed in a separate thread. If that is done
> > differently, it wouldn't rely on vmemmap being initialized.
>
> In its current form KHO does rely on vmemmap being allocated, but it does
> not rely on it being initialized. Marking memblock ranges NOINIT ensures
> nothing touches the corresponding struct pages and KHO can use their fields
> up to the point the memory is returned to KHO callers.
>
> > A few more things I've noticed (not sure if these were discussed before):
> >
> > * Should KHO depend on CONFIG_DEFERRED_STRUCT_PAGE_INIT? Essentially,
> > marking memblock ranges as NOINIT doesn't work without
> > DEFERRED_STRUCT_PAGE_INIT. Although, if the page->private use
> > disappears, this wouldn't be an issue anymore.
>
> It does.
> memmap_init_reserved_pages() is always called, regardless of whether
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set, and it skips initialization
> of NOINIT regions.
Yeah, I see - the ordering makes this work out.
MEMBLOCK_RSRV_NOINIT is a bit confusing in the sense that if you do a
memblock allocation in the !CONFIG_DEFERRED_STRUCT_PAGE_INIT case, and
that allocation is done before free_area_init(), the pages will always
get initialized regardless, since memmap_init_range() will do it. But
this is done before the KHO deserialize, so it works out.
>
> > * As a future extension, it could be nice to store vmemmap init
> > information in the KHO FDT. Then you can use that to init ranges in an
> > optimized way (HVO hugetlb or DAX-style persisted ranges) straight
> > away.
>
> These days memmap contents are unstable because of the folio/memdesc
> project, but in general carrying memory map data from kernel to kernel is
> indeed something to consider.
Yes, I think we might have a need for that, but we'll see.
Thanks,
- Frank