[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bD101cSC_7+OWkMei2QvUVjSaRLy+LboLe8Sz7KS3by4g@mail.gmail.com>
Date: Fri, 19 Dec 2025 11:28:50 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Evangelos Petrongonas <epetron@...zon.de>, Pratyush Yadav <pratyush@...nel.org>,
Alexander Graf <graf@...zon.com>, Andrew Morton <akpm@...ux-foundation.org>,
Jason Miu <jasonmiu@...gle.com>, linux-kernel@...r.kernel.org,
kexec@...ts.infradead.org, linux-mm@...ck.org, nh-open-source@...zon.com
Subject: Re: [PATCH] kho: add support for deferred struct page init
On Fri, Dec 19, 2025 at 4:19 AM Mike Rapoport <rppt@...nel.org> wrote:
>
> On Tue, Dec 16, 2025 at 10:36:01AM -0500, Pasha Tatashin wrote:
> > On Tue, Dec 16, 2025 at 10:19 AM Mike Rapoport <rppt@...nel.org> wrote:
> > >
> > > On Tue, Dec 16, 2025 at 10:05:27AM -0500, Pasha Tatashin wrote:
> > > > > > +static struct page *__init kho_get_preserved_page(phys_addr_t phys,
> > > > > > + unsigned int order)
> > > > > > +{
> > > > > > + unsigned long pfn = PHYS_PFN(phys);
> > > > > > + int nid = early_pfn_to_nid(pfn);
> > > > > > +
> > > > > > + for (int i = 0; i < (1 << order); i++)
> > > > > > + init_deferred_page(pfn + i, nid);
> > > > >
> > > > > This will skip pages below node->first_deferred_pfn, we need to use
> > > > > __init_page_from_nid() here.
> > > >
> > > > Mike, but those struct pages should be initialized early anyway. If
> > > > they are not yet initialized we have a problem, as they are going to
> > > > be re-initialized later.
> > >
> > > Can say I understand your point. Which pages should be initialized earlt?
> >
> > All pages below node->first_deferred_pfn.
> >
> > > And which pages will be reinitialized?
> >
> > kho_memory_init() is called after free_area_init() (which calls
> > memmap_init_range to initialize low memory struct pages). So, if we
> > use __init_page_from_nid() as suggested, we would be blindly running
> > __init_single_page() again on those low-memory pages that
> > memmap_init_range() already set up. This would cause double
> > initialization and corruptions due to losing the order information.
> >
> > > > > > +
> > > > > > + return pfn_to_page(pfn);
> > > > > > +}
> > > > > > +
> > > > > > static void __init deserialize_bitmap(unsigned int order,
> > > > > > struct khoser_mem_bitmap_ptr *elm)
> > > > > > {
> > > > > > @@ -449,7 +466,7 @@ static void __init deserialize_bitmap(unsigned int order,
> > > > > > int sz = 1 << (order + PAGE_SHIFT);
> > > > > > phys_addr_t phys =
> > > > > > elm->phys_start + (bit << (order + PAGE_SHIFT));
> > > > > > - struct page *page = phys_to_page(phys);
> > > > > > + struct page *page = kho_get_preserved_page(phys, order);
> > > > >
> > > > > I think it's better to initialize deferred struct pages later in
> > > > > kho_restore_page. deserialize_bitmap() runs before SMP and it already does
> > > >
> > > > The KHO memory should still be accessible early in boot, right?
> > >
> > > The memory is accessible. And we anyway should not use struct page for
> > > preserved memory before kho_restore_{folio,pages}.
> >
> > This makes sense, what happens if someone calls kho_restore_folio()
> > before deferred pages are initialized?
>
> That's fine, because this memory is still memblock_reserve()ed and deferred
> init skips reserved ranges.
> There is a problem however with the calls to kho_restore_{pages,folio}()
> after memblock is gone because we can't use early_pfn_to_nid() then.
I agree with the regarding memblock and early_pfn_to_nid(), but I
don't think we need to rely on early_pfn_to_nid() during the restore
phase.
> I think we can start with Evangelos' approach that initializes struct pages
> at deserialize time and then we'll see how to optimize it.
Let's do the lazy tail initialization that I proposed to you in a
chat: we initialize only the head struct page during
deserialize_bitmap(). Since this happens while memblock is still
active, we can safely use early_pfn_to_nid() to set the nid in the
head page's flags, and also preserve order as we do today.
Then, we can defer the initialization of all tail pages to
kho_restore_folio(). At that stage, we no longer need memblock or
early_pfn_to_nid(); we can simply inherit the nid from the head page
using page_to_nid(head).
This approach seems to give us the best of both worlds: It avoids the
memblock dependency during restoration. It keeps the serial work in
deserialize_bitmap() to a minimum (O(1)O(1) per region). It allows the
heavy lifting of tail page initialization to be done later in the boot
process, potentially in parallel, as you suggested.
Thanks,
Pasha
Powered by blists - more mailing lists