Message-ID: <aKyMfvWe8JetkbRL@kernel.org>
Date: Mon, 25 Aug 2025 19:17:02 +0300
From: Mike Rapoport <rppt@...nel.org>
To: David Hildenbrand <david@...hat.com>
Cc: Mika Penttilä <mpenttil@...hat.com>,
	linux-kernel@...r.kernel.org,
	Alexander Potapenko <glider@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Brendan Jackman <jackmanb@...gle.com>,
	Christoph Lameter <cl@...two.org>, Dennis Zhou <dennis@...nel.org>,
	Dmitry Vyukov <dvyukov@...gle.com>, dri-devel@...ts.freedesktop.org,
	intel-gfx@...ts.freedesktop.org, iommu@...ts.linux.dev,
	io-uring@...r.kernel.org, Jason Gunthorpe <jgg@...dia.com>,
	Jens Axboe <axboe@...nel.dk>, Johannes Weiner <hannes@...xchg.org>,
	John Hubbard <jhubbard@...dia.com>, kasan-dev@...glegroups.com,
	kvm@...r.kernel.org, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-arm-kernel@...s.com, linux-arm-kernel@...ts.infradead.org,
	linux-crypto@...r.kernel.org, linux-ide@...r.kernel.org,
	linux-kselftest@...r.kernel.org, linux-mips@...r.kernel.org,
	linux-mmc@...r.kernel.org, linux-mm@...ck.org,
	linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
	linux-scsi@...r.kernel.org,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Marco Elver <elver@...gle.com>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Michal Hocko <mhocko@...e.com>, Muchun Song <muchun.song@...ux.dev>,
	netdev@...r.kernel.org, Oscar Salvador <osalvador@...e.de>,
	Peter Xu <peterx@...hat.com>, Robin Murphy <robin.murphy@....com>,
	Suren Baghdasaryan <surenb@...gle.com>, Tejun Heo <tj@...nel.org>,
	virtualization@...ts.linux.dev, Vlastimil Babka <vbabka@...e.cz>,
	wireguard@...ts.zx2c4.com, x86@...nel.org, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH RFC 10/35] mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()

On Mon, Aug 25, 2025 at 05:42:33PM +0200, David Hildenbrand wrote:
> On 25.08.25 16:59, Mike Rapoport wrote:
> > On Mon, Aug 25, 2025 at 04:38:03PM +0200, David Hildenbrand wrote:
> > > On 25.08.25 16:32, Mike Rapoport wrote:
> > > > On Mon, Aug 25, 2025 at 02:48:58PM +0200, David Hildenbrand wrote:
> > > > > On 23.08.25 10:59, Mike Rapoport wrote:
> > > > > > On Fri, Aug 22, 2025 at 08:24:31AM +0200, David Hildenbrand wrote:
> > > > > > > On 22.08.25 06:09, Mika Penttilä wrote:
> > > > > > > > 
> > > > > > > > On 8/21/25 23:06, David Hildenbrand wrote:
> > > > > > > > 
> > > > > > > > > All pages were already initialized and set to PageReserved() with a
> > > > > > > > > refcount of 1 by MM init code.
> > > > > > > > 
> > > > > > > > Just to be sure, how is this working with MEMBLOCK_RSRV_NOINIT, where MM is supposed not to
> > > > > > > > initialize struct pages?
> > > > > > > 
> > > > > > > Excellent point, I did not know about that one.
> > > > > > > 
> > > > > > > Spotting that we don't do the same for the head page made me assume that
> > > > > > > it's just a misuse of __init_single_page().
> > > > > > > 
> > > > > > > But the nasty thing is that we use memblock_reserved_mark_noinit() to only
> > > > > > > mark the tail pages ...
> > > > > > 
> > > > > > And even nastier thing is that when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> > > > > > disabled struct pages are initialized regardless of
> > > > > > memblock_reserved_mark_noinit().
> > > > > > 
> > > > > > I think this patch should go in before your updates:
> > > > > 
> > > > > Shouldn't we fix this in memblock code?
> > > > > 
> > > > > > Hacking around that in the memblock_reserved_mark_noinit() user sounds wrong
> > > > > -- and nothing in the doc of memblock_reserved_mark_noinit() spells that
> > > > > behavior out.
> > > > 
> > > > We can surely update the docs, but unfortunately I don't see how to avoid
> > > > hacking around it in hugetlb.
> > > > Since it's used to optimise HVO even further to the point hugetlb open
> > > > codes memmap initialization, I think it's fair that it should deal with all
> > > > possible configurations.
> > > 
> > > Remind me, why can't we support memblock_reserved_mark_noinit() when
> > > CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled?
> > 
> > When CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled we initialize the entire
> > memmap early (setup_arch()->free_area_init()), and we may have a bunch of
> > memblock_reserved_mark_noinit() afterwards
> 
> Oh, you mean that we get effective memblock modifications after already
> initializing the memmap.
> 
> That sounds ... interesting :)

It's the memmap, not the free lists. Without deferred init, memblock stays
active for a while after the memmap is initialized and before the memory goes
to the free lists.
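
To make the ordering concrete, here is a toy userspace model of it. This is
only a sketch, not kernel code: every toy_* name is hypothetical, the "memmap"
is a small array, and it assumes the reserved region lies entirely in the
range whose initialization would be deferred. It only mirrors the behaviour
described in this thread: without deferred init the whole memmap is
initialized before the region is marked noinit, so the mark can only keep the
reserved flag from being set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

#define TOY_NR_PAGES 8

struct toy_page {
	bool initialized;	/* struct page contents were written */
	bool reserved;		/* stands in for PG_reserved */
};

struct toy_region {
	size_t start, end;	/* [start, end) in page frames */
	bool noinit;		/* stands in for MEMBLOCK_RSRV_NOINIT */
};

static struct toy_page toy_memmap[TOY_NR_PAGES];

/* Early pass that initializes the whole memmap (no deferred init). */
static void toy_early_memmap_init(void)
{
	for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		toy_memmap[pfn].initialized = true;
}

/* Later pass over a reserved region; regions marked noinit are skipped. */
static void toy_init_reserved(const struct toy_region *r)
{
	assert(r->start <= r->end && r->end <= TOY_NR_PAGES);
	if (r->noinit)
		return;
	for (size_t pfn = r->start; pfn < r->end; pfn++) {
		toy_memmap[pfn].initialized = true;
		toy_memmap[pfn].reserved = true;
	}
}

/*
 * Run the modelled boot sequence and return the state of the first page of
 * a reserved region that is marked noinit only after the early pass.
 */
static struct toy_page toy_boot(bool deferred_struct_page_init)
{
	struct toy_region r = { .start = 2, .end = 6, .noinit = false };

	for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		toy_memmap[pfn] = (struct toy_page){ false, false };

	if (!deferred_struct_page_init)
		toy_early_memmap_init();

	r.noinit = true;	/* models a late memblock_reserved_mark_noinit() */
	toy_init_reserved(&r);

	return toy_memmap[r.start];
}
```

In this model toy_boot(false) returns a page that is initialized but not
reserved, while toy_boot(true) returns a page nobody touched, so a caller that
marks a region noinit has to cope with both outcomes.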
 
> So yeah, we have to document this for memblock_reserved_mark_noinit().
> 
> Is it also a problem for kexec_handover?

With KHO it's also interesting, but it does not support deferred struct
page init for now :)
 
> We should do something like:
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 154f1d73b61f2..ed4c563d72c32 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1091,13 +1091,16 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
>  /**
>   * memblock_reserved_mark_noinit - Mark a reserved memory region with flag
> - * MEMBLOCK_RSRV_NOINIT which results in the struct pages not being initialized
> - * for this region.
> + * MEMBLOCK_RSRV_NOINIT which allows for the "struct pages" corresponding
> + * to this region not getting initialized, because the caller will take
> + * care of it.
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> - * struct pages will not be initialized for reserved memory regions marked with
> - * %MEMBLOCK_RSRV_NOINIT.
> + * "struct pages" will not be initialized for reserved memory regions marked
> + * with %MEMBLOCK_RSRV_NOINIT if this function is called before initialization
> + * code runs. Without CONFIG_DEFERRED_STRUCT_PAGE_INIT, it is more likely
> + * that this function is not effective.
>   *
>   * Return: 0 on success, -errno on failure.
>  */

I have a different version :)
 
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b96746376e17..d20d091c6343 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -40,8 +40,9 @@ extern unsigned long long max_possible_pfn;
  * via a driver, and never indicated in the firmware-provided memory map as
  * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
  * kernel resource tree.
- * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages are
- * not initialized (only for reserved regions).
+ * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages don't have
+ * PG_reserved set and are not initialized at all when
+ * %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled (only for reserved regions).
  * @MEMBLOCK_RSRV_KERN: memory region that is reserved for kernel use,
  * either explictitly with memblock_reserve_kern() or via memblock
  * allocation APIs. All memblock allocations set this flag.
diff --git a/mm/memblock.c b/mm/memblock.c
index 154f1d73b61f..02de5ffb085b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1091,13 +1091,15 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
 
 /**
  * memblock_reserved_mark_noinit - Mark a reserved memory region with flag
- * MEMBLOCK_RSRV_NOINIT which results in the struct pages not being initialized
- * for this region.
+ * MEMBLOCK_RSRV_NOINIT
+ *
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
- * struct pages will not be initialized for reserved memory regions marked with
- * %MEMBLOCK_RSRV_NOINIT.
+ * The struct pages for the reserved regions marked %MEMBLOCK_RSRV_NOINIT will
+ * not have the %PG_reserved flag set.
+ * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, setting this flag also
+ * completely bypasses the initialization of struct pages for this region.
  *
  * Return: 0 on success, -errno on failure.
  */
 
> Optimizing the hugetlb code could be done, but I am not sure how high
> the priority is (nobody complained so far about the double init).
> 
> -- 
> Cheers
> 
> David / dhildenb
> 
-- 
Sincerely yours,
Mike.