lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6ff6fc46-49f1-49b0-b7e4-4cb37ec10a57@lucifer.local>
Date: Mon, 4 Aug 2025 18:18:43 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Ryan Roberts <ryan.roberts@....com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>, Vlastimil Babka <vbabka@...e.cz>,
        Zi Yan <ziy@...dia.com>, Mike Rapoport <rppt@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>, Nico Pache <npache@...hat.com>,
        Dev Jain <dev.jain@....com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, willy@...radead.org, x86@...nel.org,
        linux-block@...r.kernel.org, Ritesh Harjani <ritesh.list@...il.com>,
        linux-fsdevel@...r.kernel.org, "Darrick J . Wong" <djwong@...nel.org>,
        mcgrof@...nel.org, gost.dev@...sung.com, hch@....de,
        Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH 3/5] mm: add static huge zero folio

On Mon, Aug 04, 2025 at 07:07:06PM +0200, David Hildenbrand wrote:
> > Yeah I really don't like this. This seems overly complicated and too
> > fiddly. Also if I want a static PMD, do I want to wait a minute for next
> > attempt?
> >
> > Also doing things this way we might end up:
> >
> > 0. Enabling CONFIG_STATIC_HUGE_ZERO_FOLIO
> > 1. Not doing anything that needs a static PMD for a while + get fragmentation.
> > 2. Do something that needs it - oops can't get order-9 page, and waiting 60
> >     seconds between attempts
> > 3. This is silent so you think you have it switched on but are actually getting
> >     bad performance.
> >
> > I appreciate wanting to reuse this code, but we need to find a way to do this
> > really really early, and get rid of this arbitrary time out. It's very aribtrary
> > and we have no easy way of tracing how this might behave under workload.
> >
> > Also we end up pinning an order-9 page either way, so no harm in getting it
> > first thing?
>
> What we could do, to avoid messing with memblock and two ways of initializing a huge zero folio early, and just disable the shrinker.

Nice, I like this approach!

>
> Downside is that the page is really static (not just when actually used at least once). I like it:

Well I'm not sure this is a downside :P

User is explicitly enabling an option that says 'I'm cool to lose an order-9
page for this'.

>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0ce86e14ab5e1..8e2aa18873098 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -153,6 +153,7 @@ config X86
>  	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP	if X86_64
>  	select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
>  	select ARCH_WANTS_THP_SWAP		if X86_64
> +	select ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO if X86_64
>  	select ARCH_HAS_PARANOID_L1D_FLUSH
>  	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
>  	select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b7..ccfa5c95f14b1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -495,6 +495,17 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
>  struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
>  void mm_put_huge_zero_folio(struct mm_struct *mm);
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
> +		return NULL;
> +
> +	if (unlikely(!huge_zero_folio))
> +		return NULL;
> +
> +	return huge_zero_folio;
> +}
> +
>  static inline bool thp_migration_supported(void)
>  {
>  	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
> @@ -685,6 +696,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
>  {
>  	return 0;
>  }
> +
> +static inline struct folio *get_static_huge_zero_folio(void)
> +{
> +	return NULL;
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  static inline int split_folio_to_list_to_order(struct folio *folio,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd6cf2..366a6d2d771e3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -823,6 +823,27 @@ config ARCH_WANT_GENERAL_HUGETLB
>  config ARCH_WANTS_THP_SWAP
>  	def_bool n
> +config ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO
> +	def_bool n
> +
> +config STATIC_HUGE_ZERO_FOLIO
> +	bool "Allocate a PMD sized folio for zeroing"
> +	depends on ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO && TRANSPARENT_HUGEPAGE
> +	help
> +	  Without this config enabled, the huge zero folio is allocated on
> +	  demand and freed under memory pressure once no longer in use.
> +	  To detect remaining users reliably, references to the huge zero folio
> +	  must be tracked precisely, so it is commonly only available for mapping
> +	  it into user page tables.
> +
> +	  With this config enabled, the huge zero folio can also be used
> +	  for other purposes that do not implement precise reference counting:
> +	  it is allocated statically and never freed, allowing for more
> +	  wide-spread use, for example, when performing I/O similar to the
> +	  traditional shared zeropage.
> +
> +	  Not suitable for memory constrained systems.
> +
>  config MM_ID
>  	def_bool n
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ff06dee213eb2..f65ba3e6f0824 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -866,9 +866,14 @@ static int __init thp_shrinker_init(void)
>  	huge_zero_folio_shrinker->scan_objects = shrink_huge_zero_folio_scan;
>  	shrinker_register(huge_zero_folio_shrinker);
> -	deferred_split_shrinker->count_objects = deferred_split_count;
> -	deferred_split_shrinker->scan_objects = deferred_split_scan;
> -	shrinker_register(deferred_split_shrinker);
> +	if (IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO)) {
> +		if (!get_huge_zero_folio())
> +			pr_warn("Allocating static huge zero folio failed\n");
> +	} else {
> +		deferred_split_shrinker->count_objects = deferred_split_count;
> +		deferred_split_shrinker->scan_objects = deferred_split_scan;
> +		shrinker_register(deferred_split_shrinker);
> +	}
>  	return 0;
>  }
> --
> 2.50.1
>
>
> Now, one thing I do not like is that we have "ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO" but
> then have a user-selectable option.
>
> Should we just get rid of ARCH_WANTS_STATIC_HUGE_ZERO_FOLIO?

Yeah, though I guess we probably need to make it need CONFIG_MMU if so?
Probably don't want to provide it if it might somehow break things?

I guess we could keep it as long as CONFIG_STATIC_HUGE_ZERO_FOLIO depend on
something sensible like CONFIG_MMU maybe 64-bit too?

Anyway this approach looks generally good!

>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ