Message-ID: <fbcb6038-43a9-4d47-8cf7-f5ca32824079@redhat.com>
Date: Tue, 15 Jul 2025 16:53:23 +0200
From: David Hildenbrand <david@...hat.com>
To: "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>,
 Suren Baghdasaryan <surenb@...gle.com>, Ryan Roberts <ryan.roberts@....com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, Borislav Petkov <bp@...en8.de>,
 Ingo Molnar <mingo@...hat.com>, "H . Peter Anvin" <hpa@...or.com>,
 Vlastimil Babka <vbabka@...e.cz>, Zi Yan <ziy@...dia.com>,
 Mike Rapoport <rppt@...nel.org>, Dave Hansen <dave.hansen@...ux.intel.com>,
 Michal Hocko <mhocko@...e.com>, Lorenzo Stoakes
 <lorenzo.stoakes@...cle.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Thomas Gleixner <tglx@...utronix.de>, Nico Pache <npache@...hat.com>,
 Dev Jain <dev.jain@....com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Jens Axboe <axboe@...nel.dk>
Cc: linux-kernel@...r.kernel.org, willy@...radead.org, linux-mm@...ck.org,
 x86@...nel.org, linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 "Darrick J . Wong" <djwong@...nel.org>, mcgrof@...nel.org,
 gost.dev@...sung.com, hch@....de, Pankaj Raghav <p.raghav@...sung.com>
Subject: Re: [PATCH v2 3/5] mm: add static PMD zero page

On 15.07.25 16:21, David Hildenbrand wrote:
> On 07.07.25 16:23, Pankaj Raghav (Samsung) wrote:
>> From: Pankaj Raghav <p.raghav@...sung.com>
>>
>> There are many places in the kernel where we need to zero out larger
>> chunks, but the maximum segment we can zero out at a time with
>> ZERO_PAGE is limited to PAGE_SIZE.
>>
>> This is especially annoying in block devices and filesystems where we
>> attach multiple ZERO_PAGEs to the bio in different bvecs. With multipage
>> bvec support in the block layer, it is much more efficient to send out
>> larger zero pages as part of a single bvec.
>>
>> This concern was raised during the review of adding LBS support to
>> XFS[1][2].
>>
>> Usually huge_zero_folio is allocated on demand, and it will be
>> deallocated by the shrinker once there are no users of it left. At the
>> moment, the huge_zero_folio infrastructure refcount is tied to the
>> lifetime of the process that created it. This might not work for the
>> bio layer, as completions can be async and the process that created the
>> huge_zero_folio might no longer be alive.
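
To make the multipage-bvec point above concrete (not part of the patch; the
helper name below is made up, purely for illustration): today, zeroing a range
through a bio means adding one PAGE_SIZE bvec per ZERO_PAGE, roughly like

static void add_zero_bvecs_per_page(struct bio *bio, size_t nr_bytes)
{
	/* One bvec per ZERO_PAGE, i.e. nr_bytes / PAGE_SIZE segments. */
	while (nr_bytes) {
		size_t len = min_t(size_t, nr_bytes, PAGE_SIZE);

		if (bio_add_page(bio, ZERO_PAGE(0), len, 0) != len)
			break;	/* bio full; the caller would submit + chain */
		nr_bytes -= len;
	}
}
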
> 
> Of course, what we could do is indicate that there is an untracked
> reference to the huge zero folio, and then simply refuse to free it for
> all eternity.
> 
> Essentially, any non-mm reference -> un-shrinkable.
> 
> We'd still be allocating the huge zero folio dynamically. We could try
> allocating it on first use, either from memblock or from the buddy
> allocator if that is already up.
> 
> Then, we'd only need a config option to allow for that to happen.

Something incomplete and very hacky, just to give an idea: it would try to
allocate the folio once there is actual code running that needs it, and then
have it stick around forever.


diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e0a27f80f390d..357e29e98d8d2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -481,6 +481,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
  
  extern struct folio *huge_zero_folio;
  extern unsigned long huge_zero_pfn;
+extern atomic_t huge_zero_folio_is_static;
  
  static inline bool is_huge_zero_folio(const struct folio *folio)
  {
@@ -499,6 +500,16 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
  
  struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
  void mm_put_huge_zero_folio(struct mm_struct *mm);
+struct folio *__get_static_huge_zero_folio(void);
+
+static inline struct folio *get_static_huge_zero_folio(void)
+{
+       if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
+               return NULL;
+       if (likely(atomic_read(&huge_zero_folio_is_static)))
+               return huge_zero_folio;
+       return __get_static_huge_zero_folio();
+}
  
  static inline bool thp_migration_supported(void)
  {
@@ -509,7 +520,6 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
                            pmd_t *pmd, bool freeze);
  bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
                            pmd_t *pmdp, struct folio *folio);
-
  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  static inline bool folio_test_pmd_mappable(struct folio *folio)
@@ -690,6 +700,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
  {
         return 0;
  }
+
+static inline struct folio *get_static_huge_zero_folio(void)
+{
+       return NULL;
+}
  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  static inline int split_folio_to_list_to_order(struct folio *folio,
@@ -703,4 +718,14 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
         return split_folio_to_list_to_order(folio, NULL, new_order);
  }
  
+static inline struct folio *largest_zero_folio(void)
+{
+       struct folio *folio;
+
+       folio = get_static_huge_zero_folio();
+       if (folio)
+               return folio;
+       return page_folio(ZERO_PAGE(0));
+}
+
  #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 31b5c4e61a574..eb49c69f9c8e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -77,6 +77,7 @@ static bool split_underused_thp = true;
  static atomic_t huge_zero_refcount;
  struct folio *huge_zero_folio __read_mostly;
  unsigned long huge_zero_pfn __read_mostly = ~0UL;
+atomic_t huge_zero_folio_is_static __read_mostly;
  unsigned long huge_anon_orders_always __read_mostly;
  unsigned long huge_anon_orders_madvise __read_mostly;
  unsigned long huge_anon_orders_inherit __read_mostly;
@@ -266,6 +267,25 @@ void mm_put_huge_zero_folio(struct mm_struct *mm)
                 put_huge_zero_page();
  }
  
+#ifdef CONFIG_STATIC_HUGE_ZERO_FOLIO
+struct folio *__get_static_huge_zero_folio(void)
+{
+       /*
+        * Our raised reference will prevent the shrinker from ever
+        * succeeding, so the folio effectively stays around (static).
+        */
+       if (atomic_read(&huge_zero_folio_is_static))
+               return huge_zero_folio;
+       /* TODO: memblock allocation if the buddy is not up yet? Or reject that earlier. */
+       if (!get_huge_zero_page())
+               return NULL;
+       if (atomic_cmpxchg(&huge_zero_folio_is_static, 0, 1) != 0)
+               put_huge_zero_page();
+       return huge_zero_folio;
+
+}
+#endif /* CONFIG_STATIC_HUGE_ZERO_FOLIO */
+
  static unsigned long shrink_huge_zero_page_count(struct shrinker *shrink,
                                         struct shrink_control *sc)
  {


-- 
Cheers,

David / dhildenb

