Message-ID: <20251219183346.3627510-1-jiaqiyan@google.com>
Date: Fri, 19 Dec 2025 18:33:43 +0000
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: jackmanb@...gle.com, hannes@...xchg.org, linmiaohe@...wei.com, 
	ziy@...dia.com, harry.yoo@...cle.com, willy@...radead.org
Cc: nao.horiguchi@...il.com, david@...hat.com, lorenzo.stoakes@...cle.com, 
	william.roche@...cle.com, tony.luck@...el.com, wangkefeng.wang@...wei.com, 
	jane.chu@...cle.com, akpm@...ux-foundation.org, osalvador@...e.de, 
	muchun.song@...ux.dev, rientjes@...gle.com, duenwen@...gle.com, 
	jthoughton@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com, 
	mhocko@...e.com, Jiaqi Yan <jiaqiyan@...gle.com>
Subject: [PATCH v2 0/3] Only free healthy pages in high-order HWPoison folio

At the end of dissolve_free_hugetlb_folio, once a free HugeTLB
folio becomes non-HugeTLB, it is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.
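
For reference, the 262144 figure can be reproduced with the small
userspace sketch below. It assumes a 4 KiB base page size, which the
text above does not state; pages_per_hugepage and order_of are
hypothetical helper names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of base pages in a hugepage (sizes in bytes). */
unsigned long pages_per_hugepage(unsigned long hugepage_size,
                                 unsigned long base_page_size)
{
    return hugepage_size / base_page_size;
}

/* Hypothetical helper: smallest order such that 2^order >= nr_pages. */
unsigned int order_of(unsigned long nr_pages)
{
    unsigned int order = 0;

    while ((1UL << order) < nr_pages)
        order++;
    return order;
}
```

Under that assumption, pages_per_hugepage(1UL << 30, 4096) gives
262144, and order_of(262144) gives 18.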

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be given out with its buddy page and be re-used by either the
kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio. However, there is always a time
window between dissolve_free_hugetlb_folio freeing a HWPoison
high-order folio to the buddy allocator and MFR taking the
raw HWPoison page off the buddy allocator.

One obvious way to avoid this problem is to add page sanity
checks in the page allocation or free path. However, that would
go against past efforts to reduce sanity check overhead [1,2,3].

Introduce free_has_hwpoison_pages to free only the healthy
pages and exclude the HWPoison ones in the high-order folio.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose each healthy range into the largest
possible blocks. Each block meets the requirements to be freed
to the buddy allocator by calling __free_frozen_pages directly.
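
The scan-and-decompose idea described above can be modelled in
userspace roughly as follows. This is only a sketch, not the code in
the patch: struct freed_block, free_healthy_range, free_healthy_pages
and the max_order parameter are hypothetical names, a bool array
stands in for PageHWPoison, and recording a block in an output array
stands in for the call to __free_frozen_pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one block that would be handed to the allocator. */
struct freed_block {
    unsigned long pfn;   /* first page frame number of the block */
    unsigned int order;  /* block covers 2^order pages */
};

/*
 * Split the healthy range [pfn, end) into the largest blocks that are
 * naturally aligned and lie fully inside the range. Returns the updated
 * block count; blocks beyond cap are counted but not stored.
 */
static size_t free_healthy_range(unsigned long pfn, unsigned long end,
                                 unsigned int max_order,
                                 struct freed_block *out, size_t n, size_t cap)
{
    while (pfn < end) {
        unsigned int order = 0;

        /* Grow the block while alignment and remaining length allow it. */
        while (order < max_order &&
               (pfn & ((1UL << (order + 1)) - 1)) == 0 &&
               pfn + (1UL << (order + 1)) <= end)
            order++;

        if (n < cap) {
            out[n].pfn = pfn;
            out[n].order = order;
        }
        n++;
        pfn += 1UL << order;
    }
    return n;
}

/*
 * Linear scan over nr_pages pages: skip poisoned pages and hand each
 * maximal run of healthy pages to free_healthy_range().
 */
size_t free_healthy_pages(unsigned long base_pfn, const bool *poisoned,
                          unsigned long nr_pages, unsigned int max_order,
                          struct freed_block *out, size_t cap)
{
    size_t n = 0;
    unsigned long i = 0;

    while (i < nr_pages) {
        unsigned long start;

        if (poisoned[i]) {
            i++;
            continue;
        }
        start = i;
        while (i < nr_pages && !poisoned[i])
            i++;
        n = free_healthy_range(base_pfn + start, base_pfn + i,
                               max_order, out, n, cap);
    }
    return n;
}
```

For example, for an 8-page folio starting at pfn 0 with only page 3
poisoned, this model produces three blocks: pfn 0 at order 1, pfn 2
at order 0, and pfn 4 at order 2. Page 3 is never handed out.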

free_has_hwpoison_pages has linear time complexity, O(N), with
respect to the number of pages in the folio. While the
power-of-two decomposition ensures that the number of calls to
the buddy allocator is logarithmic for each contiguous healthy
range, the mandatory linear scan of pages to identify
PageHWPoison defines the overall time complexity.

I tested with some test-only code [4] and hugetlb-mfr [5], by
checking the status of pcplist and freelist immediately after
dissolve_free_hugetlb_folio dissolves a free hugetlb page that
contains 3 HWPoison raw pages:

* HWPoison pages are excluded by free_has_hwpoison_pages.

* Some healthy pages can be in zone->per_cpu_pageset (pcplist)
  because pcp_count is not high enough.

* Many healthy pages are already in some order's
  zone->free_area[order].free_list (freelist).

* In rare cases, some healthy pages are in neither pcplist
  nor freelist. My best guess is that they are allocated before
  the test checks.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
[4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
[5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com

Jiaqi Yan (3):
  mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
  mm/page_alloc: only free healthy pages in high-order HWPoison folio
  mm/memory-failure: simplify __page_handle_poison

 include/linux/page-flags.h |   2 +-
 mm/memory-failure.c        |  32 +++---------
 mm/page_alloc.c            | 101 +++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 27 deletions(-)

-- 
2.52.0.322.g1dd061c0dc-goog

