Message-ID: <20250108233128.14484-1-npache@redhat.com>
Date: Wed,  8 Jan 2025 16:31:16 -0700
From: Nico Pache <npache@...hat.com>
To: linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Cc: ryan.roberts@....com,
	anshuman.khandual@....com,
	catalin.marinas@....com,
	cl@...two.org,
	vbabka@...e.cz,
	mhocko@...e.com,
	apopple@...dia.com,
	dave.hansen@...ux.intel.com,
	will@...nel.org,
	baohua@...nel.org,
	jack@...e.cz,
	srivatsa@...il.mit.edu,
	haowenchao22@...il.com,
	hughd@...gle.com,
	aneesh.kumar@...nel.org,
	yang@...amperecomputing.com,
	peterx@...hat.com,
	ioworker0@...il.com,
	wangkefeng.wang@...wei.com,
	ziy@...dia.com,
	jglisse@...gle.com,
	surenb@...gle.com,
	vishal.moola@...il.com,
	zokeefe@...gle.com,
	zhengqi.arch@...edance.com,
	jhubbard@...dia.com,
	21cnbao@...il.com,
	willy@...radead.org,
	kirill.shutemov@...ux.intel.com,
	david@...hat.com,
	aarcange@...hat.com,
	raquini@...hat.com,
	dev.jain@....com,
	sunnanyong@...wei.com,
	usamaarif642@...il.com,
	audra@...hat.com,
	akpm@...ux-foundation.org
Subject: [RFC 00/11] khugepaged: mTHP support

This series gives khugepaged and madvise_collapse the ability to collapse
regions into mTHPs (multi-size THPs).

To achieve this, we generalize the khugepaged functions so that they no
longer depend on PMD_ORDER. Then, during the PMD scan, we keep track of
chunks of pages (sized by MTHP_MIN_ORDER) that are fully utilized. This
information is tracked in a bitmap. After the PMD scan is done, we do a
binary recursion on the bitmap to find the optimal mTHP sizes for the PMD
range. The max_ptes_none restriction is removed during the scan so that
the whole PMD range is accounted for. max_ptes_none is instead mapped to
a 0-100 range that determines how full an mTHP order needs to be before
it is collapsed.
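
As a rough illustration of the max_ptes_none scaling described above, here
is a minimal userspace sketch. It is not the code from the patches: the
helper names required_percent() and region_is_full_enough() are
hypothetical, and it assumes 512 PTEs per PMD (PMD_ORDER 9).

```c
#include <assert.h>

#define PMD_ORDER 9                  /* assumption: 512 PTEs per PMD */
#define PMD_NR    (1u << PMD_ORDER)

/*
 * Hypothetical helper: turn max_ptes_none (0..PMD_NR-1) into the
 * percentage of a region that must be populated before collapsing it.
 */
static unsigned int required_percent(unsigned int max_ptes_none)
{
	assert(max_ptes_none < PMD_NR);
	return ((PMD_NR - max_ptes_none) * 100) / PMD_NR;
}

/*
 * Hypothetical helper: decide whether a region of 2^order PTEs with
 * 'present' populated PTEs meets the scaled threshold. An entirely
 * empty region is never considered full enough.
 */
static int region_is_full_enough(unsigned int present, unsigned int order,
				 unsigned int max_ptes_none)
{
	unsigned int nr = 1u << order;

	assert(order <= PMD_ORDER && present <= nr);
	return present > 0 &&
	       present * 100 >= nr * required_percent(max_ptes_none);
}
```

Under these assumptions, max_ptes_none = 0 requires a completely populated
region at every order, while max_ptes_none = 255 requires roughly half of
the PTEs in the candidate region to be present.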

Some design choices to note:
 - Bitmap structures are allocated dynamically because on some
    architectures (like PowerPC) the value of MTHP_BITMAP_SIZE cannot be
    computed at compile time, which leads to warnings.
 - The recursion is hidden behind an explicit stack structure, so the
    code itself is iterative.
 - MTHP_MIN_ORDER was added to compress the bitmap and to ensure it is
    64 bits on x86. This allows some optimization of the bitmap
    operations. If other arches/configs that have more than 512 PTEs per
    PMD want to compress their bitmap further, we can change this value
    per arch.
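
The bitmap walk and the stack-based recursion described above can be
sketched in userspace roughly as follows. This is an illustration only,
not the implementation in the patches: struct scan_range, struct
candidate, count_bits() and find_candidates() are hypothetical names, and
the sketch assumes PMD_ORDER 9 and MTHP_MIN_ORDER 3, so that one bit
covers 8 PTEs and the whole PMD fits in a 64-bit bitmap.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_ORDER      9  /* assumption: 512 PTEs per PMD */
#define MTHP_MIN_ORDER 3  /* assumption: one bit covers 8 PTEs */
#define BITMAP_BITS    (1u << (PMD_ORDER - MTHP_MIN_ORDER))  /* 64 */

struct scan_range { unsigned int order; unsigned int offset; }; /* bit offset */
struct candidate  { unsigned int order; unsigned int pte_offset; };

/* Count the set bits in bitmap[offset, offset + nbits). */
static unsigned int count_bits(uint64_t bitmap, unsigned int offset,
			       unsigned int nbits)
{
	unsigned int i, n = 0;

	for (i = 0; i < nbits; i++)
		n += (unsigned int)((bitmap >> (offset + i)) & 1);
	return n;
}

/*
 * Binary "recursion" over the bitmap using an explicit stack. A range is
 * reported as a collapse candidate when at least percent_full percent of
 * its bits are set; otherwise it is split into two halves of order - 1.
 * Returns the number of candidates stored in out[].
 */
static unsigned int find_candidates(uint64_t bitmap, unsigned int percent_full,
				    struct candidate *out, unsigned int max_out)
{
	struct scan_range stack[BITMAP_BITS];  /* generous upper bound */
	unsigned int top = 0, found = 0;

	stack[top++] = (struct scan_range){ PMD_ORDER, 0 };
	while (top > 0) {
		struct scan_range r = stack[--top];
		unsigned int nbits = 1u << (r.order - MTHP_MIN_ORDER);
		unsigned int set = count_bits(bitmap, r.offset, nbits);

		if (set > 0 && set * 100 >= nbits * percent_full) {
			if (found < max_out) {
				out[found].order = r.order;
				out[found].pte_offset = r.offset << MTHP_MIN_ORDER;
				found++;
			}
			continue;  /* do not descend into a chosen range */
		}
		if (r.order > MTHP_MIN_ORDER) {
			/* push the right half first so the left is popped first */
			stack[top++] = (struct scan_range){ r.order - 1,
							    r.offset + nbits / 2 };
			stack[top++] = (struct scan_range){ r.order - 1, r.offset };
		}
	}
	return found;
}
```

In this sketch a fully set bitmap yields a single PMD-order candidate,
while a bitmap with only its lowest 8 bits set (64 populated PTEs) and a
100% threshold yields one order-6 candidate at PTE offset 0.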

Patch 1-2:  Some refactoring to combine madvise_collapse and khugepaged
Patch 3:    A minor "fix"/optimization
Patch 4:    Refactor/rename hpage_collapse
Patch 5-7:  Generalize khugepaged functions for arbitrary orders
Patch 8-11: The mTHP patches

This series acts as an alternative to Dev Jain's approach [1]. The two 
series differ in a few ways:
  - My approach uses a bitmap to store the state of the linear scan_pmd
    and then determines potential mTHP batches from it. Dev incorporates
    his directly into the scan, and will try each available order.
  - Dev is attempting to optimize the locking, while my approach keeps the
    locking changes to a minimum. I believe his changes are not safe for
    uffd.
  - Dev's changes only work for khugepaged, not madvise_collapse (although
    I think that was by choice and it could easily support madvise).
  - Dev scales all khugepaged sysfs tunables by order, while I'm removing
    the restriction of max_ptes_none and converting it to a scale to
    determine a (m)THP threshold.
  - Dev turns on khugepaged if any order is available, while mine still
    only runs if PMDs are enabled. I like Dev's approach and will most
    likely do the same in my PATCH posting.
  - mTHPs need their ref count updated to 1<<order, which Dev is missing.

Patch 11 was inspired by one of Dev's changes.

[1] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/

Nico Pache (11):
  introduce khugepaged_collapse_single_pmd to collapse a single pmd
  khugepaged: refactor madvise_collapse and khugepaged_scan_mm_slot
  khugepaged: Don't allocate khugepaged mm_slot early
  khugepaged: rename hpage_collapse_* to khugepaged_*
  khugepaged: generalize hugepage_vma_revalidate for mTHP support
  khugepaged: generalize alloc_charge_folio for mTHP support
  khugepaged: generalize __collapse_huge_page_* for mTHP support
  khugepaged: introduce khugepaged_scan_bitmap for mTHP support
  khugepaged: add mTHP support
  khugepaged: remove max_ptes_none restriction on the pmd scan
  khugepaged: skip collapsing mTHP to smaller orders

 include/linux/khugepaged.h |   4 +-
 mm/huge_memory.c           |   3 +-
 mm/khugepaged.c            | 436 +++++++++++++++++++++++++------------
 3 files changed, 306 insertions(+), 137 deletions(-)

-- 
2.47.1
